qid & accept id: (100210, 100345) query: What is the standard way to add N seconds to datetime.time in Python? soup:


You can use full datetime objects with timedelta: provide a dummy date, do the arithmetic, and then use time() to get just the time value.

For example:

import datetime
a = datetime.datetime(100,1,1,11,34,59)
b = a + datetime.timedelta(0,3) # days, seconds, then other fields.
print a.time()
print b.time()

results in the two values, three seconds apart:

11:34:59
11:35:02

You could also opt for the more readable

b = a + datetime.timedelta(seconds=3)

if you're so inclined.


If you're after a function that can do this, you can look into using addSecs below:

import datetime

def addSecs(tm, secs):
    fulldate = datetime.datetime(100, 1, 1, tm.hour, tm.minute, tm.second)
    fulldate = fulldate + datetime.timedelta(seconds=secs)
    return fulldate.time()

a = datetime.datetime.now().time()
b = addSecs(a, 300)
print a
print b

This outputs:

 09:11:55.775695
 09:16:55
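For times that may carry microseconds, a slightly more general sketch (Python 3 syntax) uses datetime.combine with a dummy date; the date chosen below is arbitrary and any date would work:

```python
import datetime

def add_secs(tm, secs):
    # Attach a dummy date, add the delta, then drop the date again.
    full = datetime.datetime.combine(datetime.date(2000, 1, 1), tm)
    return (full + datetime.timedelta(seconds=secs)).time()

print(add_secs(datetime.time(11, 34, 59), 3))  # 11:35:02
```

Because the arithmetic happens on a full datetime, wrapping past midnight is handled for free.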
qid & accept id: (121025, 121030) query: How do I get the modified date/time of a file in Python? soup:
os.path.getmtime(filepath)

or

os.stat(filepath).st_mtime
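Both return a POSIX timestamp (seconds since the epoch); a small sketch of turning that into a datetime — '.' stands in here for any file path:

```python
import datetime
import os

mtime = os.path.getmtime('.')  # float: seconds since the epoch
modified = datetime.datetime.fromtimestamp(mtime)
print(modified)
```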
qid & accept id: (168409, 539024) query: How do you get a directory listing sorted by creation date in python? soup:


Here's a more verbose version of @Greg Hewgill's answer. It most closely matches the question's requirements, making a distinction between creation and modification dates (at least on Windows).

#!/usr/bin/env python
from stat import S_ISREG, ST_CTIME, ST_MODE
import os, sys, time

# path to the directory (relative or absolute)
dirpath = sys.argv[1] if len(sys.argv) == 2 else r'.'

# get all entries in the directory w/ stats
entries = (os.path.join(dirpath, fn) for fn in os.listdir(dirpath))
entries = ((os.stat(path), path) for path in entries)

# leave only regular files, insert creation date
entries = ((stat[ST_CTIME], path)
           for stat, path in entries if S_ISREG(stat[ST_MODE]))
#NOTE: on Windows `ST_CTIME` is a creation date 
#  but on Unix it could be something else
#NOTE: use `ST_MTIME` to sort by a modification date

for cdate, path in sorted(entries):
    print time.ctime(cdate), os.path.basename(path)

Example:

$ python stat_creation_date.py
Thu Feb 11 13:31:07 2009 stat_creation_date.py
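On modern Python the same listing can be sketched with os.scandir, which stats entries lazily; st_mtime is used here since creation time is platform-dependent:

```python
import os
import time

def files_by_mtime(dirpath='.'):
    # regular files only, sorted by modification time (oldest first)
    entries = (e for e in os.scandir(dirpath) if e.is_file())
    return sorted(entries, key=lambda e: e.stat().st_mtime)

for entry in files_by_mtime('.'):
    print(time.ctime(entry.stat().st_mtime), entry.name)
```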
qid & accept id: (187273, 187536) query: Base-2 (Binary) Representation Using Python soup:


For best efficiency, you generally want to process more than a single bit at a time. You can use a simple method to get a fixed-width binary representation, e.g.

def _bin(x, width):
    return ''.join(str((x>>i)&1) for i in xrange(width-1,-1,-1))

_bin(x, 8) will now give a zero-padded representation of x's lower 8 bits. This can be used to build a lookup table, allowing your converter to process 8 bits at a time (or more if you want to devote the memory to it).

_conv_table = [_bin(x,8) for x in range(256)]

Then you can use this in your real function, stripping off leading zeroes when returning it. I've also added handling for signed numbers, as without it you will get an infinite loop (Negative integers conceptually have an infinite number of set sign bits.)

def bin(x):
    if x == 0: 
        return '0' #Special case: Don't strip leading zero if no other digits
    elif x < 0:
        sign='-'
        x*=-1
    else:
        sign = ''
    l=[]
    while x:
        l.append(_conv_table[x & 0xff])
        x >>= 8
    return sign + ''.join(reversed(l)).lstrip("0")

[Edit] Changed code to handle signed integers.
[Edit2] Here are some timing figures of the various solutions. bin is the function above, constantin_bin is from Constantin's answer and num_bin is the original version. Out of curiosity, I also tried a 16 bit lookup table variant of the above (bin16 below), and tried out Python3's builtin bin() function. All timings were for 100000 runs using an 01010101 bit pattern.

Num Bits:              8       16       32       64      128      256
---------------------------------------------------------------------
bin                0.544    0.586    0.744    1.942    1.854    3.357 
bin16              0.542    0.494    0.592    0.773    1.150    1.886
constantin_bin     2.238    3.803    7.794   17.869   34.636   94.799
num_bin            3.712    5.693   12.086   32.566   67.523  128.565
Python3's bin      0.079    0.045    0.062    0.069    0.212    0.201 

As you can see, when processing long values, using large chunks really pays off, but nothing beats the low-level C code of python3's builtin (which bizarrely seems consistently faster at 256 bits than 128!). Using a 16 bit lookup table improves things, but probably isn't worth it unless you really need it, as it uses up a large chunk of memory, and can introduce a small but noticeable startup delay to precompute the table.
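On any current Python, the built-in formatting covers this directly; a minimal sketch of the signed behaviour the function above implements:

```python
def to_bin(x):
    # format(x, 'b') is the C-speed equivalent of the lookup-table code,
    # with the same special cases for zero and negative numbers.
    return '-' + format(-x, 'b') if x < 0 else format(x, 'b')

print(to_bin(0x55))  # 1010101
print(to_bin(-10))   # -1010
```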

qid & accept id: (227461, 230416) query: Open file, read it, process, and write back - shortest method in Python soup:


Actually, an easier way with fileinput is to use the inplace parameter:

import fileinput
for line in fileinput.input (filenameToProcess, inplace=1):
    process (line)

If you use the inplace parameter, it redirects stdout to your file, so anything you print is written back to the file.

This example adds line numbers to your file:

import fileinput

for line in fileinput.input ("b.txt",inplace=1):
    print "%d: %s" % (fileinput.lineno(),line),
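The same line-numbering example as a self-contained Python 3 sketch; 'demo.txt' is a throwaway file created here only so the example runs on its own:

```python
import fileinput

# Build a small demo file so the example is self-contained.
with open('demo.txt', 'w') as fh:
    fh.write('alpha\nbeta\n')

# With inplace=True, stdout is redirected into the file being processed,
# so print() writes the new contents back.
with fileinput.input('demo.txt', inplace=True) as f:
    for line in f:
        print('%d: %s' % (fileinput.lineno(), line), end='')

print(open('demo.txt').read(), end='')  # 1: alpha / 2: beta
```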
qid & accept id: (296055, 296334) query: In IPython how do I create aliases for %magics? soup:


Update: The first response (below) does not accept parameters, so put this snippet at the end of the ipy_user_conf.py file (it is in your home directory).

def ed_xed(self,arg):
    ip = self.api
    return ip.magic.im_class.magic_edit(ip.IP," -x %s "%arg)

ip.expose_magic('xed',ed_xed)

Before update: Does it have to be a %magic? You can use the macro and store magics to reproduce this behavior without the magic %.

In [5]: %edit -x
In [6]: macro xed 5
In [7]: store xed
In [8]: xed

For magic aliases, from the documentation (%magic?):

You can also define your own aliased names for magic functions. In your ipythonrc file, placing a line like:

execute IPYTHON.magic_pf = IPYTHON.magic_profile

will define %pf as a new name for %profile.

But I don't know how to add the parameter.

qid & accept id: (296499, 296722) query: How do I zip the contents of a folder using python (version 2.5)? soup:


An adapted version of the script:

#!/usr/bin/env python
from __future__ import with_statement
from contextlib import closing
from zipfile import ZipFile, ZIP_DEFLATED
import os

def zipdir(basedir, archivename):
    assert os.path.isdir(basedir)
    with closing(ZipFile(archivename, "w", ZIP_DEFLATED)) as z:
        for root, dirs, files in os.walk(basedir):
            #NOTE: ignore empty directories
            for fn in files:
                absfn = os.path.join(root, fn)
                zfn = absfn[len(basedir)+len(os.sep):] #XXX: relative path
                z.write(absfn, zfn)

if __name__ == '__main__':
    import sys
    basedir = sys.argv[1]
    archivename = sys.argv[2]
    zipdir(basedir, archivename)

Example:

C:\zipdir> python -mzipdir c:\tmp\test test.zip

It creates 'C:\zipdir\test.zip' archive with the contents of the 'c:\tmp\test' directory.
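The standard library can also do this in one call via shutil.make_archive; a self-contained sketch (the directory and file names here are invented for the demo):

```python
import os
import shutil
import zipfile

# Build a tiny directory so the example runs anywhere.
os.makedirs('demo_dir', exist_ok=True)
with open(os.path.join('demo_dir', 'a.txt'), 'w') as f:
    f.write('hello')

# root_dir='demo_dir' zips the directory's contents, like zipdir above.
archive = shutil.make_archive('demo', 'zip', 'demo_dir')
print(zipfile.ZipFile(archive).namelist())  # ['a.txt']
```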

qid & accept id: (324214, 326541) query: What is the fastest way to parse large XML docs in Python? soup:


It looks to me as if you do not need any DOM capabilities in your program. I would second the use of the (c)ElementTree library. If you use the iterparse function of the cElementTree module, you can work your way through the XML and deal with the events as they occur.

Note, however, Fredrik's advice on using cElementTree's iterparse function:

to parse large files, you can get rid of elements as soon as you’ve processed them:

for event, elem in iterparse(source):
    if elem.tag == "record":
        ... process record elements ...
        elem.clear()

The above pattern has one drawback; it does not clear the root element, so you will end up with a single element with lots of empty child elements. If your files are huge, rather than just large, this might be a problem. To work around this, you need to get your hands on the root element. The easiest way to do this is to enable start events, and save a reference to the first element in a variable:

# get an iterable
context = iterparse(source, events=("start", "end"))

# turn it into an iterator
context = iter(context)

# get the root element
event, root = context.next()

for event, elem in context:
    if event == "end" and elem.tag == "record":
        ... process record elements ...
        root.clear()

The lxml.iterparse() does not allow this.
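The pattern above as a runnable sketch against the modern xml.etree.ElementTree module, using an in-memory document (the tag names are invented):

```python
import io
import xml.etree.ElementTree as ET

source = io.StringIO('<log><record>a</record><record>b</record></log>')

# Enable start events so the very first event hands us the root element.
context = ET.iterparse(source, events=('start', 'end'))
event, root = next(context)

records = []
for event, elem in context:
    if event == 'end' and elem.tag == 'record':
        records.append(elem.text)  # process the record...
        root.clear()               # ...then release the processed children

print(records)  # ['a', 'b']
```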

qid & accept id: (359903, 359945) query: Comparing List of Arguments to it self? soup:


Use list.count to get the number of items in a list that match a value. If that number doesn't match the number of items, you know they aren't all the same.

if a.count( "foo" ) != len(a)

Which would look like...

if a.count( a[0] ) != len(a)

...in production code.
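Wrapped up as a guarded helper (the empty-list behaviour below is a choice made for this sketch, not part of the answer):

```python
def all_same(a):
    # An empty list is treated as trivially uniform.
    return not a or a.count(a[0]) == len(a)

print(all_same(['foo', 'foo', 'foo']))  # True
print(all_same(['foo', 'bar', 'foo']))  # False
```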

qid & accept id: (409732, 410067) query: Python: Alter elements of a list soup:
bool_list[:] = [False] * len(bool_list)

or

bool_list[:] = [False for item in bool_list]
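The point of the [:] slice assignment is that the list is mutated in place, so every other reference to it sees the change; a quick sketch:

```python
bool_list = [True, False, True]
alias = bool_list                          # a second name for the same object

bool_list[:] = [False] * len(bool_list)    # in-place replacement

print(alias)               # [False, False, False]
print(alias is bool_list)  # True
```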
qid & accept id: (465144, 465391) query: Tools for creating text as bitmaps (anti-aliased text, custom spacing, transparent background) soup:


Here's the SVG + ImageMagick solution:

Programmatically create SVG documents based on this template, replacing "TEXT HERE" with the desired text content:

[The answer's SVG template markup was lost in extraction; it was an SVG document containing a text element with the placeholder TEXT HERE.]
Convert the documents to background-transparent PNGs with ImageMagick's convert:

$ convert -background none input.svg output.png
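Glued together in Python, with an invented minimal template (the answer's original SVG markup did not survive extraction, so every attribute below is a placeholder, not the answer's template):

```python
import subprocess

# Placeholder template; the original answer's SVG attributes are lost.
TEMPLATE = '''<svg xmlns="http://www.w3.org/2000/svg" width="400" height="60">
  <text x="0" y="40" font-size="32">{text}</text>
</svg>'''

def render_text(text, out_png, svg_path='out.svg'):
    # Write the SVG, then shell out to ImageMagick's convert
    # (must be on PATH) for a background-transparent PNG.
    with open(svg_path, 'w') as f:
        f.write(TEMPLATE.format(text=text))
    subprocess.check_call(['convert', '-background', 'none', svg_path, out_png])
```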
qid & accept id: (508657, 508677) query: Multidimensional array in Python soup:


You can create it using nested lists:

matrix = [[a,b],[c,d],[e,f]]

If it has to be dynamic, it's more complicated; why not write a small class yourself?

class Matrix(object):
    def __init__(self, rows, columns, default=0):
        self.m = []
        for i in range(rows):
            self.m.append([default for j in range(columns)])

    def __getitem__(self, index):
        return self.m[index]

This can be used like this:

m = Matrix(10,5)
m[3][4] = 7
print m[3][4]  # -> 7

I'm sure one could implement it much more efficiently. :)

If you need multidimensional arrays, you can either create one flat array and calculate the offset, or use arrays in arrays in arrays, which can be pretty bad for memory. (Could be faster, though…) I've implemented the first idea like this:

class Matrix(object):
    def __init__(self, *dims):
        self._shortcuts = [i for i in self._create_shortcuts(dims)]
        self._li = [None] * (self._shortcuts.pop())
        self._shortcuts.reverse()

    def _create_shortcuts(self, dims):
        dimList = list(dims)
        dimList.reverse()
        number = 1
        yield 1
        for i in dimList:
            number *= i
            yield number

    def _flat_index(self, index):
        if len(index) != len(self._shortcuts):
            raise TypeError()

        flatIndex = 0
        for i, num in enumerate(index):
            flatIndex += num * self._shortcuts[i]
        return flatIndex

    def __getitem__(self, index):
        return self._li[self._flat_index(index)]

    def __setitem__(self, index, value):
        self._li[self._flat_index(index)] = value

Can be used like this:

m = Matrix(4,5,2,6)
m[2,3,1,3] = 'x'
print m[2,3,1,3]  # -> 'x'
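One pitfall worth noting with the nested-list approach from the top of this answer: multiplying the outer list copies references, so every "row" aliases the same inner list. A quick sketch:

```python
# Outer * copies references: every "row" is the same list object.
bad = [[0] * 3] * 2
bad[0][0] = 7
print(bad)    # [[7, 0, 0], [7, 0, 0]]

# A comprehension builds independent rows.
good = [[0] * 3 for _ in range(2)]
good[0][0] = 7
print(good)   # [[7, 0, 0], [0, 0, 0]]
```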
qid & accept id: (519633, 519653) query: Lazy Method for Reading Big File in Python? soup:


To write a lazy function, just use yield:

def read_in_chunks(file_object, chunk_size=1024):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data


f = open('really_big_file.dat')
for piece in read_in_chunks(f):
    process_data(piece)

Another option would be to use iter and a helper function:

f = open('really_big_file.dat')
def read1k():
    return f.read(1024)

for piece in iter(read1k, ''):
    process_data(piece)

If the file is line-based, the file object is already a lazy generator of lines:

for line in open('really_big_file.dat'):
    process_data(line)
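The iter(callable, sentinel) idiom as a self-contained Python 3 sketch; 'demo.dat' is created here only so the example runs on its own, and note that with a binary-mode file the sentinel must be b'', not '':

```python
from functools import partial

# Build a demo file so the example is self-contained.
with open('demo.dat', 'wb') as f:
    f.write(b'x' * 2500)

chunk_sizes = []
with open('demo.dat', 'rb') as f:
    # partial(f.read, 1024) is called until it returns the sentinel b''.
    for piece in iter(partial(f.read, 1024), b''):
        chunk_sizes.append(len(piece))

print(chunk_sizes)  # [1024, 1024, 452]
```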
qid & accept id: (544923, 551704) query: Switching Printer Trays soup:


Ok, I figured this out. The answer is:

1. you need a local printer (if you need to print to a network printer, download the drivers and add it as a local printer)
2. use win32print to get and set default printer
3. also using win32print, use the following code:

import win32print
PRINTER_DEFAULTS = {"DesiredAccess":win32print.PRINTER_ALL_ACCESS}
pHandle = win32print.OpenPrinter('RICOH-LOCAL', PRINTER_DEFAULTS)
properties = win32print.GetPrinter(pHandle, 2) #get the properties
pDevModeObj = properties["pDevMode"] #get the devmode
automaticTray = 7
tray_one = 1
tray_two = 3
tray_three = 2
printer_tray = []
pDevModeObj.DefaultSource = tray_three #set the tray
properties["pDevMode"]=pDevModeObj #write the devmode back to properties
win32print.SetPrinter(pHandle,2,properties,0) #save the properties to the printer
4. that's it, the tray has been changed
5. printing is accomplished using Internet Explorer (from Graham King's blog):

from win32com import client
import time

ie = client.Dispatch("InternetExplorer.Application")

def printPDFDocument(filename):
    ie.Navigate(filename)
    if ie.Busy:
        time.sleep(1)
    ie.Document.printAll()

ie.Quit()

Done

qid & accept id: (555344, 555404) query: Match series of (non-nested) balanced parentheses at end of string soup:
paren_pattern = re.compile(r"\(([^()]*)\)(?=(?:\s*\([^()]*\))*\s*$)")

def getParens(s):
  return paren_pattern.findall(s)

or even shorter:

getParens = re.compile(r"\(([^()]*)\)(?=(?:\s*\([^()]*\))*\s*$)").findall

Explanation:

\(                     # opening paren
([^()]*)               # content, captured into group 1
\)                     # closing paren
(?=                    # look ahead for...
  (?:\s*\([^()]*\))*   #   a series of parens, separated by whitespace
  \s*                  #   possibly more whitespace after
  $                    #   end of string
)                      # end of look ahead
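A quick check of the pattern on a made-up input (the sample string is invented for this sketch):

```python
import re

paren_pattern = re.compile(r"\(([^()]*)\)(?=(?:\s*\([^()]*\))*\s*$)")

# Only the trailing run of balanced groups matches; "(this)" is followed
# by ordinary text, so its lookahead fails.
print(paren_pattern.findall("ignore (this) middle text (foo) (bar)"))
# ['foo', 'bar']
```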
qid & accept id: (572263, 574460) query: How do I create a Django form that displays a checkbox label to the right of the checkbox? soup:


Here's what I ended up doing. I wrote a custom template stringfilter to switch the tags around. Now, my template code looks like this:

{% load pretty_forms %}
{{ form.as_p|pretty_checkbox }}

The only difference from a plain Django template is the addition of the {% load %} template tag and the pretty_checkbox filter.

Here's a functional but ugly implementation of pretty_checkbox - this code doesn't have any error handling, it assumes that the Django generated attributes are formatted in a very specific way, and it would be a bad idea to use anything like this in your code:

from django import template
from django.template.defaultfilters import stringfilter
import logging

register=template.Library()

@register.filter(name='pretty_checkbox')
@stringfilter
def pretty_checkbox(value):
    # Iterate over the HTML fragment, extract <label>...</label> /
    # <input ... /> pairs, and swap them when the input is a checkbox.
    # (The tag-handling string literals were lost when this answer was
    # scraped; this body is a reconstruction of the described logic,
    # not the verbatim original.)
    output = ''
    scratch = value
    try:
        while True:
            ls = scratch.find('<label')            # label start
            if ls > -1:
                le = scratch.find('</label>', ls)  # label end
                ins = scratch.find('<input', le)   # input start
                ine = scratch.find('/>', ins)      # input end
                # Check whether we're dealing with a checkbox:
                if scratch[ins:ine+2].find(' type="checkbox" ')>-1:
                    # Switch the tags
                    output += scratch[:ls]
                    output += scratch[ins:ine+2]
                    output += scratch[ls:le+8]     # label incl. '</label>'
                else:
                    output += scratch[:ine+2]
                scratch = scratch[ine+2:]
            else:
                output += scratch
                break
    except:
        logging.error("pretty_checkbox caught an exception")
    return output

pretty_checkbox scans its string argument, finds pairs of label and input tags, and switches them around whenever the input is a checkbox.

Advantages:

  1. No futzing with CSS.
  2. The markup ends up looking the way it's supposed to.
  3. I didn't hack Django internals.
  4. The template is nice, compact and idiomatic.

Disadvantages:

  1. The filter code needs to be tested for exciting values of the labels and input field names.
  2. There's probably something somewhere out there that does it better and faster.
  3. More work than I planned on doing on a Saturday.
qid & accept id: (582723, 583065) query: How to import classes defined in __init__.py soup:
    \n
  1. 'lib/'s parent directory must be in sys.path.

  2. \n
  3. Your 'lib/__init__.py' might look like this:

    \n
    from . import settings # or just 'import settings' on old Python versions\nclass Helper(object):\n      pass\n
  4. \n
\n

Then the following example should work:

\n
from lib.settings import Values\nfrom lib import Helper\n
\n

Answer to the edited version of the question:

\n

__init__.py defines how your package looks from outside. If you need to use Helper in settings.py then define Helper in a different file e.g., 'lib/helper.py'.

\n
\n.\n|   `-- import_submodule.py\n    `-- lib\n    |-- __init__.py\n    |-- foo\n    |   |-- __init__.py\n    |   `-- someobject.py\n    |-- helper.py\n    `-- settings.py\n\n2 directories, 6 files\n
\n

The command:

\n
$ python import_submodule.py\n
\n

Output:

\n
settings\nhelper\nHelper in lib.settings\nsomeobject\nHelper in lib.foo.someobject\n\n# ./import_submodule.py\nimport fnmatch, os\nfrom lib.settings import Values\nfrom lib import Helper\n\nprint\nfor root, dirs, files in os.walk('.'):\n    for f in fnmatch.filter(files, '*.py'):\n        print "# %s/%s" % (os.path.basename(root), f)\n        print open(os.path.join(root, f)).read()\n        print\n\n\n# lib/helper.py\nprint 'helper'\nclass Helper(object):\n    def __init__(self, module_name):\n        print "Helper in", module_name\n\n\n# lib/settings.py\nprint "settings"\nimport helper\n\nclass Values(object):\n    pass\n\nhelper.Helper(__name__)\n\n\n# lib/__init__.py\n#from __future__ import absolute_import\nimport settings, foo.someobject, helper\n\nHelper = helper.Helper\n\n\n# foo/someobject.py\nprint "someobject"\nfrom .. import helper\n\nhelper.Helper(__name__)\n\n\n# foo/__init__.py\nimport someobject\n
\n soup wrap:
  1. 'lib/'s parent directory must be in sys.path.

  2. Your 'lib/__init__.py' might look like this:

    from . import settings # or just 'import settings' on old Python versions
    class Helper(object):
        pass
    

Then the following example should work:

from lib.settings import Values
from lib import Helper

Answer to the edited version of the question:

__init__.py defines how your package looks from outside. If you need to use Helper in settings.py then define Helper in a different file e.g., 'lib/helper.py'.

.
|-- import_submodule.py
`-- lib
    |-- __init__.py
    |-- foo
    |   |-- __init__.py
    |   `-- someobject.py
    |-- helper.py
    `-- settings.py

2 directories, 6 files

The command:

$ python import_submodule.py

Output:

settings
helper
Helper in lib.settings
someobject
Helper in lib.foo.someobject

# ./import_submodule.py
import fnmatch, os
from lib.settings import Values
from lib import Helper

print
for root, dirs, files in os.walk('.'):
    for f in fnmatch.filter(files, '*.py'):
        print "# %s/%s" % (os.path.basename(root), f)
        print open(os.path.join(root, f)).read()
        print


# lib/helper.py
print 'helper'
class Helper(object):
    def __init__(self, module_name):
        print "Helper in", module_name


# lib/settings.py
print "settings"
import helper

class Values(object):
    pass

helper.Helper(__name__)


# lib/__init__.py
#from __future__ import absolute_import
import settings, foo.someobject, helper

Helper = helper.Helper


# foo/someobject.py
print "someobject"
from .. import helper

helper.Helper(__name__)


# foo/__init__.py
import someobject
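The key move above is the re-export in lib/__init__.py. As a self-contained sketch (the package name and its contents are made up here), the following builds such a package on the fly and imports the class straight from it:

```python
import os
import sys
import tempfile

# Build a throwaway 'lib' package to show that names re-exported in
# __init__.py are importable directly from the package.
parent = tempfile.mkdtemp()
pkg = os.path.join(parent, "lib")
os.makedirs(pkg)
with open(os.path.join(pkg, "helper.py"), "w") as f:
    f.write("class Helper(object):\n    def greet(self):\n        return 'hi'\n")
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("from .helper import Helper\n")

sys.path.insert(0, parent)   # 'lib/'s parent directory must be on sys.path
from lib import Helper       # works because __init__.py re-exports Helper
print(Helper().greet())
```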
qid & accept id: (645864, 646103) query: Changing prompt working directory via Python script soup:

I have a Python script to make moving around a file tree easier: xdir.py

\n

Briefly, I have an xdir.py file, which writes Windows commands to stdout:

\n
# Obviously, this should be more interesting..\nimport sys\nprint "cd", sys.argv[1]\n
\n

Then an xdir.cmd file:

\n
@echo off\npython xdir.py %* >%TEMP%\__xdir.cmd\ncall %TEMP%\__xdir.cmd\n
\n

Then I create a doskey alias:

\n
doskey x=xdir.cmd $*\n
\n

The end result is that I can type

\n
$ x subdir\n
\n

and change into subdir.

\n

The script I linked to above does much more, including remembering history, maintaining a stack of directories, accepting shorthand for directories, and so on.

\n soup wrap:

I have a Python script to make moving around a file tree easier: xdir.py

Briefly, I have an xdir.py file, which writes Windows commands to stdout:

# Obviously, this should be more interesting..
import sys
print "cd", sys.argv[1]

Then an xdir.cmd file:

@echo off
python xdir.py %* >%TEMP%\__xdir.cmd
call %TEMP%\__xdir.cmd

Then I create a doskey alias:

doskey x=xdir.cmd $*

The end result is that I can type

$ x subdir

and change into subdir.

The script I linked to above does much more, including remembering history, maintaining a stack of directories, accepting shorthand for directories, and so on.
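For reference, the command-generating side can be sketched in a few lines of Python 3 (the function name is made up; the /d flag is an addition so the generated cd also switches drive letters on Windows):

```python
import os
import sys

def cd_command(target):
    """Return the Windows command line that xdir.cmd will 'call'."""
    # 'cd /d' changes the drive as well as the directory
    return 'cd /d "%s"' % os.path.abspath(target)

if __name__ == "__main__":
    print(cd_command(sys.argv[1] if len(sys.argv) > 1 else "."))
```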

qid & accept id: (682504, 682513) query: What is a clean, pythonic way to have multiple constructors in Python? soup:

Actually None is much better for "magic" values:

\n
class Cheese():\n    def __init__(self, num_holes = None):\n        if num_holes is None:\n            ...\n
\n

Now if you want complete freedom of adding more parameters:

\n
class Cheese():\n    def __init__(self, *args, **kwargs):\n        #args -- tuple of anonymous arguments\n        #kwargs -- dictionary of named arguments\n        self.num_holes = kwargs.get('num_holes',random_holes())\n
\n

To better explain the concept of *args and **kwargs (you can actually change these names):

\n
def f(*args, **kwargs):\n   print 'args: ', args, ' kwargs: ', kwargs\n\n>>> f('a')\nargs:  ('a',)  kwargs:  {}\n>>> f(ar='a')\nargs:  ()  kwargs:  {'ar': 'a'}\n>>> f(1,2,param=3)\nargs:  (1, 2)  kwargs:  {'param': 3}\n
\n

http://docs.python.org/reference/expressions.html#calls

\n soup wrap:

Actually None is much better for "magic" values:

class Cheese():
    def __init__(self, num_holes = None):
        if num_holes is None:
            ...

Now if you want complete freedom of adding more parameters:

class Cheese():
    def __init__(self, *args, **kwargs):
        #args -- tuple of anonymous arguments
        #kwargs -- dictionary of named arguments
        self.num_holes = kwargs.get('num_holes',random_holes())

To better explain the concept of *args and **kwargs (you can actually change these names):

def f(*args, **kwargs):
   print 'args: ', args, ' kwargs: ', kwargs

>>> f('a')
args:  ('a',)  kwargs:  {}
>>> f(ar='a')
args:  ()  kwargs:  {'ar': 'a'}
>>> f(1,2,param=3)
args:  (1, 2)  kwargs:  {'param': 3}

http://docs.python.org/reference/expressions.html#calls
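A third idiom worth knowing, not covered in the answer above, is named alternate constructors via classmethod (the class, method names, and hole counts here are invented for illustration):

```python
import random

class Cheese(object):
    def __init__(self, num_holes=None):
        # None is still the sentinel meaning "pick something random"
        self.num_holes = random.randint(1, 100) if num_holes is None else num_holes

    @classmethod
    def random_cheese(cls):
        # named constructor: makes the caller's intent explicit
        return cls(random.randint(1, 100))

    @classmethod
    def solid_cheese(cls):
        return cls(0)

print(Cheese.solid_cheese().num_holes)  # 0
```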

qid & accept id: (706755, 706770) query: How do you safely and efficiently get the row id after an insert with mysql using MySQLdb in python? soup:

I think it might be

\n
newID = db.insert_id()\n
\n
\n

Edit by Original Poster

\n

Turns out, in the version of MySQLdb that I am using (1.2.2)\nYou would do the following:

\n
conn = MySQLdb(host...)\n\nc = conn.cursor()\nc.execute("INSERT INTO...")\nnewID = c.lastrowid\n
\n

I am leaving this as the correct answer, since it got me pointed in the right direction.

\n soup wrap:

I think it might be

newID = db.insert_id()

Edit by Original Poster

Turns out, in the version of MySQLdb that I am using (1.2.2), you would do the following:

conn = MySQLdb.connect(host=...)

c = conn.cursor()
c.execute("INSERT INTO...")
newID = c.lastrowid

I am leaving this as the correct answer, since it got me pointed in the right direction.
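The same cursor.lastrowid attribute is part of the DB-API 2.0 spec and also exists in the standard library's sqlite3 module, which makes the pattern easy to try without a MySQL server:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
c.execute("INSERT INTO t (name) VALUES (?)", ("alice",))
print(c.lastrowid)  # 1 -- the auto-generated id of the row just inserted
```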

qid & accept id: (706813, 706876) query: What is the best way to pass a method (with parameters) to another method in python soup:

You could do it this way:

\n
def method1(name):\n    def wrapper():\n        return 'Hello ' + name\n    return wrapper\n\ndef method2(method, question):\n    output = method()\n    return output + ', ' + question\n\nmethod2(method1(name = 'Sam'), 'How are you?')\n
\n

You can of course pass some variables in the method() call too:

\n
def method1(name):\n    def wrapper(greeting):\n        return greeting + name\n    return wrapper\n\ndef method2(method, question):\n    output = method(greeting = 'Hello ')\n    return output + ', ' + question\n\nmethod2(method1(name = 'Sam'), 'How are you?')\n
\n soup wrap:

You could do it this way:

def method1(name):
    def wrapper():
        return 'Hello ' + name
    return wrapper

def method2(method, question):
    output = method()
    return output + ', ' + question

method2(method1(name = 'Sam'), 'How are you?')

You can of course pass some variables in the method() call too:

def method1(name):
    def wrapper(greeting):
        return greeting + name
    return wrapper

def method2(method, question):
    output = method(greeting = 'Hello ')
    return output + ', ' + question

method2(method1(name = 'Sam'), 'How are you?')
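The standard library's functools.partial does the same pre-binding without a hand-written wrapper (function names are reused from the example above for illustration):

```python
from functools import partial

def greet(greeting, name):
    return greeting + name

def method2(method, question):
    return method() + ', ' + question

# partial pre-binds both arguments, so method2 can call it with none
print(method2(partial(greet, 'Hello ', 'Sam'), 'How are you?'))  # Hello Sam, How are you?
```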
qid & accept id: (765305, 765436) query: Proxy Check in python soup:

The simplest was is to simply catch the IOError exception from urllib:

\n
try:\n    urllib.urlopen(\n        "http://example.com",\n        proxies={'http':'http://example.com:8080'}\n    )\nexcept IOError:\n    print "Connection error! (Check proxy)"\nelse:\n    print "All was fine"\n
\n

Also, from this blog post - "check status proxy address" (with some slight improvements):

\n
import urllib2\nimport socket\n\ndef is_bad_proxy(pip):    \n    try:\n        proxy_handler = urllib2.ProxyHandler({'http': pip})\n        opener = urllib2.build_opener(proxy_handler)\n        opener.addheaders = [('User-agent', 'Mozilla/5.0')]\n        urllib2.install_opener(opener)\n        req=urllib2.Request('http://www.example.com')  # change the URL to test here\n        sock=urllib2.urlopen(req)\n    except urllib2.HTTPError, e:\n        print 'Error code: ', e.code\n        return e.code\n    except Exception, detail:\n        print "ERROR:", detail\n        return True\n    return False\n\ndef main():\n    socket.setdefaulttimeout(120)\n\n    # two sample proxy IPs\n    proxyList = ['125.76.226.9:80', '213.55.87.162:6588']\n\n    for currentProxy in proxyList:\n        if is_bad_proxy(currentProxy):\n            print "Bad Proxy %s" % (currentProxy)\n        else:\n            print "%s is working" % (currentProxy)\n\nif __name__ == '__main__':\n    main()\n
\n

Remember this could double the time the script takes, if the proxy is down (as you will have to wait for two connection-timeouts).. Unless you specifically have to know the proxy is at fault, handling the IOError is far cleaner, simpler and quicker..

\n soup wrap:

The simplest way is to simply catch the IOError exception from urllib:

try:
    urllib.urlopen(
        "http://example.com",
        proxies={'http':'http://example.com:8080'}
    )
except IOError:
    print "Connection error! (Check proxy)"
else:
    print "All was fine"

Also, from this blog post - "check status proxy address" (with some slight improvements):

import urllib2
import socket

def is_bad_proxy(pip):    
    try:
        proxy_handler = urllib2.ProxyHandler({'http': pip})
        opener = urllib2.build_opener(proxy_handler)
        opener.addheaders = [('User-agent', 'Mozilla/5.0')]
        urllib2.install_opener(opener)
        req=urllib2.Request('http://www.example.com')  # change the URL to test here
        sock=urllib2.urlopen(req)
    except urllib2.HTTPError, e:
        print 'Error code: ', e.code
        return e.code
    except Exception, detail:
        print "ERROR:", detail
        return True
    return False

def main():
    socket.setdefaulttimeout(120)

    # two sample proxy IPs
    proxyList = ['125.76.226.9:80', '213.55.87.162:6588']

    for currentProxy in proxyList:
        if is_bad_proxy(currentProxy):
            print "Bad Proxy %s" % (currentProxy)
        else:
            print "%s is working" % (currentProxy)

if __name__ == '__main__':
    main()

Remember this could double the time the script takes if the proxy is down (as you will have to wait for two connection timeouts). Unless you specifically have to know that the proxy is at fault, handling the IOError is far cleaner, simpler and quicker.
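If that double-timeout cost matters, one lighter-weight pre-check (not from the answer above, and no substitute for a real HTTP test) is a plain TCP connect to the proxy, which fails fast when nothing is listening:

```python
import socket

def proxy_reachable(host, port, timeout=3):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return True
    except OSError:  # covers refused connections and timeouts
        return False

print(proxy_reachable('127.0.0.1', 1))  # usually False: nothing listens on port 1
```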

qid & accept id: (870652, 870677) query: Pythonic way to split comma separated numbers into pairs soup:

Something like:

\n
zip(t[::2], t[1::2])\n
\n

Full example:

\n
>>> s = ','.join(str(i) for i in range(10))\n>>> s\n'0,1,2,3,4,5,6,7,8,9'\n>>> t = [int(i) for i in s.split(',')]\n>>> t\n[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n>>> p = zip(t[::2], t[1::2])\n>>> p\n[(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]\n>>>\n
\n

If the number of items is odd, the last element will be ignored. Only complete pairs will be included.

\n soup wrap:

Something like:

zip(t[::2], t[1::2])

Full example:

>>> s = ','.join(str(i) for i in range(10))
>>> s
'0,1,2,3,4,5,6,7,8,9'
>>> t = [int(i) for i in s.split(',')]
>>> t
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> p = zip(t[::2], t[1::2])
>>> p
[(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
>>>

If the number of items is odd, the last element will be ignored. Only complete pairs will be included.
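The truncation (and the fact that in Python 3 zip is lazy, so you wrap it in list) is easy to see directly:

```python
t = [0, 1, 2, 3, 4]                 # odd number of items
pairs = list(zip(t[::2], t[1::2]))  # list() needed in Python 3, where zip is lazy
print(pairs)                        # [(0, 1), (2, 3)] -- the trailing 4 is dropped
```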

qid & accept id: (897362, 897373) query: What is the idiomatic way of invoking a list of functions in Python? soup:

Use map only for functions without side effects (like print). That is, use it only for functions that just return something. In this case a regular loop is more idiomatic:

\n
for f in lst:\n    f("event_info")\n
\n

Edit: also, as of Python 3.0, map returns an iterator instead of a list. Hence in Python 3.0 the code given in the question will not call any function, unless all elements in the generator are evaluated explicitly (e.g. by encapsulating the call to map inside list). Luckily the 2to3 tool will warn about this:

\n

File map.py:

\n
map(lambda x: x, range(10))\n
\n

2to3-3.0 map.py output:

\n
RefactoringTool: Skipping implicit fixer: buffer\nRefactoringTool: Skipping implicit fixer: idioms\nRefactoringTool: Skipping implicit fixer: set_literal\nRefactoringTool: Skipping implicit fixer: ws_comma\n--- map.py (original)\n+++ map.py (refactored)\n@@ -1,1 +1,1 @@\n-map(lambda x: x, range(10))\n+list(map(lambda x: x, list(range(10))))\nRefactoringTool: Files that need to be modified:\nRefactoringTool: map.py\nRefactoringTool: Warnings/messages while refactoring:\nRefactoringTool: ### In file map.py ###\nRefactoringTool: Line 1: You should use a for loop here\n
\n soup wrap:

Use map only for functions without side effects (printing, for example, is a side effect). That is, use it only for functions that just return something. In this case a regular loop is more idiomatic:

for f in lst:
    f("event_info")

Edit: also, as of Python 3.0, map returns an iterator instead of a list. Hence in Python 3.0 the code given in the question will not call any function, unless all elements in the generator are evaluated explicitly (e.g. by encapsulating the call to map inside list). Luckily the 2to3 tool will warn about this:

File map.py:

map(lambda x: x, range(10))

2to3-3.0 map.py output:

RefactoringTool: Skipping implicit fixer: buffer
RefactoringTool: Skipping implicit fixer: idioms
RefactoringTool: Skipping implicit fixer: set_literal
RefactoringTool: Skipping implicit fixer: ws_comma
--- map.py (original)
+++ map.py (refactored)
@@ -1,1 +1,1 @@
-map(lambda x: x, range(10))
+list(map(lambda x: x, list(range(10))))
RefactoringTool: Files that need to be modified:
RefactoringTool: map.py
RefactoringTool: Warnings/messages while refactoring:
RefactoringTool: ### In file map.py ###
RefactoringTool: Line 1: You should use a for loop here
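In Python 3 the laziness is easy to demonstrate: nothing is called until the map object is consumed:

```python
calls = []
m = map(calls.append, ["a", "b"])  # builds a lazy iterator; no calls made yet
assert calls == []
list(m)                            # consuming the iterator triggers the calls
assert calls == ["a", "b"]
```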
qid & accept id: (899138, 899172) query: Python-like list comprehension in Java soup:

Basically, you create a Function interface:

\n
public interface Func {\n    public Out apply(In in);\n}\n
\n

and then pass in an anonymous subclass to your method.

\n

Your method could either apply the function to each element in-place:

\n
public static  void applyToListInPlace(List list, Func f) {\n    ListIterator itr = list.listIterator();\n    while (itr.hasNext()) {\n        T output = f.apply(itr.next());\n        itr.set(output);\n    }\n}\n// ...\nList myList = ...;\napplyToListInPlace(myList, new Func() {\n    public String apply(String in) {\n        return in.toLowerCase();\n    }\n});\n
\n

or create a new List (basically creating a mapping from the input list to the output list):

\n
public static  List map(List in, Func f) {\n    List out = new ArrayList(in.size());\n    for (In inObj : in) {\n        out.add(f.apply(inObj));\n    }\n    return out;\n}\n// ...\nList myList = ...;\nList lowerCased = map(myList, new Func() {\n    public String apply(String in) {\n        return in.toLowerCase();\n    }\n});\n
\n

Which one is preferable depends on your use case. If your list is extremely large, the in-place solution may be the only viable one; if you wish to apply many different functions to the same original list to make many derivative lists, you will want the map version.

\n soup wrap:

Basically, you create a Function interface:

public interface Func {
    public Out apply(In in);
}

and then pass in an anonymous subclass to your method.

Your method could either apply the function to each element in-place:

public static  void applyToListInPlace(List list, Func f) {
    ListIterator itr = list.listIterator();
    while (itr.hasNext()) {
        T output = f.apply(itr.next());
        itr.set(output);
    }
}
// ...
List myList = ...;
applyToListInPlace(myList, new Func() {
    public String apply(String in) {
        return in.toLowerCase();
    }
});

or create a new List (basically creating a mapping from the input list to the output list):

public static  List map(List in, Func f) {
    List out = new ArrayList(in.size());
    for (In inObj : in) {
        out.add(f.apply(inObj));
    }
    return out;
}
// ...
List myList = ...;
List lowerCased = map(myList, new Func() {
    public String apply(String in) {
        return in.toLowerCase();
    }
});

Which one is preferable depends on your use case. If your list is extremely large, the in-place solution may be the only viable one; if you wish to apply many different functions to the same original list to make many derivative lists, you will want the map version.

qid & accept id: (933612, 933633) query: What is the best way to fetch/render one-to-many relationships? soup:

Just cut your view code to this line:

\n
entries = Entry.objects.filter(user=request.user).order_by("-timestamp")\n
\n

And do this in the template:

\n
{% for entry in entries %}\n    {{ entry.datadesc }}\n    \n    {% for file in entry.entryfile_set.all %}\n        \n        \n        \n        \n    {% endfor %}\n    
{{ file.datafile.name|split:"/"|last }}{{ file.datafile.size|filesizeformat }}downloaddelete
\n{% endfor %}\n
\n

I am a big fan of using related_name in Models, however, so you could change this line:

\n
entry = models.ForeignKey(Entry)\n
\n

To this:

\n
entry = models.ForeignKey(Entry, related_name='files')\n
\n

And then you can access all the files for a particular entry by changing this:

\n
{% for file in files.entryfile_set.all %}\n
\n

To the more readable/obvious:

\n
{% for file in entry.files.all %}\n
\n soup wrap:

Just cut your view code to this line:

entries = Entry.objects.filter(user=request.user).order_by("-timestamp")

And do this in the template:

{% for entry in entries %}
    {{ entry.datadesc }}
    {% for file in entry.entryfile_set.all %}
        {{ file.datafile.name|split:"/"|last }} {{ file.datafile.size|filesizeformat }} download delete
    {% endfor %}
{% endfor %}

I am a big fan of using related_name in Models, however, so you could change this line:

entry = models.ForeignKey(Entry)

To this:

entry = models.ForeignKey(Entry, related_name='files')

And then you can access all the files for a particular entry by changing this:

{% for file in files.entryfile_set.all %}

To the more readable/obvious:

{% for file in entry.files.all %}
qid & accept id: (956820, 956852) query: Iterating through large lists with potential conditions in Python soup:

You could define a little inline function:

\n
def EntryMatches(e):\n  if use_currency and not (e.currency == currency):\n    return False\n  if use_category and not (e.category == category):\n    return False\n  return True\n
\n

then

\n
totals['quantity'] = sum([e.quantity for e in entries if EntryMatches(e)])\n
\n

EntryMatches() will have access to all variables in enclosing scope, so no need to pass in any more arguments. You get the advantage that all of the logic for which entries to use is in one place, you still get to use the list comprehension to make the sum() more readable, but you can have arbitrary logic in EntryMatches() now.

\n soup wrap:

You could define a little inline function:

def EntryMatches(e):
  if use_currency and not (e.currency == currency):
    return False
  if use_category and not (e.category == category):
    return False
  return True

then

totals['quantity'] = sum([e.quantity for e in entries if EntryMatches(e)])

EntryMatches() will have access to all variables in enclosing scope, so no need to pass in any more arguments. You get the advantage that all of the logic for which entries to use is in one place, you still get to use the list comprehension to make the sum() more readable, but you can have arbitrary logic in EntryMatches() now.
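Since sum() accepts any iterable, the square brackets can also be dropped to avoid building an intermediate list. A self-contained toy version (the Entry class and the sample data are invented for the sake of a runnable example):

```python
# Generator-expression variant of the sum above.
class Entry(object):
    def __init__(self, quantity, currency, category):
        self.quantity, self.currency, self.category = quantity, currency, category

use_currency, currency = True, 'USD'
use_category, category = False, None

def EntryMatches(e):
    if use_currency and not (e.currency == currency):
        return False
    if use_category and not (e.category == category):
        return False
    return True

entries = [Entry(2, 'USD', 'a'), Entry(3, 'EUR', 'a'), Entry(5, 'USD', 'b')]
total = sum(e.quantity for e in entries if EntryMatches(e))  # no [ ] -> no temp list
print(total)  # 7
```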

qid & accept id: (973481, 973567) query: Dynamic Table Creation and ORM mapping in SqlAlchemy soup:

We are absolutely spoiled by SqlAlchemy.
\nWhat follows below is taken directly from the tutorial,
\nand is really easy to setup and get working.

\n

And because it is done so often,
\nMike Bayer has made this even easier
\nwith the all-in-one "declarative" method.

\n

Setup your environment (I'm using the SQLite in-memory db to test):

\n
>>> from sqlalchemy import create_engine\n>>> engine = create_engine('sqlite:///:memory:', echo=True)\n>>> from sqlalchemy import Table, Column, Integer, String, MetaData\n>>> metadata = MetaData()\n
\n

Define your table:

\n
>>> players_table = Table('players', metadata,\n...   Column('id', Integer, primary_key=True),\n...   Column('name', String),\n...   Column('score', Integer)\n... )\n>>> metadata.create_all(engine) # create the table\n
\n

If you have logging turned on, you'll see the SQL that SqlAlchemy creates for you.

\n

Define your class:

\n
>>> class Player(object):\n...     def __init__(self, name, score):\n...         self.name = name\n...         self.score = score\n...\n...     def __repr__(self):\n...        return "" % (self.name, self.score)\n
\n

Map the class to your table:

\n
>>> from sqlalchemy.orm import mapper\n>>> mapper(Player, players_table) \n\n
\n

Create a player:

\n
>>> a_player = Player('monty', 0)\n>>> a_player.name\n'monty'\n>>> a_player.score\n0\n
\n

That's it, you now have a your player table.
\nAlso, the SqlAlchemy googlegroup is great.
\nMike Bayer is very quick to answer questions.

\n soup wrap:

We are absolutely spoiled by SqlAlchemy.
What follows below is taken directly from the tutorial,
and is really easy to set up and get working.

And because it is done so often,
Mike Bayer has made this even easier
with the all-in-one "declarative" method.

Setup your environment (I'm using the SQLite in-memory db to test):

>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///:memory:', echo=True)
>>> from sqlalchemy import Table, Column, Integer, String, MetaData
>>> metadata = MetaData()

Define your table:

>>> players_table = Table('players', metadata,
...   Column('id', Integer, primary_key=True),
...   Column('name', String),
...   Column('score', Integer)
... )
>>> metadata.create_all(engine) # create the table

If you have logging turned on, you'll see the SQL that SqlAlchemy creates for you.

Define your class:

>>> class Player(object):
...     def __init__(self, name, score):
...         self.name = name
...         self.score = score
...
...     def __repr__(self):
...        return "" % (self.name, self.score)

Map the class to your table:

>>> from sqlalchemy.orm import mapper
>>> mapper(Player, players_table) 

Create a player:

>>> a_player = Player('monty', 0)
>>> a_player.name
'monty'
>>> a_player.score
0

That's it, you now have your player table.
Also, the SqlAlchemy googlegroup is great.
Mike Bayer is very quick to answer questions.

qid & accept id: (1008038, 1008223) query: How do I test if a string exists in a Genshi stream? soup:

Aha!! I have solved this by first attempting to remove the function from the stream:

\n
stream = stream | Transformer('.//head/script["functionName()"]').remove()\n
\n

and then adding the updated/new version:

\n
stream = stream | Transformer('.//head').append(tag.script(functionNameCode, type="text/javascript"))\n
\n soup wrap:

Aha!! I have solved this by first attempting to remove the function from the stream:

stream = stream | Transformer('.//head/script["functionName()"]').remove()

and then adding the updated/new version:

stream = stream | Transformer('.//head').append(tag.script(functionNameCode, type="text/javascript"))
qid & accept id: (1029207, 1031510) query: Interpolation in SciPy: Finding X that produces Y soup:

The UnivariateSpline class in scipy makes doing splines much more pythonic.

\n
x = [70, 80, 90, 100, 110]\ny = [49.7, 80.6, 122.5, 153.8, 163.0]\nf = interpolate.UnivariateSpline(x, y, s=0)\nxnew = np.arange(70,111,1)\n\nplt.plot(x,y,'x',xnew,f(xnew))\n
\n

To find x at y then do:

\n
yToFind = 140\nyreduced = np.array(y) - yToFind\nfreduced = interpolate.UnivariateSpline(x, yreduced, s=0)\nfreduced.roots()\n
\n

I thought interpolating x in terms of y might work but it takes a somewhat different route. It might be closer with more points.

\n soup wrap:

The UnivariateSpline class in scipy makes doing splines much more pythonic.

import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate

x = [70, 80, 90, 100, 110]
y = [49.7, 80.6, 122.5, 153.8, 163.0]
f = interpolate.UnivariateSpline(x, y, s=0)
xnew = np.arange(70, 111, 1)

plt.plot(x, y, 'x', xnew, f(xnew))

To find the x that produces a given y, do:

yToFind = 140
yreduced = np.array(y) - yToFind
freduced = interpolate.UnivariateSpline(x, yreduced, s=0)
freduced.roots()

I thought interpolating x in terms of y might work but it takes a somewhat different route. It might be closer with more points.

qid & accept id: (1042751, 1042756) query: Splitting a string @ once using different seps soup:

One idea would be something like this (untested):

\n
years, months, days = the_string.split('-')\ndays, time = days.split(' ')\ntime = time.split(':')\n
\n

Or this, which fits your data better.

\n
date, time = the_string.split(' ')\nyears, months, days = date.split('-')\nhours, minute, seconds = time.split(":")\n
\n soup wrap:

One idea would be something like this (untested):

years, months, days = the_string.split('-')
days, time = days.split(' ')
time = time.split(':')

Or this, which fits your data better.

date, time = the_string.split(' ')
years, months, days = date.split('-')
hours, minute, seconds = time.split(":")
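Since the string being split looks like a timestamp, the standard datetime module can also parse it in one step (the format string here is assumed from the sample data):

```python
from datetime import datetime

# strptime parses date and time in a single call instead of chained splits
dt = datetime.strptime('2009-06-26 12:34:56', '%Y-%m-%d %H:%M:%S')
print(dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second)
```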
qid & accept id: (1060193, 1060244) query: Python Decorator 3.0 and arguments to the decorator soup:

In this case, you need to make your function return the decorator. (Anything can be solved by another level of indirection...)

\n
from decorator import decorator\ndef substitute_args(arg_sub_dict):\n  @decorator\n  def wrapper(fun, arg):\n    new_arg = arg_sub_dict.get(arg, arg)\n    return fun(new_arg)\n  return wrapper\n
\n

This means substitute_args isn't a decorator itself, it's a decorator factory. Here's the equivalent without the decorator module.

\n
def substitute_args(arg_sub_dict):\n  def my_decorator(fun):\n    def wrapper(arg):\n      new_arg = arg_sub_dict.get(arg, arg)\n      return fun(new_arg)\n    # magic to update __name__, etc.\n    return wrapper\n  return my_decorator\n
\n

Three levels deep isn't very convenient, but remember two of them are when the function is defined:

\n
@substitute_args({}) # this function is called and return value is the decorator\ndef f(x):\n  return x\n# that (anonymous) decorator is applied to f\n
\n

Which is equivalent to:

\n
def f(x):\n  return x\nf = substitude_args({})(f) # notice the double call\n
\n soup wrap:

In this case, you need to make your function return the decorator. (Anything can be solved by another level of indirection...)

from decorator import decorator
def substitute_args(arg_sub_dict):
  @decorator
  def wrapper(fun, arg):
    new_arg = arg_sub_dict.get(arg, arg)
    return fun(new_arg)
  return wrapper

This means substitute_args isn't a decorator itself, it's a decorator factory. Here's the equivalent without the decorator module.

def substitute_args(arg_sub_dict):
  def my_decorator(fun):
    def wrapper(arg):
      new_arg = arg_sub_dict.get(arg, arg)
      return fun(new_arg)
    # magic to update __name__, etc.
    return wrapper
  return my_decorator

Three levels deep isn't very convenient, but remember two of them are when the function is defined:

@substitute_args({}) # this function is called and return value is the decorator
def f(x):
  return x
# that (anonymous) decorator is applied to f

Which is equivalent to:

def f(x):
  return x
f = substitute_args({})(f) # notice the double call
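The "magic to update __name__" comment in the plain version is exactly what the standard library's functools.wraps provides; a runnable sketch:

```python
import functools

def substitute_args(arg_sub_dict):
    def my_decorator(fun):
        @functools.wraps(fun)      # copies __name__, __doc__, etc. onto wrapper
        def wrapper(arg):
            return fun(arg_sub_dict.get(arg, arg))
        return wrapper
    return my_decorator

@substitute_args({'hi': 'hello'})
def echo(x):
    return x

print(echo('hi'))     # 'hello' -- the argument was substituted
print(echo.__name__)  # 'echo'  -- preserved by functools.wraps
```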
qid & accept id: (1123337, 1123603) query: Django: Converting an entire set of a Model's objects into a single dictionary soup:

Does this need to create an actual dict? could you get by with only something that looked like a dict?

\n
class DictModelAdaptor():\n    def __init__(self, model):\n        self.model = model\n\n    def __getitem__(self, key):\n        return self.model.objects.get(key=key)\n\n    def __setitem__(self, key, item):\n        pair = self.model()\n        pair.key = key\n        pair.value = item\n        pair.save()\n\n    def __contains__(self, key):\n        ...\n
\n

You could then wrap a model in this way:

\n
modelDict = DictModelAdaptor(DictModel)\nmodelDict["name"] = "Bob Jones"\n
\n

etc...

\n soup wrap:

Does this need to create an actual dict? Could you get by with something that merely looks like a dict?

class DictModelAdaptor():
    def __init__(self, model):
        self.model = model

    def __getitem__(self, key):
        return self.model.objects.get(key=key)

    def __setitem__(self, key, item):
        pair = self.model()
        pair.key = key
        pair.value = item
        pair.save()

    def __contains__(self, key):
        ...

You could then wrap a model in this way:

modelDict = DictModelAdaptor(DictModel)
modelDict["name"] = "Bob Jones"

etc...

qid & accept id: (1144702, 1144726) query: Using Eval in Python to create class variables soup:

You can use the setattr function, which takes three arguments: the object, the name of the attribute, and it's value. For example,

\n
setattr(self, 'wavelength', wavelength_val)\n
\n

is equivalent to:

\n
self.wavelength = wavelength_val\n
\n

So you could do something like this:

\n
for variable in self.variable_list:\n       var_type,var_text_ctrl,var_name = variable\n       if var_type == 'f' :\n           setattr(self, var_name, var_text_ctrl.GetValue())\n
\n soup wrap:

You can use the setattr function, which takes three arguments: the object, the name of the attribute, and its value. For example,

setattr(self, 'wavelength', wavelength_val)

is equivalent to:

self.wavelength = wavelength_val

So you could do something like this:

for variable in self.variable_list:
    var_type, var_text_ctrl, var_name = variable
    if var_type == 'f':
        setattr(self, var_name, var_text_ctrl.GetValue())
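A self-contained version of the same idea with made-up data (plain strings stand in for the wx text controls), showing that setattr removes any need for eval:

```python
class Params(object):
    pass

# (var_type, var_name, raw_text) triples standing in for the wx controls;
# the names and values are invented for this example
variable_list = [('f', 'wavelength', '633.0'), ('f', 'power', '1.5')]

obj = Params()
for var_type, var_name, raw_text in variable_list:
    if var_type == 'f':
        setattr(obj, var_name, float(raw_text))  # no eval() required

print(obj.wavelength, obj.power)  # 633.0 1.5
```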
qid & accept id: (1175208, 1176023) query: Elegant Python function to convert CamelCase to snake_case? soup:

soup wrap:

This is pretty thorough:

def convert(name):
    s1 = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
    return re.sub('([a-z0-9])([A-Z])', r'\1_\2', s1).lower()

Works with all these (and doesn't harm already-un-cameled versions):

>>> convert('CamelCase')
'camel_case'
>>> convert('CamelCamelCase')
'camel_camel_case'
>>> convert('Camel2Camel2Case')
'camel2_camel2_case'
>>> convert('getHTTPResponseCode')
'get_http_response_code'
>>> convert('get2HTTPResponseCode')
'get2_http_response_code'
>>> convert('HTTPResponseCode')
'http_response_code'
>>> convert('HTTPResponseCodeXYZ')
'http_response_code_xyz'

Or if you're going to call it a zillion times, you can pre-compile the regexes:

first_cap_re = re.compile('(.)([A-Z][a-z]+)')
all_cap_re = re.compile('([a-z0-9])([A-Z])')
def convert(name):
    s1 = first_cap_re.sub(r'\1_\2', name)
    return all_cap_re.sub(r'\1_\2', s1).lower()

Don't forget to import the regular expression module

import re
qid & accept id: (1210099, 1210157) query: How to find number of users, number of users with a profile object, and monthly logins in Django soup:

soup wrap:

Count the number of users:

import django.contrib.auth
django.contrib.auth.models.User.objects.all().count()

You can use the same to count the number of profile objects (assuming every user has at most 1 profile), e.g. if Profile is the profile model:

Profile.objects.all().count()

To count the number of logins in a month you'd need to create a table logging each login with a time stamp. Then it's a matter of using count() again.
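
Outside the ORM, that monthly tally is just grouping timestamps by (year, month); a minimal sketch with made-up data:

```python
from collections import Counter
from datetime import datetime

# Illustrative login records; in practice each row of the login table
# would carry one of these timestamps.
logins = [
    datetime(2009, 9, 1, 9, 30),
    datetime(2009, 9, 15, 14, 0),
    datetime(2009, 10, 2, 8, 45),
]

per_month = Counter((t.year, t.month) for t in logins)
print(per_month[(2009, 9)])    # 2 logins in September 2009
```

With a Django model holding a DateTimeField, the same count is a `filter(timestamp__year=..., timestamp__month=...).count()` query.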

qid & accept id: (1267314, 1267487) query: How do I calculate the numeric value of a string with unicode components in python? soup:

soup wrap:

I think this is what you want...

import unicodedata
def eval_unicode(s):
    #sum all the unicode fractions
    u = sum(map(unicodedata.numeric, filter(lambda x: unicodedata.category(x)=="No",s)))
    #eval the regular digits (with optional dot) as a float, or default to 0
    n = float("".join(filter(lambda x:x.isdigit() or x==".", s)) or 0)
    return n+u

or the "comprehensive" solution, for those who prefer that style:

import unicodedata
def eval_unicode(s):
    #sum all the unicode fractions
    u = sum(unicodedata.numeric(i) for i in s if unicodedata.category(i)=="No")
    #eval the regular digits (with optional dot) as a float, or default to 0
    n = float("".join(i for i in s if i.isdigit() or i==".") or 0)
    return n+u

But beware, there are many unicode values that seem to have no numeric value assigned in Python (for example ⅜ and ⅝ don't work... or maybe it's just an issue with my keyboard).

Another note on the implementation: it's "too robust"; it will work even with malformed numbers like "123½3 ½", evaluating it to 1234.0... but it won't work if there is more than one dot.
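
For reference, here is a self-contained copy of the second variant together with a couple of checks (Python 3 syntax):

```python
import unicodedata

def eval_unicode(s):
    # sum the numeric values of "Number, other" characters (vulgar fractions etc.)
    u = sum(unicodedata.numeric(c) for c in s if unicodedata.category(c) == "No")
    # evaluate the ordinary digits (with optional dot) as a float, defaulting to 0
    n = float("".join(c for c in s if c.isdigit() or c == ".") or 0)
    return n + u

print(eval_unicode("2½"))   # 2.5
print(eval_unicode("¾"))    # 0.75
```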

qid & accept id: (1295415, 1295443) query: How to replace Python function while supporting all passed in parameters soup:

soup wrap:

Why don't you just try:

f = replacement_f

example:

>>> def rep(*args):
    print(*args, sep=' -- ')

>>> def ori(*args):
    print(args)

>>> ori('dfef', 32)
('dfef', 32)
>>> ori = rep
>>> ori('dfef', 32)
dfef -- 32
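
The reason this supports all passed-in parameters is that the replacement itself accepts them; a sketch (Python 3) where the replacement forwards any calling convention:

```python
def original(a, b, keyword=None):
    return ('original', a, b, keyword)

def replacement(*args, **kwargs):
    # *args/**kwargs accept whatever the callers were already passing
    return ('replacement', args, kwargs)

f = original
f = replacement            # rebinding the name swaps the implementation
print(f(1, 2, keyword=3))  # ('replacement', (1, 2), {'keyword': 3})
```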
qid & accept id: (1305532, 1305663) query: Convert Python dict to object? soup:

soup wrap:

Update: In Python 2.6 and onwards, consider whether the namedtuple data structure suits your needs:

>>> from collections import namedtuple
>>> MyStruct = namedtuple('MyStruct', 'a b d')
>>> s = MyStruct(a=1, b={'c': 2}, d=['hi'])
>>> s
MyStruct(a=1, b={'c': 2}, d=['hi'])
>>> s.a
1
>>> s.b
{'c': 2}
>>> s.c
Traceback (most recent call last):
  File "", line 1, in 
AttributeError: 'MyStruct' object has no attribute 'c'
>>> s.d
['hi']

The alternative (original answer contents) is:

class Struct:
    def __init__(self, **entries):
        self.__dict__.update(entries)

Then, you can use:

>>> args = {'a': 1, 'b': 2}
>>> s = Struct(**args)
>>> s
<__main__.Struct instance at 0x01D6A738>
>>> s.a
1
>>> s.b
2
qid & accept id: (1389180, 1389216) query: Python: Automatically initialize instance variables? soup:

soup wrap:

Edit: extended the solution to honor default arguments also

Here is the complete solution:

from functools import wraps
import inspect


def initializer(func):
    """
    Automatically assigns the parameters.

    >>> class process:
    ...     @initializer
    ...     def __init__(self, cmd, reachable=False, user='root'):
    ...         pass
    >>> p = process('halt', True)
    >>> p.cmd, p.reachable, p.user
    ('halt', True, 'root')
    """
    names, varargs, keywords, defaults = inspect.getargspec(func)

    @wraps(func)
    def wrapper(self, *args, **kargs):
        for name, arg in list(zip(names[1:], args)) + list(kargs.items()):
            setattr(self, name, arg)

        for name, default in zip(reversed(names), reversed(defaults)):
            if not hasattr(self, name):
                setattr(self, name, default)

        func(self, *args, **kargs)

    return wrapper

Edit: Adam asked me to extend the solution to support keyword arguments

from functools import wraps
import inspect

def initializer(fun):
    names, varargs, keywords, defaults = inspect.getargspec(fun)
    @wraps(fun)
    def wrapper(self, *args, **kargs):
        for name, arg in zip(names[1:], args) + kargs.items():
            setattr(self, name, arg)
        fun(self, *args, **kargs)
    return wrapper

You can use a decorator:

from functools import wraps
import inspect

def initializer(fun):
    names, varargs, keywords, defaults = inspect.getargspec(fun)
    @wraps(fun)
    def wrapper(self, *args):
        for name, arg in zip(names[1:], args):
            setattr(self, name, arg)
        fun(self, *args)
    return wrapper

class process:
    @initializer
    def __init__(self, PID, PPID, cmd, FDs, reachable, user):
        pass

Output:

>>> c = process(1, 2, 3, 4, 5, 6)
>>> c.PID
1
>>> dir(c)
['FDs', 'PID', 'PPID', '__doc__', '__init__', '__module__', 'cmd', 'reachable', 'user']
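
Note that inspect.getargspec was removed in Python 3.11; on modern Pythons the same decorator can be sketched with inspect.signature instead (a sketch, not a drop-in for every corner case such as *args):

```python
from functools import wraps
import inspect

def initializer(func):
    """Assign constructor arguments (including declared defaults) to attributes."""
    sig = inspect.signature(func)

    @wraps(func)
    def wrapper(self, *args, **kwargs):
        bound = sig.bind(self, *args, **kwargs)
        bound.apply_defaults()                    # fill in declared defaults too
        for name, value in list(bound.arguments.items())[1:]:  # skip self
            setattr(self, name, value)
        func(self, *args, **kwargs)
    return wrapper

class Process:
    @initializer
    def __init__(self, cmd, reachable=False, user='root'):
        pass

p = Process('halt', True)
print(p.cmd, p.reachable, p.user)   # halt True root
```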
qid & accept id: (1423251, 1424893) query: talking between python tcp server and a c++ client soup:
soup wrap:

client sends a PSH,ACK and then the server sends a PSH,ACK and a FIN,PSH,ACK

There is a FIN, so could it be that the Python version of your server is closing the connection immediately after the initial read?

If you are not explicitly closing the server's socket, it's probable that the server's remote socket variable is going out of scope and thus being closed (a bug that would not be present in your C++ version).

Assuming that this is the case, I can cause a very similar TCP sequence with this code for the server:

# server.py
import socket
from time import sleep

def f(s):
        r,a = s.accept()
        print r.recv(100)

s = socket.socket()
s.bind(('localhost',1234))
s.listen(1)

f(s)
# wait around a bit for the client to send its second packet
sleep(10)

and this for the client:

# client.py
import socket
from time import sleep

s = socket.socket()
s.connect(('localhost',1234))

s.send('hello 1')
# wait around for a while so that the socket in server.py goes out of scope
sleep(5)
s.send('hello 2')

Start your packet sniffer, then run server.py and then client.py. Here is the output of tcpdump -A -i lo, which matches your observations:

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 96 bytes
12:42:37.683710 IP localhost:33491 > localhost.1234: S 1129726741:1129726741(0) win 32792 
E.. localhost:33491: S 1128039653:1128039653(0) ack 1129726742 win 32768 
E..<..@.@.<.............C<..CVC.....Ia....@....
&3..&3......
12:42:37.684087 IP localhost:33491 > localhost.1234: . ack 1 win 257 
E..4R.@.@...............CVC.C<......1......
&3..&3..
12:42:37.684220 IP localhost:33491 > localhost.1234: P 1:8(7) ack 1 win 257 
E..;R.@.@...............CVC.C<......./.....
&3..&3..hello 1
12:42:37.684271 IP localhost.1234 > localhost:33491: . ack 8 win 256 
E..4.(@.@...............C<..CVC.....1}.....
&3..&3..
12:42:37.684755 IP localhost.1234 > localhost:33491: F 1:1(0) ack 8 win 256 
E..4.)@.@...............C<..CVC.....1{.....
&3..&3..
12:42:37.685639 IP localhost:33491 > localhost.1234: . ack 2 win 257 
E..4R.@.@...............CVC.C<......1x.....
&3..&3..
12:42:42.683367 IP localhost:33491 > localhost.1234: P 8:15(7) ack 2 win 257 
E..;R.@.@...............CVC.C<......./.....
&3%W&3..hello 2
12:42:42.683401 IP localhost.1234 > localhost:33491: R 1128039655:1128039655(0) win 0
E..(..@.@.<.............C<......P...b...

9 packets captured
27 packets received by filter
0 packets dropped by kernel
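
If the diagnosis is right, the fix is simply to keep the accepted socket referenced and keep reading until the client closes. A self-contained sketch (Python 3, using a thread and an OS-assigned port so it runs anywhere):

```python
import socket
import threading

received = []

def serve(listener):
    conn, addr = listener.accept()   # hold the reference: no premature FIN
    with conn:
        while True:
            data = conn.recv(100)
            if not data:             # b'' means the client closed the connection
                break
            received.append(data)

listener = socket.socket()
listener.bind(('127.0.0.1', 0))      # port 0: let the OS pick a free port
listener.listen(1)

t = threading.Thread(target=serve, args=(listener,))
t.start()

client = socket.socket()
client.connect(('127.0.0.1', listener.getsockname()[1]))
client.sendall(b'hello 1')
client.sendall(b'hello 2')
client.close()                        # now the server's recv() returns b''
t.join()
print(b''.join(received))             # b'hello 1hello 2' (chunking may vary)
```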
qid & accept id: (1448820, 1448834) query: variable length of %s with the % operator in python soup:

soup wrap:

This is a carryover from the C formatting markup:

print "%*s, blah" % (max_title_width,column)

If you want left-justified text (for entries shorter than max_title_width), put a '-' before the '*'.

>>> text = "abcdef"
>>> print "<%*s>" % (len(text)+2,text)
<  abcdef>
>>> print "<%-*s>" % (len(text)+2,text)
<abcdef  >
>>>

If the len field is shorter than the text string, the string just overflows:

>>> print "<%*s>" % (len(text)-2,text)
<abcdef>

If you want to clip at a maximum length, use the '.' precision field of the format placeholder:

>>> print "<%.*s>" % (len(text)-2,text)
<abcd>

Put them all together this way:

%
- if left justified
* or integer - min width (if '*', insert variable length in data tuple)
.* or .integer - max width (if '*', insert variable length in data tuple)
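
Putting the three placeholder forms side by side (Python 3 print syntax; the `%` formatting behaviour is the same):

```python
text = "abcdef"
width = len(text) + 2

print("<%*s>" % (width, text))           # min width 8, right-justified: <  abcdef>
print("<%-*s>" % (width, text))          # min width 8, left-justified:  <abcdef  >
print("<%.*s>" % (len(text) - 2, text))  # max width 4, clipped:         <abcd>
```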
qid & accept id: (1470453, 1470876) query: How to check which part of app is consuming CPU? soup:

soup wrap:

I was able to solve my problem by writing a modified version of the Python trace module that can be enabled and disabled; basically, modify the Trace class something like this:

import sys
import trace

class MyTrace(trace.Trace):
    def __init__(self, *args, **kwargs):
        trace.Trace.__init__(self, *args, **kwargs)
        self.enabled = False

    def localtrace_trace_and_count(self, *args, **kwargs):
        if not self.enabled:
            return None 
        return trace.Trace.localtrace_trace_and_count(self, *args, **kwargs)

tracer = MyTrace(ignoredirs=[sys.prefix, sys.exec_prefix],)

def main():
    a = 1
    tracer.enabled = True
    a = 2
    tracer.enabled = False
    a = 3

# run the new command using the given tracer
tracer.run('main()')

Output:

 --- modulename: untitled-2, funcname: main
untitled-2.py(19):     a = 2
untitled-2.py(20):     tracer.enabled = False

Enabling it at the critical points helps me trace, line by line, which code statements execute most.

qid & accept id: (1480655, 1480829) query: How can I, on some global keystroke, paste some text to current active application in linux with Python or C++ soup:

soup wrap:

You can use the xmacroplay utility from xmacro to do this under X windows, I think. Either use it directly (send it commands on standard input using the subprocess module) or read the source code and find out how it does it! I don't think there are Python bindings for it.

From the xmacroplay website

xmacroplay:
Reads lines from the standard input. It can understand the following lines:

Delay [sec]     - delays the program with [sec] secundums
ButtonPress [n] - sends a ButtonPress event with button [n]
          this emulates the pressing of the mouse button [n]
ButtonRelease [n]   - sends a ButtonRelease event with button [n]
          this emulates the releasing of the mouse button [n]
... snip lots more ...

This is probably the command you are interested in

String [max. 1024 long string]
        - Sends the string as single characters converted to
          KeyPress and KeyRelease events based on a
          character table in chartbl.h (currently only
          Latin1 is used...)

There is also Xnee which does a similar thing.

qid & accept id: (1527689, 1528023) query: exit from ipython soup:
soup wrap:
>>> import sys
>>> class Quitter(object):
...     def __repr__(self):
...         sys.exit()
... 
>>> exit = Quitter()

You can use it like this:

>>> exit

EDIT:

I don't use ipython myself, but it seems to have some weird sys.exit handler. The solution I found is as follows:

In [1]: type(exit).__repr__ = lambda s: setattr(s.shell, 'exit_now', True) or ''

Usage:

In [2]: exit
qid & accept id: (1598932, 1599090) query: Atomic increment of a counter in django soup:

soup wrap:

New in Django 1.1

Counter.objects.get_or_create(name = name)
Counter.objects.filter(name = name).update(count = F('count')+1)

or, fetching the object and assigning an F expression before saving (note that get_or_create returns an (object, created) tuple):

counter, created = Counter.objects.get_or_create(name = name)
counter.count = F('count') + 1
counter.save()
qid & accept id: (1606436, 1606478) query: Adding docstrings to namedtuples? soup:

soup wrap:

You can achieve this by creating a simple, empty wrapper class around the returned value from namedtuple. Contents of a file I created (nt.py):

from collections import namedtuple

Point_ = namedtuple("Point", ["x", "y"])

class Point(Point_):
    """ A point in 2d space """
    pass

Then in the Python REPL:

>>> print nt.Point.__doc__
 A point in 2d space 

Or you could do:

>>> help(nt.Point)  # which outputs...
Help on class Point in module nt:

class Point(Point)
 |  A point in 2d space
 |  
 |  Method resolution order:
 |      Point
 |      Point
 |      __builtin__.tuple
 |      __builtin__.object
 ...

If you don't like doing that by hand every time, it's trivial to write a sort-of factory function to do this:

def NamedTupleWithDocstring(docstring, *ntargs):
    nt = namedtuple(*ntargs)
    class NT(nt):
        __doc__ = docstring
    return NT

Point3D = NamedTupleWithDocstring("A point in 3d space", "Point3d", ["x", "y", "z"])

p3 = Point3D(1,2,3)

print p3.__doc__

which outputs:

A point in 3d space
qid & accept id: (1673483, 1673882) query: How to store callback methods? soup:

soup wrap:

I have asked the same question here! In my question, I talk about GObject, but recognize it is a general problem in any kind of Python! I got help from lioro there, and what I use in my current code is below.

class WeakCallback (object):
    """A Weak Callback object that will keep a reference to
    the connecting object with weakref semantics.

    This allows object A to pass a callback method to object S,
    without object S keeping A alive.
    """
    def __init__(self, mcallback):
        """Create a new Weak Callback calling the method @mcallback"""
        obj = mcallback.im_self
        attr = mcallback.im_func.__name__
        self.wref = weakref.ref(obj, self.object_deleted)
        self.callback_attr = attr
        self.token = None

    def __call__(self, *args, **kwargs):
        obj = self.wref()
        if obj:
            attr = getattr(obj, self.callback_attr)
            attr(*args, **kwargs)
        else:
            self.default_callback(*args, **kwargs)

    def default_callback(self, *args, **kwargs):
        """Called instead of callback when expired"""
        pass

    def object_deleted(self, wref):
        """Called when callback expires"""
        pass

Usage notes:

# illustration how I typically use it
weak_call = WeakCallback(self._something_changed)
long_lived_object.connect("on_change", weak_call)

I use the WeakCallback.token attribute in subclasses I've made to manage disconnecting the callback when the connecter goes away
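
The im_self/im_func attributes are Python 2 bound-method internals; on Python 3 the standard library packages the same idea as weakref.WeakMethod. A sketch of the pattern with it (the class names here are illustrative):

```python
import gc
import weakref

class Emitter:
    """Keeps only weak references to connected bound methods."""
    def __init__(self):
        self._callbacks = []

    def connect(self, method):
        self._callbacks.append(weakref.WeakMethod(method))

    def fire(self):
        for ref in self._callbacks:
            method = ref()            # None once the owner has been collected
            if method is not None:
                method()

class Listener:
    def __init__(self):
        self.calls = 0
    def on_change(self):
        self.calls += 1

emitter = Emitter()
listener = Listener()
emitter.connect(listener.on_change)
emitter.fire()
print(listener.calls)                 # 1

del listener                          # the emitter does not keep it alive
gc.collect()
emitter.fire()                        # callback silently expired; no error
```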

qid & accept id: (1738633, 1738653) query: More pythonic way to find a complementary DNA strand soup:

soup wrap:

Probably the most efficient way to do it, if the string is long enough:

import string

def complementary_strand(self, strand):
    return strand.translate(string.maketrans('TAGCtagc', 'ATCGATCG'))

This is making use of the translate and maketrans methods. You can also move the translate table creation outside the function:

import string
def __init__(self, ...):
    self.trans = string.maketrans('TAGCtagc', 'ATCGATCG')

def complementary_strand(self, strand):
    return strand.translate(self.trans)
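
Note that string.maketrans is Python 2 only; on Python 3 the equivalent table comes from str.maketrans with the same mapping (sketched here as a free function):

```python
# Python 3: the translation table is built by str.maketrans
_COMPLEMENT = str.maketrans('TAGCtagc', 'ATCGATCG')

def complementary_strand(strand):
    # characters outside the table pass through unchanged
    return strand.translate(_COMPLEMENT)

print(complementary_strand('TTACGGG'))   # AATGCCC
```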
qid & accept id: (1767565, 1767569) query: Plotting Histogram: How can I do it from scratch using data stored in a database? soup:

soup wrap:

The solution below assumes that you have MySQL, Python, and GNUPlot. The specific details can be fine-tuned if necessary. I'm posting it so that it can serve as a baseline for others.

Step #1: Decide the type of graph.

If it is a frequency plot of some kind, then a simple SQL query should do the trick:

select total, count(total) from faults GROUP BY total;

If you need to specify bin sizes, then proceed to the next step.

Step #2: Make sure you are able to connect to MySQL from Python. You can use the MySQLdb module to do this.

After that, the Python code to generate data for a histogram plot is the following (this was written in about 5 minutes, so it is very crude):

import MySQLdb

def DumpHistogramData(databaseHost, databaseName, databaseUsername, databasePassword, dataTableName, binsTableName, binSize, histogramDataFilename):
    #Open a file for writing into
    output = open("./" + histogramDataFilename, "w")

    #Connect to the database
    db = MySQLdb.connect(databaseHost, databaseUsername, databasePassword, databaseName)
    cursor = db.cursor()

    #Form the query
    sql = """select b.*, count(*) as total 
            FROM """ + binsTableName + """ b 
            LEFT OUTER JOIN """ + dataTableName + """ a 
            ON a.total between b.min AND b.max 
            group by b.min;"""
    cursor.execute(sql)

    #Get the result and print it into a file for further processing
    count = 0
    while True:
        results = cursor.fetchmany(10000)
        if not results:
            break
        for result in results:
            print >> output, str(result[0]) + "-" + str(result[1]) + "\t" + str(result[2])
    output.close()
    db.close()

def PrepareHistogramBins(databaseHost, databaseName, databaseUsername, databasePassword, binsTableName, maxValue, totalBins):

    #Connect to the database    
    db = MySQLdb.connect(databaseHost, databaseUsername, databasePassword, databaseName)
    cursor = db.cursor()

    #Check if the table was already created
    sql = """DROP TABLE IF EXISTS """ + binsTableName
    cursor.execute(sql)

    #Create the table
    sql = """CREATE TABLE """ + binsTableName + """(min int(11), max int(11));"""
    cursor.execute(sql)

    #Calculate the bin size
    binSize = maxValue/totalBins

    #Generate the bin sizes
    for i in range(0, maxValue, binSize):
        if i == 0:
            min = i
            max = i+binSize
        else:
            min = i+1
            max = i+binSize
        sql = """INSERT INTO """ + binsTableName + """(min, max) VALUES(""" + str(min) + """, """ + str(max) + """);"""
        cursor.execute(sql)
    db.close()
    return binSize

binSize = PrepareHistogramBins("localhost", "testing", "root", "", "bins", 5000, 100)
DumpHistogramData("localhost", "testing", "root", "", "faults", "bins", binSize, "histogram")
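As a quick sanity check, the bin-edge arithmetic used in PrepareHistogramBins can be reproduced without the database. The helper below is hypothetical (not part of the original answer) but follows the same min/max logic:

```python
def histogram_bins(max_value, total_bins):
    # Same edge arithmetic as PrepareHistogramBins: the first bin
    # starts at 0, each later bin starts one past the previous max.
    bin_size = max_value // total_bins
    bins = []
    for i in range(0, max_value, bin_size):
        lo = i if i == 0 else i + 1
        bins.append((lo, i + bin_size))
    return bins

edges = histogram_bins(5000, 100)
assert len(edges) == 100
assert edges[0] == (0, 50)
assert edges[1] == (51, 100)
```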

Step #3: Use GNUPlot to generate the histogram. You can use the following script as a starting point (generates an eps image file):

set terminal postscript eps color lw 2 "Helvetica" 20
set output "output.eps"
set xlabel "XLABEL"
set ylabel "YLABEL"
set title "TITLE"
set style data histogram
set style histogram cluster gap 1
set style fill solid border -1
set boxwidth 0.9
set key autotitle columnheader
set xtics rotate by -45
plot "input" using 1:2 with linespoints ls 1

Save the above script into an arbitrary file, say sample.script, and proceed to the next step.

Step #4: Use gnuplot with the above script to generate an eps file:

gnuplot sample.script

Nothing complicated, but I figured a couple of bits from this code can be reused. Again, it is not perfect, but it gets the job done :)


qid & accept id: (1777344, 1777365) query: How to detect Mac OS version using Python? soup:
soup wrap:
>>> import platform
>>> platform.mac_ver()
('10.5.8', ('', '', ''), 'i386')

As you see, the first item of the tuple mac_ver returns is a string, not a number (hard to make '10.5.8' into a number!-), but it's pretty easy to manipulate the 10.x.y string into the kind of numbers you want. For example,

>>> v, _, _ = platform.mac_ver()
>>> v = float('.'.join(v.split('.')[:2]))
>>> print v
10.5
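One caveat with the float trick: it would make a hypothetical '10.10' compare equal to '10.1'. Comparing a tuple of integers instead is safe; a small sketch (the version_tuple helper is my own, not part of platform):

```python
def version_tuple(version_string):
    # '10.5.8' -> (10, 5, 8); tuples compare element-wise, so
    # (10, 10, 0) > (10, 9, 5) holds where 10.10 < 10.9 as floats.
    return tuple(int(part) for part in version_string.split('.') if part)

assert version_tuple('10.5.8') == (10, 5, 8)
assert version_tuple('10.10.0') > version_tuple('10.9.5')
```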

If you prefer the Darwin kernel version rather than the MacOSX version, that's also easy to access -- use the similarly-formatted string that's the third item of the tuple returned by platform.uname().

qid & accept id: (1781554, 1781605) query: regular expression matching everything except a given regular expression soup:
soup wrap:
^(?!mpeg).*

This uses a negative lookahead to only match a string where the beginning doesn't match mpeg. Essentially, it requires that "the position at the beginning of the string cannot be a position where if we started matching the regex mpeg, we could successfully match" - thus matching anything which doesn't start with mpeg, and not matching anything that does.

However, I'd be curious about the context in which you're using this - there might be other options aside from regex which would be either more efficient or more readable, such as...

if not inputstring.startswith("mpeg"):
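For illustration, here is the lookahead in action with Python's re module (a quick sketch):

```python
import re

pattern = re.compile(r'^(?!mpeg).*')

assert pattern.match('avi clip') is not None   # doesn't start with 'mpeg'
assert pattern.match('mpeg clip') is None      # starts with 'mpeg'
assert pattern.match('mp3 clip') is not None   # 'mp3' is not 'mpeg'
```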
qid & accept id: (1783251, 1790187) query: Growing matrices columnwise in NumPy soup:

soup wrap:

NumPy actually does have an append function, which it seems might do what you want, e.g.,

import numpy as NP
my_data = NP.random.random_integers(0, 9, 9).reshape(3, 3)
new_col = NP.array((5, 5, 5)).reshape(3, 1)
res = NP.append(my_data, new_col, axis=1)

your second snippet (hstack) will work if you add another line, e.g.,

my_data = NP.random.random_integers(0, 9, 16).reshape(4, 4)
# the line to add--does not depend on array dimensions
new_col = NP.zeros_like(my_data[:,-1]).reshape(-1, 1)
res = NP.hstack((my_data, new_col))

hstack gives the same result as concatenate((my_data, new_col), axis=1); I'm not sure how they compare performance-wise.


While that's the most direct answer to your question, I should mention that looping through a data source to populate a target via append, while just fine in Python, is not idiomatic NumPy. Here's why:

Initializing a NumPy array is relatively expensive, and with this conventional Python pattern you incur that cost, more or less, at each loop iteration (i.e., each append to a NumPy array is roughly like initializing a new array with a different size).

For that reason, the common pattern in NumPy for iterative addition of columns to a 2D array is to initialize an empty target array once (or pre-allocate a single 2D NumPy array having all of the empty columns), then successively populate those empty columns by setting the desired column-wise offset (index) -- much easier to show than to explain:

>>> # initialize your skeleton array using 'empty' for lowest-memory footprint 
>>> M = NP.empty(shape=(10, 5), dtype=float)

>>> # create a small function to mimic step-wise populating this empty 2D array:
>>> fnx = lambda v : NP.random.randint(0, 10, v)

Populate the NumPy array as in the OP, except that each iteration just resets the values of M at successive column-wise offsets:

>>> for index, itm in enumerate(range(5)):    
        M[:,index] = fnx(10)

>>> M
  array([[ 1.,  7.,  0.,  8.,  7.],
         [ 9.,  0.,  6.,  9.,  4.],
         [ 2.,  3.,  6.,  3.,  4.],
         [ 3.,  4.,  1.,  0.,  5.],
         [ 2.,  3.,  5.,  3.,  0.],
         [ 4.,  6.,  5.,  6.,  2.],
         [ 0.,  6.,  1.,  6.,  8.],
         [ 3.,  8.,  0.,  8.,  0.],
         [ 5.,  2.,  5.,  0.,  1.],
         [ 0.,  6.,  5.,  9.,  1.]])

Of course, if you don't know in advance what size your array should be, just create one much bigger than you need and trim the 'unused' portions when you finish populating it:

>>> M[:3,:3]
  array([[ 9.,  3.,  1.],
         [ 9.,  6.,  8.],
         [ 9.,  7.,  5.]])
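To make the cost difference concrete, here is a small sketch contrasting the two patterns; both build the same array, but the append version allocates and copies a new array on every iteration:

```python
import numpy as np

cols = [np.arange(10, dtype=float) + i for i in range(5)]

# Pattern 1: grow by repeated append -- each call copies all data so far.
grown = np.empty((10, 0))
for c in cols:
    grown = np.append(grown, c.reshape(-1, 1), axis=1)

# Pattern 2: pre-allocate once, then fill columns in place.
filled = np.empty((10, 5))
for i, c in enumerate(cols):
    filled[:, i] = c

assert grown.shape == (10, 5)
assert np.array_equal(grown, filled)
```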
qid & accept id: (1794346, 1795498) query: Accessing xrange internal structure soup:

soup wrap:

You can access data you need without ctypes:

>>> obj = xrange(1,11,2)
>>> obj.__reduce__()[1]
(1, 11, 2)
>>> len(obj)
5

Note that the __reduce__() method is intended for serialization. Read this chapter in the documentation for more information.
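For what it's worth, Python 3's range (the successor of xrange) exposes these values directly as attributes, so no tricks are needed there:

```python
r = range(1, 11, 2)

# start, stop, and step are plain read-only attributes in Python 3.
assert (r.start, r.stop, r.step) == (1, 11, 2)
assert len(r) == 5
```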

Update: But of course you can access the internal data with ctypes too:

from ctypes import *

PyObject_HEAD = [
    ('ob_refcnt', c_size_t),
    ('ob_type', c_void_p),
]

class XRangeType(Structure):
    _fields_ = PyObject_HEAD + [
        ('start', c_long),
        ('step', c_long),
        ('len', c_long),
    ]

range_obj = xrange(1, 11, 2)

c_range_obj = cast(c_void_p(id(range_obj)), POINTER(XRangeType)).contents
print c_range_obj.start, c_range_obj.step, c_range_obj.len
qid & accept id: (1822934, 1822969) query: How do I reference classes using IronPython? soup:

soup wrap:

It should just be

import [namespace]

for common .NET libraries and namespaces, such as System

To use additional assemblies, you first need to import clr and then add a reference to the additional assemblies:

import clr
clr.AddReference("System.Xml")
from System.Xml import *


Also, have a look at where you installed IronPython. There is a lot of detail in the Tutorial.htm that can be found in \IronPython 2.0.1\Tutorial\Tutorial.htm

You generally create instances of classes like so:

from System.Collections import *
# create an instance of Hashtable
h = Hashtable() 

from System.Collections.Generic import *
# create an instance of List
l = List[str]()
qid & accept id: (1869034, 1869082) query: Removing custom widget from QVBoxLayout soup:

soup wrap:

You can do this:

import sip # you'll need this import (no worries, it ships with your pyqt install)
sip.delete(self.sv_widgets[purchase.id])

sip.delete(obj) explicitly calls the destructor on the corresponding C++ object. removeWidget does not cause this destructor to be called (the widget still has a parent at that point), and del only marks the Python object for garbage collection.

You can achieve the same by doing (probably cleaner):

self.vl_seatView.removeWidget(self.sv_widgets[purchase.id])
self.sv_widgets[purchase.id].setParent(None)
del self.sv_widgets[purchase.id]
qid & accept id: (1885314, 1885447) query: Parsing multilevel text list soup:
soup wrap:
class ListParser:

    def __init__(self, s):
        self.str = s.split("\n")
        self.answer = []

    def parse(self):
        self.nextLine()  # skip the leading blank line
        self.topList()

    def topList(self):
        while len(self.str) > 0:
            self.topListItem()

    def topListItem(self):
        l = self.nextLine()
        print "TOP: " + l
        l = self.nextLine()
        if l != '':
            raise Exception("expected blank line but found '%s'" % l)
        self.sublist()

    def nextLine(self):
        return self.str.pop(0)

    def sublist(self):
        # end of sublist is marked by a blank line (or end of input)
        while len(self.str) > 0:
            l = self.nextLine()
            if l == '':
                return
            print "SUB: " + l

parser = ListParser(s)  # s is the input string from the question
parser.parse()
print "done"

prints

TOP: 1 List name
SUB: 1 item
SUB: 2 item
SUB: 3 item
TOP: 2 List name
SUB: 1 item
SUB: 2 item
SUB: 3 item
TOP: 3 List name
SUB: 1 item
SUB: 2 item
SUB: 3 item
done
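If the input always follows the same alternating shape (name line, blank line, items, blank line), the whole parse can also be sketched in a few lines by splitting on blank lines. This compact variant (Python 3 syntax, my own helper names) returns the data instead of printing it:

```python
def parse_lists(text):
    # Blank-line-separated blocks alternate: [name], [items...], [name], ...
    blocks = [b.splitlines() for b in text.strip().split("\n\n")]
    names = [b[0] for b in blocks[0::2]]
    items = blocks[1::2]
    return dict(zip(names, items))

sample = "1 List name\n\n1 item\n2 item\n\n2 List name\n\n3 item\n"
result = parse_lists(sample)
assert result["1 List name"] == ["1 item", "2 item"]
assert result["2 List name"] == ["3 item"]
```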
qid & accept id: (1933784, 1933811) query: How do you clone a class in Python? soup:

soup wrap:

I'm pretty sure whatever you are trying to do can be solved in a better way, but here is something that gives you a clone of the class with a new id:

def c():
    class Clone(object):
        pass

    return Clone

c1 = c()
c2 = c()
print id(c1)
print id(c2)

gives:

4303713312
4303831072
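Another sketch of the same idea: the three-argument form of type() builds a brand-new class object, which also gets a fresh id. Here clone_class is a hypothetical helper, not a standard function:

```python
def clone_class(cls):
    # type(name, bases, dict) constructs a new class object that shares
    # the original's name, bases, and attributes but has its own identity.
    # __dict__/__weakref__ descriptors are recreated automatically, so
    # they are filtered out of the copied namespace.
    namespace = {k: v for k, v in cls.__dict__.items()
                 if k not in ('__dict__', '__weakref__')}
    return type(cls.__name__, cls.__bases__, namespace)

class Original(object):
    x = 1

Copy = clone_class(Original)
assert Copy is not Original            # distinct objects, distinct ids
assert Copy.__name__ == Original.__name__
assert Copy.x == Original.x == 1
```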
qid & accept id: (1938894, 1939102) query: csv to sparse matrix in python soup:

soup wrap:

Example using lil_matrix (list-of-lists sparse matrix) from scipy.

Row-based linked list matrix.

This contains a list (self.rows) of rows, each of which is a sorted list of column indices of non-zero elements. It also contains a list (self.data) of lists of these elements.

$ cat 1938894-simplified.csv
0,32
1,21
1,23
1,32
2,23
2,53
2,82
3,82
4,46
5,75
7,86
8,28

Code:

#!/usr/bin/env python

import csv
from scipy import sparse

rows, columns = 10, 100
matrix = sparse.lil_matrix( (rows, columns) )

csvreader = csv.reader(open('1938894-simplified.csv'))
for line in csvreader:
    row, column = map(int, line)
    matrix.data[row].append(column)

print matrix.data

Output:

[[32] [21, 23, 32] [23, 53, 82] [82] [46] [75] [] [86] [28] []]
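Note that appending to matrix.data[row] directly bypasses lil_matrix's parallel matrix.rows bookkeeping, so the result above is only good for inspection. The grouping itself is simple; for readers without scipy at hand, here is the same idea in pure Python (Python 3 syntax, hypothetical names):

```python
import csv
import io

csv_text = "0,32\n1,21\n1,23\n1,32\n2,23\n"

# Group column indices by row, mirroring what matrix.data ends up holding.
rows = {}
for row, column in csv.reader(io.StringIO(csv_text)):
    rows.setdefault(int(row), []).append(int(column))

assert rows == {0: [32], 1: [21, 23, 32], 2: [23]}
```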
qid & accept id: (1960516, 1960649) query: Python JSON serialize a Decimal object soup:

soup wrap:

How about subclassing json.JSONEncoder?

class DecimalEncoder(json.JSONEncoder):
    def _iterencode(self, o, markers=None):
        if isinstance(o, decimal.Decimal):
            # wanted a simple yield str(o) in the next line,
            # but that would mean a yield on the line with super(...),
            # which wouldn't work (see my comment below), so...
            return (str(o) for o in [o])
        return super(DecimalEncoder, self)._iterencode(o, markers)

Then use it like so:

json.dumps({'x': decimal.Decimal('5.5')}, cls=DecimalEncoder)
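Be aware that _iterencode is a private method (its signature changed in later Python releases, which breaks this subclass); the documented extension point is default(), at the cost of serializing the Decimal as a JSON string rather than a bare number. A sketch:

```python
import decimal
import json

class DecimalStringEncoder(json.JSONEncoder):
    def default(self, o):
        # default() is called only for objects json can't serialize itself;
        # returning str(o) emits the Decimal as a JSON string.
        if isinstance(o, decimal.Decimal):
            return str(o)
        return json.JSONEncoder.default(self, o)

assert json.dumps({'x': decimal.Decimal('5.5')}, cls=DecimalStringEncoder) == '{"x": "5.5"}'
```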
qid & accept id: (2005234, 2054374) query: Asynchronous data through Bloomberg's new data API (COM v3) with Python? soup:

soup wrap:

I finally figured it out. I did a fair bit of combrowse.py detective work, and I compared with the Java, C, C++, and .NET examples in the BBG API download. Interestingly enough, the Bloomberg Helpdesk people knew pretty much null when it came to these things, or perhaps I was just talking to the wrong person.

Here is my code.

asynchronousHandler.py:

import win32com.client
from pythoncom import PumpWaitingMessages
from time import time, strftime
import constants

class EventHandler:
    def OnProcessEvent(self, result):
        event = win32com.client.gencache.EnsureDispatch(result) 
        if event.EventType == constants.SUBSCRIPTION_DATA:
            self.getData(event)
        elif event.EventType == constants.SUBSCRIPTION_STATUS:
            self.getStatus(event)
        else:
            self.getMisc(event)
    def getData(self, event):
        iterator = event.CreateMessageIterator()
        while iterator.Next():
            message = iterator.Message  
            dataString = ''
            for fieldIndex, field in enumerate(constants.fields):           
                if message.AsElement.HasElement(field):
                    element = message.GetElement(field)
                    if element.IsNull:
                        theValue = ''
                    else:
                        theValue = ', Value: ' + str(element.Value) 
                    dataString = dataString + ', (Type: ' + element.Name + theValue + ')'
            print strftime('%m/%d/%y %H:%M:%S') + ', MessageType: ' + message.MessageTypeAsString + ', CorrelationId: ' + str(message.CorrelationId) + dataString
    def getMisc(self, event):
        iterator = event.CreateMessageIterator()
        while iterator.Next():
            message = iterator.Message
            print strftime('%m/%d/%y %H:%M:%S') + ', MessageType: ' + message.MessageTypeAsString
    def getStatus(self, event):
        iterator = event.CreateMessageIterator()
        while iterator.Next():
            message = iterator.Message
            if message.AsElement.HasElement('reason'):
                element = message.AsElement.GetElement('reason')
                print strftime('%m/%d/%y %H:%M:%S') + ', MessageType: ' + message.MessageTypeAsString + ', CorrelationId: ' + str(message.CorrelationId) + ', Category: ' + element.GetElement('category').Value + ', Description: ' + element.GetElement('description').Value 
            if message.AsElement.HasElement('exceptions'):
                element = message.AsElement.GetElement('exceptions')
                exceptionString = ''
                for n in range(element.NumValues):
                    exceptionInfo = element.GetValue(n)
                    fieldId = exceptionInfo.GetElement('fieldId')
                    reason = exceptionInfo.GetElement('reason')
                    exceptionString = exceptionString + ', (Field: ' + fieldId.Value + ', Category: ' + reason.GetElement('category').Value + ', Description: ' + reason.GetElement('description').Value + ') ' 
                print strftime('%m/%d/%y %H:%M:%S') + ', MessageType: ' + message.MessageTypeAsString + ', CorrelationId: ' + str(message.CorrelationId) + exceptionString

class bloombergSource:
    def __init__(self):
        session = win32com.client.DispatchWithEvents('blpapicom.Session' , EventHandler)
        session.Start()
        started = session.OpenService('//blp/mktdata')
        subscriptions = session.CreateSubscriptionList()
        for tickerIndex, ticker in enumerate(constants.tickers):
            if len(constants.interval) > 0:
                subscriptions.AddEx(ticker, constants.fields, constants.interval, session.CreateCorrelationId(tickerIndex))
            else:
                subscriptions.Add(ticker, constants.fields, session.CreateCorrelationId(tickerIndex))   
        session.Subscribe(subscriptions)
        endTime = time() + 2
        while True:
            PumpWaitingMessages()
            if endTime < time():                
                break               

if __name__ == "__main__":
    aBloombergSource = bloombergSource()

constants.py:

ADMIN = 1
AUTHORIZATION_STATUS = 11
BLPSERVICE_STATUS = 9
PARTIAL_RESPONSE = 6
PUBLISHING_DATA = 13
REQUEST_STATUS = 4
RESOLUTION_STATUS = 12
RESPONSE = 5
SESSION_STATUS = 2
SUBSCRIPTION_DATA = 8
SUBSCRIPTION_STATUS = 3
TIMEOUT = 10
TOKEN_STATUS = 15
TOPIC_STATUS = 14
UNKNOWN = -1
fields = ['BID']
tickers = ['AUD Curncy']
interval = '' #'interval=5.0'

For historical data I used this simple script:

import win32com.client

session = win32com.client.Dispatch('blpapicom.Session')
session.QueueEvents = True
session.Start()
started = session.OpenService('//blp/refdata')
dataService = session.GetService('//blp/refdata')
request = dataService.CreateRequest('HistoricalDataRequest')
request.GetElement('securities').AppendValue('5 HK Equity')
request.GetElement('fields').AppendValue('PX_LAST')
request.Set('periodicitySelection', 'DAILY')
request.Set('startDate', '20090119')
request.Set('endDate', '20090130')
cid = session.SendRequest(request)
ADMIN = 1
AUTHORIZATION_STATUS = 11
BLPSERVICE_STATUS = 9
PARTIAL_RESPONSE = 6
PUBLISHING_DATA = 13
REQUEST_STATUS = 4
RESOLUTION_STATUS = 12
RESPONSE = 5
SESSION_STATUS = 2
SUBSCRIPTION_DATA = 8
SUBSCRIPTION_STATUS = 3
TIMEOUT = 10
TOKEN_STATUS = 15
TOPIC_STATUS = 14
UNKNOWN = -1
stayHere = True
while stayHere:
    event = session.NextEvent();
    if event.EventType == PARTIAL_RESPONSE or event.EventType == RESPONSE:
        iterator = event.CreateMessageIterator()
        iterator.Next()
        message = iterator.Message
        securityData = message.GetElement('securityData')
        securityName = securityData.GetElement('security')
        fieldData = securityData.GetElement('fieldData')
        returnList = [[0 for col in range(fieldData.GetValue(row).NumValues+1)] for row in range(fieldData.NumValues)]
        for row in range(fieldData.NumValues):
            rowField = fieldData.GetValue(row)
            for col in range(rowField.NumValues+1):
                colField = rowField.GetElement(col)
                returnList[row][col] = colField.Value
        stayHere = False
        break
element = None
iterator = None
message = None
event = None
session = None
print returnList
qid & accept id: (2005759, 2006439) query: Visual module in python assign objects soup:

soup wrap:

First things first: your code uses list concatenation to add items to a list, but it is better to use the list's .append() method. Also, the last loop could iterate directly over the objects instead of using an index; that is more elegant and easier to understand.

The pseudo-code below is equivalent to yours, but with the above corrections applied:

from visual import *
stars = []
galaxies = []
for i in range(10):
    stars.append(sphere(...))
for j in range(20):
    galaxies.append(sphere(...))
for star, galaxy, starpos, galaxypos in zip(stars, galaxies,
                                            position, G_position):
    star.pos = starpos
    galaxy.pos = galaxypos

With that out of the way, I can explain how visual works.

The visual module updates the screen as soon as an object is changed. The animation is produced by those changes, in real time; there is no need for a show() or start_animation() call, it happens as you go. An example you can run at the Python interactive prompt:

>>> from visual import sphere
>>> s = sphere()

That line creates a sphere and a window, and immediately shows the sphere in the window.

>>> s.x = -100

That line changes the sphere's position on the x axis to -100. The change happens immediately on the screen: just after this line runs, you see the sphere move to the left of the window.

So the animation happens by changing the values of the objects.

qid & accept id: (2012611, 2012631) query: any() function in Python with a callback soup:

soup wrap:

How about:

>>> any(isinstance(e, int) and e > 0 for e in [1,2,'joe'])
True

It also works with all() of course:

>>> all(isinstance(e, int) and e > 0 for e in [1,2,'joe'])
False
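
If you prefer the callback to be a reusable, named predicate rather than an inline expression, you can factor it out into a small helper (the names here are illustrative, not a built-in API):

```python
def is_positive_int(e):
    # the "callback": True only for positive integers
    return isinstance(e, int) and e > 0

def any_matching(predicate, iterable):
    # any() has no callback parameter, so apply the predicate
    # lazily through a generator expression
    return any(predicate(e) for e in iterable)
```

any_matching(is_positive_int, [1, 2, 'joe']) is then equivalent to the generator-expression form above.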
qid & accept id: (2034584, 2034989) query: Datastore Design Inquiry soup:

soup wrap:

This isn't as complicated as you might think. Here's a Category class:

class Category(db.Model):
    title = db.StringProperty()
    subcategories = db.ListProperty(db.Key)
    quizzes = db.ListProperty(db.Key)

    def add_sub_category(self, title):
        # db.Model constructors take keyword arguments
        new_category = Category(title=title)
        new_category.put()
        # a ListProperty(db.Key) holds keys, not entity instances
        self.subcategories.append(new_category.key())
        self.put()

        return new_category

By keeping both the subcategories and quizzes associated with this Category in a ListProperty, getting a count of them is as simple as calling len() on the list.

You could use it something like this:

main_category = Category(title="Main")
main_category.put()

sports_category = main_category.add_sub_category("Sports")
baseball_category = sports_category.add_sub_category("Baseball")
football_category = sports_category.add_sub_category("Football")
hockey_category = sports_category.add_sub_category("Hockey")

tv_category = main_category.add_sub_category("TV")

...etc...

qid & accept id: (2082387, 4653306) query: Reading input from raw_input() without having the prompt overwritten by other threads in Python soup:

soup wrap:

I recently encountered this problem, and would like to leave this solution here for future reference. These solutions clear the pending raw_input (readline) text from the terminal, print the new text, then reprint to the terminal what was in the raw_input buffer.

This first program is pretty simple, but only works correctly when there is only 1 line of text waiting for raw_input:

#!/usr/bin/python

import time,readline,thread,sys

def noisy_thread():
    while True:
        time.sleep(3)
        sys.stdout.write('\r'+' '*(len(readline.get_line_buffer())+2)+'\r')
        print 'Interrupting text!'
        sys.stdout.write('> ' + readline.get_line_buffer())
        sys.stdout.flush()

thread.start_new_thread(noisy_thread, ())
while True:
    s = raw_input('> ')

Output:

$ ./threads_input.py
Interrupting text!
Interrupting text!
Interrupting text!
> WELL, PRINCE, Genoa and Lucca are now no more than private estates of the Bo
Interrupting text!
> WELL, PRINCE, Genoa and Lucca are now no more than private estates of the Bo
naparte family. No, I warn you, that if you do not tell me we are at war,

The second correctly handles 2 or more buffered lines, but has more (standard) module dependencies and requires a wee bit of terminal hackery:

#!/usr/bin/python

import time,readline,thread
import sys,struct,fcntl,termios

def blank_current_readline():
    # Next line said to be reasonably portable for various Unixes
    (rows,cols) = struct.unpack('hh', fcntl.ioctl(sys.stdout, termios.TIOCGWINSZ,'1234'))

    text_len = len(readline.get_line_buffer())+2

    # ANSI escape sequences (All VT100 except ESC[0G)
    sys.stdout.write('\x1b[2K')                         # Clear current line
    sys.stdout.write('\x1b[1A\x1b[2K'*(text_len/cols))  # Move cursor up and clear line
    sys.stdout.write('\x1b[0G')                         # Move to start of line


def noisy_thread():
    while True:
        time.sleep(3)
        blank_current_readline()
        print 'Interrupting text!'
        sys.stdout.write('> ' + readline.get_line_buffer())
        sys.stdout.flush()          # Needed or text doesn't show until a key is pressed


if __name__ == '__main__':
    thread.start_new_thread(noisy_thread, ())
    while True:
        s = raw_input('> ')

Output. Previous readline lines cleared properly:

$ ./threads_input2.py
Interrupting text!
Interrupting text!
Interrupting text!
Interrupting text!
> WELL, PRINCE, Genoa and Lucca are now no more than private estates of the Bo
naparte family. No, I warn you, that if you do not tell me we are at war,
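
As a side note, the escape-sequence arithmetic inside blank_current_readline can be isolated into a pure function, which makes it easy to test without a terminal (the function name is mine; the prompt is assumed to be 2 characters wide, as with '> '):

```python
def clear_sequence(buffer_len, cols, prompt_len=2):
    # Clear the current line, move up and clear once per wrapped
    # row of prompt + buffer, then move the cursor to column 0.
    text_len = buffer_len + prompt_len
    return ('\x1b[2K'
            + '\x1b[1A\x1b[2K' * (text_len // cols)
            + '\x1b[0G')
```

The thread would then write clear_sequence(len(readline.get_line_buffer()), cols) to stdout before printing.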

Useful sources:

How to get Linux console window width in Python

apt like column output - python library (This code sample shows how to get terminal width for either Unix or Windows)

http://en.wikipedia.org/wiki/ANSI_escape_code

qid & accept id: (2126551, 2127648) query: An equivalent to string.ascii_letters for unicode strings in python 2.x? soup:

soup wrap:

You can construct your own constant of Unicode upper and lower case letters with:

import unicodedata as ud
all_unicode = ''.join(unichr(i) for i in xrange(65536))
unicode_letters = ''.join(c for c in all_unicode
                          if ud.category(c)=='Lu' or ud.category(c)=='Ll')

This makes a string 2153 characters long (narrow Unicode Python build). For code like letter in unicode_letters it would be faster to use a set instead:

unicode_letters = set(unicode_letters)
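
For reference, the same construction on Python 3, where chr and range replace unichr and xrange (and str is already Unicode):

```python
import unicodedata as ud

# Upper- and lower-case letters in the Basic Multilingual Plane,
# mirroring the 2.x snippet above; use a set for fast membership tests.
unicode_letters = ''.join(
    chr(i) for i in range(65536)
    if ud.category(chr(i)) in ('Lu', 'Ll')
)
letter_set = set(unicode_letters)
```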
qid & accept id: (2148119, 5807028) query: How to convert an xml string to a dictionary in Python? soup:

soup wrap:

This is a great module that someone created. I've used it several times. http://code.activestate.com/recipes/410469-xml-as-dictionary/

Here is the code from the website just in case the link goes bad.

import cElementTree as ElementTree

class XmlListConfig(list):
    def __init__(self, aList):
        for element in aList:
            if element:
                # treat like dict
                if len(element) == 1 or element[0].tag != element[1].tag:
                    self.append(XmlDictConfig(element))
                # treat like list
                elif element[0].tag == element[1].tag:
                    self.append(XmlListConfig(element))
            elif element.text:
                text = element.text.strip()
                if text:
                    self.append(text)


class XmlDictConfig(dict):
    '''
    Example usage:

    >>> tree = ElementTree.parse('your_file.xml')
    >>> root = tree.getroot()
    >>> xmldict = XmlDictConfig(root)

    Or, if you want to use an XML string:

    >>> root = ElementTree.XML(xml_string)
    >>> xmldict = XmlDictConfig(root)

    And then use xmldict for what it is... a dict.
    '''
    def __init__(self, parent_element):
        if parent_element.items():
            self.update(dict(parent_element.items()))
        for element in parent_element:
            if element:
                # treat like dict - we assume that if the first two tags
                # in a series are different, then they are all different.
                if len(element) == 1 or element[0].tag != element[1].tag:
                    aDict = XmlDictConfig(element)
                # treat like list - we assume that if the first two tags
                # in a series are the same, then the rest are the same.
                else:
                    # here, we put the list in dictionary; the key is the
                    # tag name the list elements all share in common, and
                    # the value is the list itself 
                    aDict = {element[0].tag: XmlListConfig(element)}
                # if the tag has attributes, add those to the dict
                if element.items():
                    aDict.update(dict(element.items()))
                self.update({element.tag: aDict})
            # this assumes that if you've got an attribute in a tag,
            # you won't be having any text. This may or may not be a 
            # good idea -- time will tell. It works for the way we are
            # currently doing XML configuration files...
            elif element.items():
                self.update({element.tag: dict(element.items())})
            # finally, if there are no child tags and no attributes, extract
            # the text
            else:
                self.update({element.tag: element.text})

Example usage:

tree = ElementTree.parse('your_file.xml')
root = tree.getroot()
xmldict = XmlDictConfig(root)

Or, if you want to use an XML string:

root = ElementTree.XML(xml_string)
xmldict = XmlDictConfig(root)
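
If all you need is a quick one-off conversion with the modern standard library, a much smaller recursive helper covers the common cases (etree_to_dict is my name for it; it collapses leaf elements to their text and turns repeated tags into lists):

```python
from xml.etree import ElementTree

def etree_to_dict(elem):
    # Attributes and child elements are merged into one dict;
    # an element with no children and no attributes collapses to its text.
    d = dict(elem.attrib)
    for child in elem:
        value = etree_to_dict(child)
        if child.tag in d:
            # repeated tag: promote the existing entry to a list
            if not isinstance(d[child.tag], list):
                d[child.tag] = [d[child.tag]]
            d[child.tag].append(value)
        else:
            d[child.tag] = value
    return d if d else elem.text

root = ElementTree.XML('<config x="1"><name>demo</name><item>a</item><item>b</item></config>')
```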
qid & accept id: (2170228, 2174843) query: Iterate over model instance field names and values in template soup:

soup wrap:

I've come up with the following method, which works for me because in every case the model will have a ModelForm associated with it.

def GetModelData(form, fields):
    """
    Extract data from the bound form model instance and return a
    dictionary that is easily usable in templates with the actual
    field verbose name as the label, e.g.

    model_data{"Address line 1": "32 Memory lane",
               "Address line 2": "Brainville",
               "Phone": "0212378492"}

    This way, the template has an ordered list that can be easily
    presented in tabular form.
    """
    model_data = {}
    for field in fields:
        model_data[form[field].label] = form.data[form[field].name]
    return model_data

@login_required
def clients_view(request, client_id):
    client = Client.objects.get(id=client_id)
    form = AddClientForm(client)

    fields = ("address1", "address2", "address3", "address4",
              "phone", "fax", "mobile", "email")
    model_data = GetModelData(form, fields)

    template_vars = RequestContext(request,
        {
            "client": client,
            "model_data": model_data
        }
    )
    return render_to_response("clients-view.html", template_vars)

Here is an extract from the template I am using for this particular view:


<table>
    {% for field, value in model_data.items %}
        <tr>
            <td>{{ field }}</td>
            <td>{{ value }}</td>
        </tr>
    {% endfor %}
</table>

The nice thing about this method is that I can choose on a template-by-template basis the order in which I would like to display the field labels, using the tuple passed in to GetModelData and specifying the field names. This also allows me to exclude certain fields (e.g. a User foreign key) as only the field names passed in via the tuple are built into the final dictionary.

I'm not going to accept this as the answer because I'm sure someone can come up with something more "Djangonic" :-)

Update: I'm choosing this as the final answer because it is the simplest out of those given that does what I need. Thanks to everyone who contributed answers.

qid & accept id: (2192658, 2192975) query: Is there a better way to convert from decimal to binary in python? soup:

soup wrap:

In Python 2.6 or newer, use format syntax:

'{0:0=#10b}'.format(my_num)[2:]
# '00001010'
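
The # alternate form plus the [2:] slice can also be avoided by padding to the final width directly; this produces the same string:

```python
# zero-pad to 8 binary digits, with no '0b' prefix to strip
bits = '{0:08b}'.format(10)
```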

One of the neat things about Python strings is that they are sequences. If all you need to do is iterate through the characters, then there is no need to convert the string to a list.

Edit: For steganography, you might be interested in converting a stream of characters into a stream of bits. Here is how you could do that with generators:

def str2bits(astr):
    for char in astr:    
        n=ord(char)
        for bit in '{0:0=#10b}'.format(n)[2:]:
            yield int(bit)

And to convert a stream of bits back into a stream of characters:

def grouper(n, iterable, fillvalue=None):
    # Source: http://docs.python.org/library/itertools.html#recipes
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    return itertools.izip_longest(*[iter(iterable)]*n,fillvalue=fillvalue)

def bits2str(bits):
    for b in grouper(8,bits):
        yield chr(int(''.join(map(str,b)),2))

For example, you could use the above functions like this:

for b in str2bits('Hi Zvarberg'):
    print b,
# 0 1 0 0 1 0 0 0 0 1 1 0 1 0 0 1 0 0 1 0 0 0 0 0 0 1 0 1 1 0 1 0 0 1 1 1 0 1 1 0 0 1 1 0 0 0 0 1 0 1 1 1 0 0 1 0 0 1 1 0 0 0 1 0 0 1 1 0 0 1 0 1 0 1 1 1 0 0 1 0 0 1 1 0 0 1 1 1

# To show bits2str is the inverse of str2bits:
print ''.join([c for c in bits2str(str2bits('Hi Zvarberg'))])
# Hi Zvarberg

Also, SO guru Ned Batchelder does some steganography-related experiments using Python and PIL here. You may be able to find some useful code there.

If you find you need more speed (and still want to code this in Python), you may want to look into using numpy.

qid & accept id: (2203351, 2210729) query: Flex: detecting user idle? soup:

soup wrap:

See also the idle event in SystemManager. This approach works for AIR or Flash Player.

application.systemManager.addEventListener(FlexEvent.IDLE, onIdle);

You can get the idle time (in an unsupported way) using

SystemManager.mx_internal::idleCounter
qid & accept id: (2237624, 2238076) query: Applying python decorators to methods in a class soup:

soup wrap:

In Python 2.6, a class decorator is definitely the way to go. E.g., here's a pretty general one for this kind of task:

import inspect

def decallmethods(decorator, prefix='test_'):
  def dectheclass(cls):
    for name, m in inspect.getmembers(cls, inspect.ismethod):
      if name.startswith(prefix):
        setattr(cls, name, decorator(m))
    return cls
  return dectheclass

and now, just

@decallmethods(login_testuser)
class TestCase(object):
    def setUp(self):
        pass

    def test_1(self):
        print "test_1()"

    def test_2(self):
        print "test_2()"

will get you what you desire. In Python 2.5 or earlier, the @decallmethods syntax doesn't work for class decoration, but with otherwise exactly the same code you can replace it with the following statement right after the end of the class TestCase statement:

TestCase = decallmethods(login_testuser)(TestCase)
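
Note for Python 3: functions looked up on a class are plain functions there, not "unbound methods", so inspect.ismethod finds nothing to wrap. A sketch of a Python 3 port, where the tag decorator is only an illustrative stand-in for login_testuser:

```python
import inspect

def decallmethods(decorator, prefix='test_'):
    def dectheclass(cls):
        # Python 3: filter with inspect.isfunction, not inspect.ismethod
        for name, m in inspect.getmembers(cls, inspect.isfunction):
            if name.startswith(prefix):
                setattr(cls, name, decorator(m))
        return cls
    return dectheclass

def tag(func):
    # hypothetical stand-in for login_testuser
    def wrapper(*args, **kwargs):
        return ('decorated', func(*args, **kwargs))
    return wrapper

@decallmethods(tag)
class TestCase(object):
    def setUp(self):
        return 'setup'

    def test_1(self):
        return 'test_1()'
```

Only the prefixed methods are wrapped; setUp is left alone.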
qid & accept id: (2255177, 2259769) query: Finding the exponent of n = 2**x using bitwise operations [logarithm in base 2 of n] soup:

soup wrap:

Short answer

As far as Python is concerned:

Preliminary notes

  1. All speed measurements below have been obtained via timeit.Timer.repeat(testn, cycles) where testn was set to 3 and cycles was automatically adjusted by the script to obtain times in the range of seconds (note: there was a bug in this auto-adjusting mechanism that has been fixed on 18/02/2010).
  2. Not all methods scale; this is why I did not test all functions for the various powers of 2
  3. I did not manage to get some of the proposed methods to work (the function returns a wrong result). I did not yet have time to do a step-by-step debugging session: I included the code (commented out) just in case somebody spots the mistake by inspection (or wants to perform the debugging themselves)

Results

func(2**5)

hashlookup:          0.13s     100%
lookup:              0.15s     109%
stringcount:         0.29s     220%
unrolled_bitwise:    0.36s     272%
log_e:               0.60s     450%
bitcounter:          0.64s     479%
log_2:               0.69s     515%
ilog:                0.81s     609%
bitwise:             1.10s     821%
olgn:                1.42s    1065%

func(2**31)

hashlookup:          0.11s     100%
unrolled_bitwise:    0.26s     229%
log_e:               0.30s     268%
stringcount:         0.30s     270%
log_2:               0.34s     301%
ilog:                0.41s     363%
bitwise:             0.87s     778%
olgn:                1.02s     912%
bitcounter:          1.42s    1264%

func(2**128)

hashlookup:     0.01s     100%
stringcount:    0.03s     264%
log_e:          0.04s     315%
log_2:          0.04s     383%
olgn:           0.18s    1585%
bitcounter:     1.41s   12393%

func(2**1024)

log_e:          0.00s     100%
log_2:          0.01s     118%
stringcount:    0.02s     354%
olgn:           0.03s     707%
bitcounter:     1.73s   37695%

Code

import math, sys

def stringcount(v):
    """mac"""    
    return len(bin(v)) - 3

def log_2(v):
    """mac"""    
    return int(round(math.log(v, 2), 0)) # 2**101 generates 100.999999999

def log_e(v):
    """bp on mac"""    
    return int(round(math.log(v)/0.69314718055994529, 0))  # 0.69 == log(2)

def bitcounter(v):
    """John Y on mac"""
    r = 0
    while v > 1 :
        v >>= 1
        r += 1
    return r

def olgn(n) :
    """outis"""
    if n < 1:
        return -1
    low = 0
    high = sys.getsizeof(n)*8 # not the best upper-bound guesstimate, but...
    while True:
        mid = (low+high)//2
        i = n >> mid
        if i == 1:
            return mid
        if i == 0:
            high = mid-1
        else:
            low = mid+1

def hashlookup(v):
    """mac on brone -- limit: v < 2**131"""
#    def prepareTable(max_log2=130) :
#        hash_table = {}
#        for p in range(1, max_log2) :
#            hash_table[2**p] = p
#        return hash_table

    global hash_table
    return hash_table[v] 

def lookup(v):
    """brone -- limit: v < 2**11"""
#    def prepareTable(max_log2=10) :
#        log2s_table = [0]*((1 << max_log2)+1)
#        for p in range(max_log2+1) :
#            log2s_table[1 << p] = p
#        return tuple(log2s_table)

    global log2s_table
    return log2s_table[v]

def bitwise(v):
    """Mark Byers -- limit: v < 2**33"""
    b = (0x2, 0xC, 0xF0, 0xFF00, 0xFFFF0000)
    S = (1, 2, 4, 8, 16)
    r = 0
    for i in range(4, -1, -1):
        if v & b[i]:
            v >>= S[i];
            r |= S[i];
    return r

def unrolled_bitwise(v):
    """x4u on Mark Byers -- limit:   v < 2**33"""
    r = 0;
    if v > 0xffff : 
        v >>= 16
        r = 16;
    if v > 0x00ff :
        v >>=  8
        r += 8;
    if v > 0x000f :
        v >>=  4
        r += 4;
    if v > 0x0003 : 
        v >>=  2
        r += 2;
    return r + (v >> 1)

def ilog(v):
    """Gregory Maxwell - (Original code: B. Terriberry) -- limit: v < 2**32"""
    ret = 1
    m = (not not v & 0xFFFF0000) << 4;
    v >>= m;
    ret |= m;
    m = (not not v & 0xFF00) << 3;
    v >>= m;
    ret |= m;
    m = (not not v & 0xF0) << 2;
    v >>= m;
    ret |= m;
    m = (not not v & 0xC) << 1;
    v >>= m;
    ret |= m;
    ret += (not not v & 0x2);
    return ret - 1;


# following table is equal to "return hashlookup.prepareTable()" 
hash_table = {...} # numbers have been cut out to avoid cluttering the post

# following table is equal to "return lookup.prepareTable()" - cached for speed
log2s_table = (...) # numbers have been cut out to avoid cluttering the post
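
As a side note for later Python versions (2.7+ and 3.x, not part of the original benchmark): for an exact power of two, int.bit_length() gives the exponent directly and handles arbitrarily large ints:

```python
def exponent(v):
    # bit_length() of 2**x is x + 1, so subtracting 1 recovers x;
    # works for any size of int, no lookup table needed
    return v.bit_length() - 1
```

For example, exponent(2**5) returns 5 and exponent(2**1024) returns 1024.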
qid & accept id: (2257101, 2257148) query: Sorting dictionary keys by values in a list? soup:

soup wrap:

You shouldn't call your variables dict and list, because then you can't use the built-in types of those names any more. I have renamed them in this example.

>>> L = [1, 2, 37, 32, 4]
>>> d = {
...     32: 'Megumi', 
...     1: 'Ai',
...     2: 'Risa',
...     3: 'Eri', 
...     4: 'Sayumi', 
...     37: 'Mai'
... }

You can't sort the default dict type in Python, because it's a hash table: its iteration order is determined by the hashes of the keys. Anyway, you might find some alternative implementations if you search for OrderedDict or something like that on Google.

But you can create a new list containing the (key, value)-tuples from the dictionary, which is sorted by the first list:

>>> s = list((i, d.get(i)) for i in L)
>>> print s
[(1, 'Ai'), (2, 'Risa'), (37, 'Mai'), (32, 'Megumi'), (4, 'Sayumi')]

Or if you are only interested in the values:

>>> s = list(d.get(i) for i in L)
>>> print s
['Ai', 'Risa', 'Mai', 'Megumi', 'Sayumi']
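
Since Python 2.7 the OrderedDict mentioned above ships in the collections module; inserting the pairs in the list's order gives a mapping that iterates in that order. A small sketch with the same data:

```python
import collections

L = [1, 2, 37, 32, 4]
d = {32: 'Megumi', 1: 'Ai', 2: 'Risa', 3: 'Eri', 4: 'Sayumi', 37: 'Mai'}

# OrderedDict remembers insertion order, so inserting keys in L's
# order yields a dict-like object "sorted" by the list
od = collections.OrderedDict((i, d[i]) for i in L)
```

Iterating od then yields the keys 1, 2, 37, 32, 4 and the corresponding names in that order.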

Hope that helps!

qid & accept id: (2286557, 2286608) query: Positional Comparisons in Python soup:

soup wrap:

This solution prints the results in the same order as the final placings.
If the place has not changed, (+0) is printed.
If you wish to filter those out instead, simply put an if diff: before the print.

>>> after_short_program = [
...     'Evgeni Plushenko',
...     'Evan Lysacek',
...     'Daisuke Takahashi',
...     'Nobunari Oda',
...     'Stephane Lambiel',
... ]
>>> 
>>> after_free_skate = [
...     'Evan Lysacek',
...     'Daisuke Takahashi',
...     'Evgeni Plushenko',
...     'Stephane Lambiel',
...     'Nobunari Oda',
... ]
>>> 
>>> for i,item in enumerate(after_free_skate):
...     diff = after_short_program.index(item)-i
...     print "%s (%+d)"%(item,diff)
...     
... 
Evan Lysacek (+1)
Daisuke Takahashi (+1)
Evgeni Plushenko (-2)
Stephane Lambiel (+1)
Nobunari Oda (-1)

As pwdyson points out, if your stopwatches aren't good enough, you might get a tie. So this modification uses dicts instead of lists. The order of the placings is still preserved.

>>> from operator import itemgetter
>>> 
>>> after_short_program = {
...     'Evgeni Plushenko':1,
...     'Evan Lysacek':2,
...     'Daisuke Takahashi':3,
...     'Stephane Lambiel':4,
...     'Nobunari Oda':5,
... }
>>> 
>>> after_free_skate = {
...     'Evan Lysacek':1,
...     'Daisuke Takahashi':2,
...     'Evgeni Plushenko':3,
...     'Stephane Lambiel':4,   # These are tied
...     'Nobunari Oda':4,       # at 4th place
... }
>>> 
>>> for k,v in sorted(after_free_skate.items(),key=itemgetter(1)):
...     diff = after_short_program[k]-v
...     print "%s (%+d)"%(k,diff)
...     
... 
Evan Lysacek (+1)
Daisuke Takahashi (+1)
Evgeni Plushenko (-2)
Nobunari Oda (+1)
Stephane Lambiel (+0)
>>> 

If there is a possibility of keys in the second dict that are not in the first you can do something like this

for k,v in sorted(after_free_skate.items(),key=itemgetter(1)):
    try:
        diff = after_short_program[k]-v
        print "%s (%+d)"%(k,diff)
    except KeyError:
        print "%s (new)"%k
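
A note on cost: list.index rescans the list on every iteration. For longer standings you can precompute the previous ranks in a dict first; a sketch with the same data:

```python
after_short_program = ['Evgeni Plushenko', 'Evan Lysacek',
                       'Daisuke Takahashi', 'Nobunari Oda', 'Stephane Lambiel']
after_free_skate = ['Evan Lysacek', 'Daisuke Takahashi',
                    'Evgeni Plushenko', 'Stephane Lambiel', 'Nobunari Oda']

# build a name -> previous position map once, then each lookup is O(1)
prev = dict((name, i) for i, name in enumerate(after_short_program))
diffs = [(name, prev[name] - i) for i, name in enumerate(after_free_skate)]
```

diffs holds the same (name, movement) pairs the loop above prints.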
qid & accept id: (2305115, 2305144) query: Remove and insert lines in a text file soup:

soup wrap:

For Python 2.6:

with open("file1") as infile:
    with open("file2","w") as outfile:
        for i,line in enumerate(infile):
            if i==2:
                # 3rd line
                outfile.write("new line1\n")
                outfile.write("new line2\n")
                outfile.write("new line3\n")
            elif i==3:
                # 4th line
                pass
            else:
                outfile.write(line)

For Python 3.1 (Python 2.7 also supports multiple context managers in one with statement):

with open("file1") as infile, open("file2","w") as outfile:
    for i,line in enumerate(infile):
        if i==2:
            # 3rd line
            outfile.write("new line1\n")
            outfile.write("new line2\n")
            outfile.write("new line3\n")
        elif i==3:
            # 4th line
            pass
        else:
            outfile.write(line)
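
The result lands in file2; to make the edit look in-place you can move it over the original afterwards. A condensed sketch (the sample data and single inserted line are illustrative):

```python
import shutil

# set up a small sample input file (stand-in for the real file1)
with open("file1", "w") as f:
    f.write("a\nb\nc\nd\ne\n")

with open("file1") as infile, open("file2", "w") as outfile:
    for i, line in enumerate(infile):
        if i == 2:            # 3rd line: replace with new content
            outfile.write("new line1\n")
        elif i == 3:          # 4th line: drop it
            pass
        else:
            outfile.write(line)

# replace the original so the edit appears in-place
shutil.move("file2", "file1")
```

shutil.move also copes with the case where the two files live on different filesystems.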
qid & accept id: (2305501, 2305592) query: Sampling keys due to their values soup:

soup wrap:

1. Construct a CDF-like list like this:

def build_cdf(distrib):
    cdf = []
    val = 0
    for key, freq in distrib.items():
        val += freq
        cdf.append((val, key))
    return (val, cdf)

This function returns a tuple: the first element is the total weight (the sum of the frequencies), and the second is the CDF.

2. Construct the sampler like this:

import random
def sample_from_cdf(val_and_cdf):
    (val, cdf) = val_and_cdf;
    rand = random.uniform(0, val)
    # use bisect.bisect_left to reduce search time from O(n) to O(log n).
    return [key for index, key in cdf if index > rand][0]

Usage:

x = build_cdf({"a":0.2, "b":0.3, "c":0.5});
y = [sample_from_cdf(x) for i in range(0,100000)];
print (len([t for t in y if t == "a"]))   # 19864
print (len([t for t in y if t == "b"]))   # 29760
print (len([t for t in y if t == "c"]))   # 50376

You may want to make this into a class.
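
The comment in sample_from_cdf suggests bisect; here is a self-contained sketch of that variant, with the same build_cdf and a binary search over the cutoffs instead of the linear scan:

```python
import bisect
import random

def build_cdf(distrib):
    cdf, val = [], 0
    for key, freq in distrib.items():
        val += freq
        cdf.append((val, key))
    return (val, cdf)

def sample_from_cdf(val_and_cdf):
    val, cdf = val_and_cdf
    cutoffs = [cut for cut, _ in cdf]
    rand = random.uniform(0, val)
    # bisect_right finds the first cutoff strictly greater than rand,
    # in O(log n) instead of O(n); min() guards the rand == val edge case
    i = bisect.bisect_right(cutoffs, rand)
    return cdf[min(i, len(cdf) - 1)][1]
```

For large alphabets you would also hoist the cutoffs list out of the sampler (or keep it in a class, as suggested above) so it is built only once.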

qid & accept id: (2337285, 2340579) query: Set a DTD using minidom in python soup:

soup wrap:

The documentation is out of date. Use the source, Luke. I do it something like this.

from xml.dom.minidom import DOMImplementation

imp = DOMImplementation()
doctype = imp.createDocumentType(
    qualifiedName='foo',
    publicId='', 
    systemId='http://www.path.to.my.dtd.com/my.dtd',
)
doc = imp.createDocument(None, 'foo', doctype)
doc.toxml()

This prints the following.

<?xml version="1.0" ?><!DOCTYPE foo  SYSTEM 'http://www.path.to.my.dtd.com/my.dtd'><foo/>
Note how the root element is created automatically by createDocument(). Also, your 'something' has been changed to 'foo': the DTD needs to contain the root element name itself.

qid & accept id: (2358890, 2359619) query: Python - lexical analysis and tokenization soup:

soup wrap:

There's an excellent article on Using Regular Expressions for Lexical Analysis at effbot.org.

Adapting the tokenizer to your problem:

import re

token_pattern = r"""
(?P<identifier>[a-zA-Z_][a-zA-Z0-9_]*)
|(?P<integer>[0-9]+)
|(?P<dot>\.)
|(?P<open_variable>[$][{])
|(?P<open_curly>[{])
|(?P<close_curly>[}])
|(?P<newline>\n)
|(?P<whitespace>\s+)
|(?P<equals>[=])
|(?P<slash>[/])
"""

token_re = re.compile(token_pattern, re.VERBOSE)

class TokenizerException(Exception): pass

def tokenize(text):
    pos = 0
    while True:
        m = token_re.match(text, pos)
        if not m: break
        pos = m.end()
        tokname = m.lastgroup
        tokvalue = m.group(tokname)
        yield tokname, tokvalue
    if pos != len(text):
        raise TokenizerException('tokenizer stopped at pos %r of %r' % (
            pos, len(text)))
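
The tokenizer hinges on re's named groups plus Match.lastgroup, which reports the name of the alternative that matched; a minimal self-contained illustration:

```python
import re

# two named alternatives; whichever one matches, lastgroup names it
pat = re.compile(r"(?P<word>[a-z]+)|(?P<num>[0-9]+)")
kind = pat.match("abc").lastgroup
```

Here kind is "word", while pat.match("42").lastgroup would be "num".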

To test it, we do:

stuff = r'property.${general.name}.ip = ${general.ip}'
stuff2 = r'''
general {
  name = myname
  ip = 127.0.0.1
}
'''

print ' stuff '.center(60, '=')
for tok in tokenize(stuff):
    print tok

print ' stuff2 '.center(60, '=')
for tok in tokenize(stuff2):
    print tok

for:

========================== stuff ===========================
('identifier', 'property')
('dot', '.')
('open_variable', '${')
('identifier', 'general')
('dot', '.')
('identifier', 'name')
('close_curly', '}')
('dot', '.')
('identifier', 'ip')
('whitespace', ' ')
('equals', '=')
('whitespace', ' ')
('open_variable', '${')
('identifier', 'general')
('dot', '.')
('identifier', 'ip')
('close_curly', '}')
========================== stuff2 ==========================
('newline', '\n')
('identifier', 'general')
('whitespace', ' ')
('open_curly', '{')
('newline', '\n')
('whitespace', '  ')
('identifier', 'name')
('whitespace', ' ')
('equals', '=')
('whitespace', ' ')
('identifier', 'myname')
('newline', '\n')
('whitespace', '  ')
('identifier', 'ip')
('whitespace', ' ')
('equals', '=')
('whitespace', ' ')
('integer', '127')
('dot', '.')
('integer', '0')
('dot', '.')
('integer', '0')
('dot', '.')
('integer', '1')
('newline', '\n')
('close_curly', '}')
('newline', '\n')
qid & accept id: (2363954, 2364138) query: Comparing two lists items in python soup:

soup wrap:

First create a function which can load a given file. Since you may want to maintain individual sets and also count occurrences of each number, the best approach is a dict for the whole file whose keys are set names (e.g. d.complex.1); for each such set, keep another dict of counts for the numbers in that set. The code below explains it better:

def file_loader(f):
    file_dict = {}
    current_set = None
    for line in f:
        if line.startswith('d.complex'):
            file_dict[line] = current_set = {}
            continue

        if current_set is not None:
            current_set[line] = current_set.get(line, 0) + 1

    return file_dict

Now you can easily write a function which will count a number in given file_dict

def count_number(file_dict, num):
    count = 0
    for set_name, number_set in file_dict.iteritems():
        count += number_set.get(num, 0)

    return count

E.g. here is a usage example:

s = """d.complex.1
10
11
12
10
11
12"""

file_dict = file_loader(s.split("\n"))
print file_dict
print count_number(file_dict, '10')

output is:

{'d.complex.1': {'11': 2, '10': 2, '12': 2}}
2

You may have to improve the file loader, e.g. to skip empty lines, convert values to int, etc.
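
The per-set counting can also be written with collections.defaultdict(int), which makes the increment explicit; a sketch of the same loader:

```python
import collections

def file_loader(f):
    file_dict = {}
    current_set = None
    for line in f:
        if line.startswith('d.complex'):
            # defaultdict(int) starts every new number at 0 automatically
            file_dict[line] = current_set = collections.defaultdict(int)
            continue
        if current_set is not None:
            current_set[line] += 1
    return file_dict
```

It behaves like the version above but without the get(line, 0) dance.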

qid & accept id: (2382905, 2382919) query: Creating a palindrome list with reverse() soup:

soup wrap:

It looks like you might be in need of

mus.extend(reversed(mus))

Or if you simply need to iterate over this and not necessarily form the list, use

import itertools
for item in itertools.chain(mus, reversed(mus)):
    do_something...
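
If you want a new list instead of mutating mus in place, concatenating with a reversed slice works too:

```python
mus = [1, 2, 3]
# mus[::-1] is a reversed copy, so mus itself is left untouched
palindrome = mus + mus[::-1]
```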
qid & accept id: (2389846, 2390047) query: Python Decimals format soup:

soup wrap:

If you have Python 2.6 or newer, use format:

'{0:.3g}'.format(num)

For Python 2.5 or older:

'%.3g'%(num)

Explanation:

{0} tells format to print the first argument -- in this case, num.

Everything after the colon (:) specifies the format_spec.

.3 sets the precision to 3.

g removes insignificant zeros. See http://en.wikipedia.org/wiki/Printf#fprintf

For example:

tests=[(1.00, '1'),
       (1.2, '1.2'),
       (1.23, '1.23'),
       (1.234, '1.23'),
       (1.2345, '1.23')]

for num, answer in tests:
    result = '{0:.3g}'.format(num)
    if result != answer:
        print('Error: {0} --> {1} != {2}'.format(num, result, answer))
        exit()
    else:
        print('{0} --> {1}'.format(num,result))

yields

1.0 --> 1
1.2 --> 1.2
1.23 --> 1.23
1.234 --> 1.23
1.2345 --> 1.23
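
On Python 3.6 and newer the same format spec also works inside an f-string:

```python
num = 1.2345
# everything after the colon is the same format_spec str.format uses
formatted = f"{num:.3g}"
```

This gives "1.23", just like '{0:.3g}'.format(num).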
qid & accept id: (2397295, 2398430) query: Web scraping with Python soup:

soup wrap:

Use BeautifulSoup as a tree builder for html5lib:

from html5lib import HTMLParser, treebuilders

parser = HTMLParser(tree=treebuilders.getTreeBuilder("beautifulsoup"))

text = "a<b>b<i>c"
soup = parser.parse(text)
print soup.prettify()

Output:

<html>
 <head>
 </head>
 <body>
  a
  <b>
   b
   <i>
    c
   </i>
  </b>
 </body>
</html>
qid & accept id: (2454494, 2454613) query: urllib2 multiple Set-Cookie headers in response soup:

soup wrap:

According to urllib2 docs, the .headers attribute of the result URL object is an httplib.HTTPMessage (which appears to be undocumented, at least in the Python docs).

However,

help(httplib.HTTPMessage)
...

If multiple header fields with the same name occur, they are combined
according to the rules in RFC 2616 sec 4.2:

Appending each subsequent field-value to the first, each separated
by a comma. The order in which header fields with the same field-name
are received is significant to the interpretation of the combined
field value.

So, if you access u.headers['Set-Cookie'], you should get one Set-Cookie header with the values separated by commas.

Indeed, this appears to be the case.

import httplib
from StringIO import StringIO

msg = \
"""Set-Cookie: Foo
Set-Cookie: Bar
Set-Cookie: Baz

This is the message"""

msg = StringIO(msg)

msg = httplib.HTTPMessage(msg)

assert msg['Set-Cookie'] == 'Foo, Bar, Baz'
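
If you want the individual values rather than the comma-joined string, the stdlib email parser (which understands the same header syntax) returns them separately via get_all; a sketch:

```python
from email import message_from_string

raw = ("Set-Cookie: Foo\n"
       "Set-Cookie: Bar\n"
       "Set-Cookie: Baz\n"
       "\n"
       "This is the message")

parsed = message_from_string(raw)
# get_all returns every value for a repeated header, in order
values = parsed.get_all("Set-Cookie")
```

values is ["Foo", "Bar", "Baz"], with the body still available via parsed.get_payload().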
qid & accept id: (2468334, 2468374) query: Python | How to create dynamic and expandable dictionaries soup:
soup wrap:
userdata = { "data":[]}

def fill_userdata():
  for i in xrange(0,5):
    user = {}
    user["name"]=...
    user["age"]=...
    user["country"]=...
    add_user(user)

def add_user(user):
  userdata["data"].append(user)

or shorter:

def gen_user():
  return {"name":"foo", "age":22}

userdata = {"data": [gen_user() for i in xrange(0,5)]}

# or fill separated from declaration so you can fill later
userdata ={"data":None} # None: not initialized
userdata["data"]=[gen_user() for i in xrange(0,5)]
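A quick check of how the resulting structure grows (gen_user is restated so the snippet stands alone; range is used instead of xrange so it also runs on Python 3):

```python
def gen_user():
    return {"name": "foo", "age": 22}

# Build the initial structure, then expand it later with a plain list append.
userdata = {"data": [gen_user() for i in range(5)]}
userdata["data"].append({"name": "bar", "age": 30})

assert len(userdata["data"]) == 6
assert userdata["data"][-1]["name"] == "bar"
```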
qid & accept id: (2470764, 2470811) query: python union of 2 nested lists with index soup:

soup wrap:

Create an auxiliary dict (work is O(len(A))) -- assuming the first three items of a sublist in A uniquely identify it (otherwise you need a dict of lists):

aud = dict((tuple(a[:3]), i) for i, a in enumerate(A))

Use said dict to loop once on B (work is O(len(B))) to get B sublists and A indices:

result = [(b, aud[tuple(b[:3])]) for b in B if tuple(b[:3]) in aud]
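For instance, with hypothetical sample data (sublists keyed by their first three items):

```python
# Hypothetical sample data.
A = [[1, 2, 3, 'a'], [4, 5, 6, 'b'], [7, 8, 9, 'c']]
B = [[4, 5, 6, 'x'], [7, 8, 9, 'y'], [0, 0, 0, 'z']]

# Auxiliary dict: first-three-items tuple -> index in A.
aud = dict((tuple(a[:3]), i) for i, a in enumerate(A))

# One pass over B: keep sublists whose key appears in A, paired with A's index.
result = [(b, aud[tuple(b[:3])]) for b in B if tuple(b[:3]) in aud]

assert result == [([4, 5, 6, 'x'], 1), ([7, 8, 9, 'y'], 2)]
```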
qid & accept id: (2534786, 2539718) query: Drawing a clamped uniform cubic B-spline using Cairo soup:

soup wrap:

Okay, so I searched a lot using Google and I think I came up with a reasonable solution that is suitable for my purposes. I'm posting it here - maybe it will be useful to someone else as well.

First, let's start with a simple Point class:

from collections import namedtuple

class Point(namedtuple("Point", "x y")):
    __slots__ = ()

    def interpolate(self, other, ratio = 0.5):
        return Point(x = self.x * (1.0-ratio) + other.x * float(ratio), \
                     y = self.y * (1.0-ratio) + other.y * float(ratio))
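To sanity-check interpolate on its own (the class is restated so the snippet runs standalone): one third of the way from (0, 0) to (3, 3) should be (1, 1).

```python
from collections import namedtuple

class Point(namedtuple("Point", "x y")):
    __slots__ = ()

    def interpolate(self, other, ratio=0.5):
        # Linear interpolation: ratio=0 gives self, ratio=1 gives other.
        return Point(x=self.x * (1.0 - ratio) + other.x * float(ratio),
                     y=self.y * (1.0 - ratio) + other.y * float(ratio))

p = Point(0.0, 0.0).interpolate(Point(3.0, 3.0), 1 / 3.0)
assert abs(p.x - 1.0) < 1e-9 and abs(p.y - 1.0) < 1e-9
```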

A cubic B-spline is nothing more than a collection of Point objects:

class CubicBSpline(object):
    __slots__ = ("points", )

    def __init__(self, points):
        self.points = [Point(*coords) for coords in points]

Now, assume that we have an open uniform cubic B-spline instead of a clamped one. Four consecutive control points of a cubic B-spline define a single Bézier segment, so control points 0 to 3 define the first Bézier segment, control points 1 to 4 define the second segment and so on. The control points of the Bézier spline can be determined by linearly interpolating between the control points of the B-spline in an appropriate way. Let A, B, C and D be the four control points of the B-spline. Calculate the following auxiliary points:

  1. Find the point which divides the A-B line in a ratio of 2:1, let it be A'.
  2. Find the point which divides the C-D line in a ratio of 1:2, let it be D'.
  3. Divide the B-C line into three equal parts, let the two points be F and G.
  4. Find the point halfway between A' and F, this will be E.
  5. Find the point halfway between G and D', this will be H.

A Bézier curve from E to H with control points F and G is equivalent to an open B-spline between points A, B, C and D. See sections 1-5 of this excellent document. By the way, the above method is called Böhm's algorithm, and it is much more complicated when formulated in a proper mathematical way that accounts for non-uniform or non-cubic B-splines as well.

We have to repeat the above procedure for each group of 4 consecutive points of the B-spline, so in the end we will need the 1:2 and 2:1 division points between almost any consecutive control point pairs. This is what the following BSplineDrawer class does before drawing the curves:

class BSplineDrawer(object):
    def __init__(self, context):
        self.ctx = context

    def draw(self, bspline):
        pairs = zip(bspline.points[:-1], bspline.points[1:])
        one_thirds = [p1.interpolate(p2, 1/3.) for p1, p2 in pairs]
        two_thirds = [p2.interpolate(p1, 1/3.) for p1, p2 in pairs]

        coords = [None] * 6
        for i in xrange(len(bspline.points) - 3):
            start = two_thirds[i].interpolate(one_thirds[i+1])
            coords[0:2] = one_thirds[i+1]
            coords[2:4] = two_thirds[i+1]
            coords[4:6] = two_thirds[i+1].interpolate(one_thirds[i+2])

            self.ctx.move_to(*start)
            self.ctx.curve_to(*coords)
            self.ctx.stroke()

Finally, if we want to draw clamped B-splines instead of open B-splines, we simply have to repeat both endpoints of the clamped B-spline three more times:

class CubicBSpline(object):
    [...]
    def clamped(self):
        new_points = [self.points[0]] * 3 + self.points + [self.points[-1]] * 3
        return CubicBSpline(new_points)

Finally, this is how the code should be used:

import cairo

surface = cairo.ImageSurface(cairo.FORMAT_ARGB32, 600, 400)
ctx = cairo.Context(surface)

points = [(100,100), (200,100), (200,200), (100,200), (100,400), (300,400)]
spline = CubicBSpline(points).clamped()

ctx.set_source_rgb(0., 0., 1.)
ctx.set_line_width(5)
BSplineDrawer(ctx).draw(spline)
qid & accept id: (2572099, 2572116) query: Python's safest method to store and retrieve passwords from a database soup:

soup wrap:

Store the password+salt as a hash and the salt. Take a look at how Django does it: basic docs and source. In the db they store algorithm$salt$hash in a single char field. You can also store the three parts in separate fields.

The function to set the password:

def set_password(self, raw_password):
    import random
    algo = 'sha1'
    salt = get_hexdigest(algo, str(random.random()), str(random.random()))[:5]
    hsh = get_hexdigest(algo, salt, raw_password)
    self.password = '%s$%s$%s' % (algo, salt, hsh)

The get_hexdigest is just a thin wrapper around some hashing algorithms. You can use hashlib for that. Something like hashlib.sha1('%s%s' % (salt, raw_password)).hexdigest()

And the function to check the password:

def check_password(raw_password, enc_password):
    """
    Returns a boolean of whether the raw_password was correct. Handles
    encryption formats behind the scenes.
    """
    algo, salt, hsh = enc_password.split('$')
    return hsh == get_hexdigest(algo, salt, raw_password)
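Putting the pieces together in a self-contained, runnable sketch (get_hexdigest is assumed to look roughly like the wrapper described above; only sha1 is handled here):

```python
import hashlib
import random

def get_hexdigest(algorithm, salt, raw_password):
    # Assumed shape of the helper: hash salt+password with the named algorithm.
    if algorithm != 'sha1':
        raise ValueError('unsupported algorithm: %s' % algorithm)
    return hashlib.sha1((salt + raw_password).encode('utf-8')).hexdigest()

# Round trip: build the algo$salt$hash string, then verify a password against it.
salt = get_hexdigest('sha1', str(random.random()), str(random.random()))[:5]
enc_password = 'sha1$%s$%s' % (salt, get_hexdigest('sha1', salt, 'secret'))

algo, s, hsh = enc_password.split('$')
assert hsh == get_hexdigest(algo, s, 'secret')
assert hsh != get_hexdigest(algo, s, 'wrong')
```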
qid & accept id: (2575672, 2575786) query: What's an easy and fast way to put returned XML data into a dict? soup:

soup wrap:

Using xml from the standard Python library:

import xml.etree.ElementTree as xee
# The tag names below follow the ipinfodb ip_query response format;
# they were stripped from the original post and are reconstructed here.
contents='''\
<Response>
  <Ip>74.125.45.100</Ip>
  <Status>OK</Status>
  <CountryCode>US</CountryCode>
  <CountryName>United States</CountryName>
  <RegionCode>06</RegionCode>
  <RegionName>California</RegionName>
  <City>Mountain View</City>
  <ZipPostalCode>94043</ZipPostalCode>
  <Latitude>37.4192</Latitude>
  <Longitude>-122.057</Longitude>
  <TimezoneName>America/Los_Angeles</TimezoneName>
  <Gmtoffset>-25200</Gmtoffset>
  <Isdst>1</Isdst>
</Response>'''

doc=xee.fromstring(contents)
print dict(((elt.tag,elt.text) for elt in doc))

Or using lxml:

import lxml.etree
import urllib2
url='http://ipinfodb.com/ip_query.php?ip=74.125.45.100&timezone=true'
doc = lxml.etree.parse( urllib2.urlopen(url) ).getroot()
print dict(((elt.tag,elt.text) for elt in doc))
qid & accept id: (2588364, 2588957) query: Python TEA implementation soup:

soup wrap:

I fixed it. Here is working TEA implementation in python:

#!/usr/bin/env python
#-*- coding: utf-8 -*-

import sys
from ctypes import *

def encipher(v, k):
    y = c_uint32(v[0])
    z = c_uint32(v[1])
    sum = c_uint32(0)
    delta = 0x9e3779b9
    n = 32
    w = [0,0]

    while(n>0):
        sum.value += delta
        y.value += ( z.value << 4 ) + k[0] ^ z.value + sum.value ^ ( z.value >> 5 ) + k[1]
        z.value += ( y.value << 4 ) + k[2] ^ y.value + sum.value ^ ( y.value >> 5 ) + k[3]
        n -= 1

    w[0] = y.value
    w[1] = z.value
    return w

def decipher(v, k):
    y = c_uint32(v[0])
    z = c_uint32(v[1])
    sum = c_uint32(0xc6ef3720)
    delta = 0x9e3779b9
    n = 32
    w = [0,0]

    while(n>0):
        z.value -= ( y.value << 4 ) + k[2] ^ y.value + sum.value ^ ( y.value >> 5 ) + k[3]
        y.value -= ( z.value << 4 ) + k[0] ^ z.value + sum.value ^ ( z.value >> 5 ) + k[1]
        sum.value -= delta
        n -= 1

    w[0] = y.value
    w[1] = z.value
    return w

if __name__ == "__main__":
    key = [1,2,3,4]
    v = [1385482522,639876499]
    enc = encipher(v,key)
    print enc
    print decipher(enc,key)

And a small sample:

>>> v
[1385482522, 639876499]
>>> tea.decipher(tea.encipher(v,key),key)
[1385482522L, 639876499L]
qid & accept id: (2621549, 2622254) query: Creating multiple csv files from data within a csv file soup:

soup wrap:

Using only AWK:

awk -F, -vOFS=, -vc=1 '
    NR == 1 {
        # Collect the columns whose header names an output file (L1, L2, ...).
        for (i=1; i<=NF; i++) {
            if ($i ~ /^L/) { g[c]=i; f[c]=$i; c++ }
        }
    }
    NR > 2 {
        for (i=1; i < c; i++) {
            print $1,$2, $g[i] > "output_"f[i]".csv"
        }
    }' data.csv

As a one-liner:

awk -F, -vOFS=, -vc=1 'NR == 1 {for (i=1; i<=NF; i++) if ($i ~ /^L/) {g[c]=i; f[c]=$i; c++}} NR > 2 { for (i=1; i < c; i++) {print $1,$2, $g[i] > "file_"f[i]".csv" }}' data.csv

Example output:

$ cat file_L1.csv
EXAMPLEfoo,60,6
EXAMPLEbar,30,6
EXAMPLE1,60,3
EXAMPLE2,120,6
EXAMPLE3,60,6
EXAMPLE4,30,6
$ cat file_L2.csv
EXAMPLEfoo,60,0
EXAMPLEbar,30,6
EXAMPLE1,60,3
EXAMPLE2,120,0
EXAMPLE3,60,6
EXAMPLE4,30,6
$ cat file_L11.csv
EXAMPLEfoo,60,0
EXAMPLEbar,30,6
EXAMPLE1,60,3
EXAMPLE2,120,0
EXAMPLE3,60,0
EXAMPLE4,30,6
qid & accept id: (2654689, 2654818) query: Django - how to write users and profiles handling in best way? soup:
soup wrap:

users should be able to register and authenticate

django.contrib.auth is the module you want. Be sure to check the docs for custom login forms.

every user should have profile (or model with all required fields)

You need to set settings.AUTH_PROFILE_MODULE, as noted by others.

Information about setting up the user profile model is available for the latest version, 1.1, and 1.0. It hasn't been dropped.

users dont need django builtin admin panel, but they need to edit their profiles/models via simple web form

You can create a form and view just like you would for any other app; maybe make a "user control panel" app for handling these things. Your views would then interact with the django.contrib.auth.models.User and django.contrib.auth.models.Group models. You can set this up to do whatever you need.

EDIT: Responding to your questions-in-the-form-of-an-answer (paging Alex Trebek)...

The second version of djangobook, covering django 1.0 (that is way closer to 1.2 than 0.96) no longer has that information anywhere, what makes me highly confused - has anything changed? Is there other, better, more secure way to handle users and their profiles? Therefore this question asked.

I wouldn't recommend djangobook as a reference; it's out of date on this topic. User profiles exist and I'm using them in my Django 1.1.1 site; I'm even populating them from NIS.

Please use the links I provided above. They go directly to the actual Django documentation and are authoritative.

By the way, I forgot to ask, if the way you all refer to (that is AUTH_PROFILE_MODULE) will create automatically upon registration

Answered in the docs.

and require the profile to exist upon any action (user withoud existing, filled profile should not exists, this is why I was thinking about extending User model somehow)?

The profile needs to exist if User.get_profile() is called.

Will it get updated as well (people are mentioning 'signals' on various blogs related to this subject)?

It's like any other model: it only gets updated when you change the fields and call save().

The signal part is how you hook in a function to create a profile for a new User:

from django.db.models.signals import post_save
from django.contrib.auth.models import User
from myUserProfileApp import UserProfile

def make_user_profile(sender, **kwargs):
    if 'created' not in kwargs or not kwargs['created']:
        return

    # Assumes that the `ForeignKey(User)` field in "UserProfile" is named "user".
    profile = UserProfile(user=kwargs["instance"])
    # Set anything else you need to in the profile, then...
    profile.save()

post_save.connect(make_user_profile, sender=User, weak=False)

This only creates a new profile for a new User. Existing Users need to have profiles manually added:

$ ./manage.py shell
>>> from django.contrib.auth.models import User
>>> from myUserProfileApp import UserProfile
>>> for u in User.objects.all():
...  UserProfile(user=u).save() # Add other params as needed.
...

If you have some users with profiles and some without, you'll need to do a bit more work:

>>> for u in User.objects.all():
...  try:
...   UserProfile(user=u).save() # Add other params as needed.
...  except:
...   pass
qid & accept id: (2658026, 2659472) query: How to change the date/time in Python for all modules? soup:

soup wrap:

Monkey-patching time.time is probably sufficient, actually, as it provides the basis for almost all the other time-based routines in Python. This appears to handle your use case pretty well, without resorting to more complex tricks, and it doesn't matter when you do it (aside from the few stdlib packages like Queue.py and threading.py that do from time import time in which case you must patch before they get imported):

>>> import datetime
>>> datetime.datetime.now()
datetime.datetime(2010, 4, 17, 14, 5, 35, 642000)
>>> import time
>>> def mytime(): return 120000000.0
...
>>> time.time = mytime
>>> datetime.datetime.now()
datetime.datetime(1973, 10, 20, 17, 20)

That said, in years of mocking objects for various types of automated testing, I've needed this approach only very rarely, as most of the time it's my own application code that needs the mocking, and not the stdlib routines. After all, you know they work already. If you are encountering situations where your own code has to handle values returned by library routines, you may want to mock the library routines themselves, at least when checking how your own app will handle the timestamps.

The best approach by far is to build your own date/time service routine(s) which you use exclusively in your application code, and build into that the ability for tests to supply fake results as required. For example, I do a more complex equivalent of this sometimes:

# in file apptime.py (for example)
import time as _time

class MyTimeService(object):
    def __init__(self, get_time=None):
        self.get_time = get_time or _time.time

    def __call__(self):
        return self.get_time()

time = MyTimeService()

Now in my app code I just do import apptime as time; time.time() to get the current time value, whereas in test code I can first do apptime.time = MyTimeService(mock_time_func) in my setUp() code to supply fake time results.
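For instance, a self-contained sketch of that setup (MyTimeService is restated so the snippet runs on its own; the frozen timestamp is an arbitrary choice):

```python
import time as _time

class MyTimeService(object):
    """Single indirection point for 'what time is it?'."""
    def __init__(self, get_time=None):
        self.get_time = get_time or _time.time

    def __call__(self):
        return self.get_time()

# Production: real clock.
time = MyTimeService()
real_now = time()

# In a test's setUp(): swap in a frozen clock.
time = MyTimeService(lambda: 120000000.0)
assert time() == 120000000.0
assert real_now > 0
```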

qid & accept id: (2716894, 2720966) query: making binned boxplot in matplotlib with numpy and scipy in Python soup:

soup wrap:

Numpy has a dedicated function for creating histograms the way you need to:

histogram(a, bins=10, range=None, normed=False, weights=None, new=None)

which you can use like:

(hist_data, bin_edges) = histogram(my_array[:,0], weights=my_array[:,1])

The key point here is to use the weights argument: each value a[i] will contribute weights[i] to the histogram. Example:

a = [0, 1]
weights = [10, 2]

describes 10 points at x = 0 and 2 points at x = 1.
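As a quick, runnable check of that weights behaviour (calling numpy directly; the bin count here is an arbitrary choice):

```python
import numpy as np

# Two sample positions with weights: 10 points at x=0, 2 points at x=1.
a = [0, 1]
weights = [10, 2]

hist_data, bin_edges = np.histogram(a, bins=2, weights=weights)
# Each a[i] contributed weights[i] to its bin.
assert hist_data.tolist() == [10.0, 2.0]
assert len(bin_edges) == 3  # bins=2 gives 3 edges
```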

You can set the number of bins, or the bin limits, with the bins argument (see the official documentation for more details).

The histogram can then be plotted with something like:

bar(bin_edges[:-1], hist_data)

If you only need to do a histogram plot, the similar hist() function can directly plot the histogram:

hist(my_array[:,0], weights=my_array[:,1])
qid & accept id: (2717086, 2717876) query: Dealing with Windows line-endings in Python soup:

soup wrap:

Allegedly: """This guy has \r\n right in the middle of tag descriptors like so: """.

I see no \r\n here. Perhaps you mean repr(xml) contains things like

""

If not, try to say precisely what you mean, with repr-fashion examples.

The following should work:

>>> import re
>>> guff = """\r\n"""
>>> re.sub(r"(<[^>]*)\r\n([^>]*>)", r"\1\2", guff)
'\r\n'
>>>

If there is more than one line break in a tag, this will fix only one of them per pass. Alternatives: (1) loop until the guff stops shrinking, (2) write a smarter regexp yourself :-)
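To make this concrete, here is the substitution on a made-up input (the original sample markup was lost in extraction, so the tag names below are invented):

```python
import re

# Invented sample: a CRLF splits each tag name in two.
guff = "<media\r\nurl>http://example.com</media\r\nurl>"
fixed = re.sub(r"(<[^>]*)\r\n([^>]*>)", r"\1\2", guff)
assert fixed == "<mediaurl>http://example.com</mediaurl>"

# A tag containing two breaks is only partially repaired by a single pass,
# hence the "loop until it stops shrinking" suggestion.
twice = "<a\r\nb\r\nc>"
once = re.sub(r"(<[^>]*)\r\n([^>]*>)", r"\1\2", twice)
assert once.count("\r\n") == 1
```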

qid & accept id: (2726839, 2727085) query: Creating a pygtk text field that only accepts number soup:

\n soup wrap:

I don't know of a way to do this by simply switching a setting; I guess you will need to handle it via signals. One way would be to connect to the changed signal and then filter out anything that's not a number.

Simple approach (untested, but should work):

class NumberEntry(gtk.Entry):
    def __init__(self):
        gtk.Entry.__init__(self)
        self.connect('changed', self.on_changed)

    def on_changed(self, *args):
        text = self.get_text().strip()
        self.set_text(''.join([i for i in text if i in '0123456789']))

If you want formatted numbers you could of course get fancier with a regex or something else to determine which characters should stay inside the entry.
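For instance, the character filter could be pulled out into a plain function that also admits a single decimal point (a sketch; keep_number_chars is a hypothetical helper, not part of PyGTK — a changed handler would call set_text() with its result):

```python
def keep_number_chars(text):
    # Keep digits and at most one '.', dropping everything else.
    chars = []
    seen_dot = False
    for ch in text.strip():
        if ch.isdigit():
            chars.append(ch)
        elif ch == '.' and not seen_dot:
            chars.append(ch)
            seen_dot = True
    return ''.join(chars)
```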

EDIT
Since you may not want to create your Entry in Python I'm going to show you a simple way to "numbify" an existing one.

    def numbify(widget):
        def filter_numbers(entry, *args):
            text = entry.get_text().strip()
            entry.set_text(''.join([i for i in text if i in '0123456789']))

        widget.connect('changed', filter_numbers)

    # Use gtk.Builder rather than glade, you'll need to change the format of your .glade file in Glade accordingly
    builder = gtk.Builder()
    builder.add_from_file('yourprogram.glade')
    entry = builder.get_object('yourentry')

    numbify(entry)
qid & accept id: (2743712, 2744164) query: Installing OSQA on windows (local system) soup:
soup wrap:
  1. Download http://svn.osqa.net/svnroot/osqa/trunk to a folder {OSQA_ROOT}, e.g. c:\osqa

  2. Rename {OSQA_ROOT}\settings_local.py.dist to {OSQA_ROOT}\settings_local.py

  3. set the following in {OSQA_ROOT}\settings_local.py

    DATABASE_NAME = 'osqa'             # Or path to database file if using sqlite3.
    DATABASE_USER = 'root'               # Not used with sqlite3.
    DATABASE_PASSWORD = 'PASSWD'               # Not used with sqlite3.  put bitnami here
    DATABASE_ENGINE = 'mysql'  #mysql, etc
    
    • Default MySQL credentials in bitnami are: -u root -p bitnami

  4. add the following to {DJANGOSTACK}\apps\django\conf\django.conf (/ means the root folder, like http://localhost)

    
    <Location "/">
        SetHandler python-program
        PythonHandler django.core.handlers.modpython
        PythonPath "['{OSQA_ROOT}'] + sys.path"
        SetEnv DJANGO_SETTINGS_MODULE osqa.settings
        PythonDebug On
    </Location>
  5. install markdown2 and html5lib with easy_install.exe, which is inside {DJANGOSTACK}\python\Scripts\

    easy_install markdown2
    easy_install html5lib
    
  6. create a new db called osqa with mysqladmin.exe, which is in {DJANGOSTACK}\mysql\bin

    mysqladmin create osqa
    
  7. syncdb

    {DJANGOSTACK}\python\python.exe manage.py syncdb
    
    • manage.py is in apps\django\django\conf\project_template as of 5/8/11
    • You must update settings.py with mysql, root, and bitnami (pw), and osqa as db name

  8. enjoy!

Screenshot: http://img87.imageshack.us/img87/723/osqabitnamidjangostack.png

Note:

  • {OSQA_ROOT} => osqa trunk directory
  • {DJANGOSTACK} => BitNami DjangoStack install directory
qid & accept id: (2777188, 2777223) query: Making a python iterator go backwards? soup:

soup wrap:

No, in general you cannot make a Python iterator go backwards. However, if you only want to step back once, you can try something like this:

def str(self, item):
    print item

    prev, current = None, self.__iter.next()
    while isinstance(current, int):
        print current
        prev, current = current, self.__iter.next()

You can then access the previous element any time in prev.

If you really need a bidirectional iterator, you can implement one yourself, but it's likely to introduce even more overhead than the solution above:

class bidirectional_iterator(object):
    def __init__(self, collection):
        self.collection = collection
        self.index = 0

    def next(self):
        try:
            result = self.collection[self.index]
            self.index += 1
        except IndexError:
            raise StopIteration
        return result

    def prev(self):
        self.index -= 1
        if self.index < 0:
            raise StopIteration
        return self.collection[self.index]

    def __iter__(self):
        return self
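That said, if the underlying data fits in memory, it is often simpler to materialize the iterator into a list first and then walk it in any direction with reversed() or plain indexing:

```python
it = iter([1, 2, 3, 4])

items = list(it)                   # drain the one-way iterator once
backwards = list(reversed(items))  # now free to traverse either way
```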
qid & accept id: (2785714, 2785733) query: Parsing html for domain links soup:

soup wrap:

You might consider stripping 'www.' from the list and doing something as simple as:

url = 'domain.com/'
for domain in list:
    if url.startswith(domain):
        ... do something ...

Or trying both won't hurt either, I suppose:

url = 'domain.com/'
for domain in list:
    domain_minus_www = domain
    if domain_minus_www.startswith('www.'):
        domain_minus_www = domain_minus_www[4:]
    if url.startswith(domain) or url.startswith(domain_minus_www):
        ... do something ...
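A somewhat more robust variant is to compare hostnames rather than raw string prefixes, using the standard library's URL parser (urllib.parse in Python 3; in Python 2 the module is called urlparse). same_domain here is a hypothetical helper:

```python
from urllib.parse import urlparse

def same_domain(url, domain):
    # urlparse only fills netloc when the URL has a scheme, so fall
    # back to everything before the first '/' for bare 'domain.com/' strings.
    host = urlparse(url).netloc or url.split('/')[0]
    host = host.lower()
    domain = domain.lower()
    # Normalize both sides by stripping a leading 'www.'
    if host.startswith('www.'):
        host = host[4:]
    if domain.startswith('www.'):
        domain = domain[4:]
    return host == domain
```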
qid & accept id: (2857634, 2857700) query: How can I create a GUI on top of a Python APP so it can do either GUI or CLI? soup:
soup wrap:

is there a simple way of detecting something like GTK, so it only applied the code when GTK was present?

First, break your app into 3 separate modules.

  1. The actual work: foo_core.py.

  2. A CLI module that imports foo_core. Call it foo_cli.py.

  3. A GUI module that imports foo_core. Call it foo_gui.pyw.

The foo_cli module looks like this.

import foo_core
import optparse

def main():
    # parse the command-line options
    # the real work is done by foo_core

if __name__ == "__main__":
   main()

The foo_gui module can look like this.

 import foo_core
 import gtk # or whatever

 def main():
     # build the GUI
     # real work is done by foo_core under control of the GUI

 if __name__ == "__main__":
     main()

That's generally sufficient. People can be trusted to decide for themselves if they want CLI or GUI.

If you want to confuse people, you can write a foo.py script that does something like the following.

try:
    import foo_gui
    foo_gui.main()
except ImportError:
    import foo_cli
    foo_cli.main()
qid & accept id: (2882308, 2883627) query: Spawning a thread in python soup:

soup wrap:

Instead of switch-case, why not use proper polymorphism? For example, here's what you can do with duck typing in Python:

In, say, alice.py:

def do_stuff(data):
    print 'alice does stuff with %s' % data

In, say, bob.py:

def do_stuff(data):
    print 'bob does stuff with %s' % data

Then in your client code, say, main.py:

import threading
import alice, bob

def get_work_data():
    return 'data'

def main():
    tasks = [alice.do_stuff, bob.do_stuff]
    data = get_work_data()
    for task in tasks:
        t = threading.Thread(target=task, args=(data,))
        t.start()
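If main() needs to wait for the workers before moving on, keep the Thread objects around and join() them. A minimal self-contained sketch (do_stuff stands in for alice.do_stuff / bob.do_stuff):

```python
import threading

results = []

def do_stuff(data):
    # stand-in for alice.do_stuff / bob.do_stuff
    results.append(data)

tasks = [do_stuff, do_stuff]
threads = [threading.Thread(target=task, args=('data',)) for task in tasks]
for t in threads:
    t.start()
for t in threads:
    t.join()   # block until every worker has finished
```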

Let me know if I need to clarify.

qid & accept id: (2922769, 2924297) query: Embedding IronPython in a WinForms app and interrupting execution soup:

soup wrap:

This is basically an adaptation of how the IronPython console handles Ctrl-C. If you want to check the source, it's in BasicConsole and CommandLine.Run.

First, start up the IronPython engine on a separate thread (as you assumed). When you go to run the user's code, wrap it in a try ... catch(ThreadAbortException) block:

var engine = Python.CreateEngine();
bool aborted = false;
try {
    engine.Execute(/* whatever */);
} catch(ThreadAbortException tae) {
    if(tae.ExceptionState is Microsoft.Scripting.KeyboardInterruptException) {
        Thread.ResetAbort();
        aborted = true;
    } else { throw; }
}

if(aborted) {
    // this is application-specific
}

Now, you'll need to keep a reference to the IronPython thread handy. Create a button handler on your form, and call Thread.Abort().

public void StopButton_OnClick(object sender, EventArgs e) {
    pythonThread.Abort(new Microsoft.Scripting.KeyboardInterruptException(""));
}

The KeyboardInterruptException argument allows the Python thread to trap the ThreadAbortException and handle it as a KeyboardInterrupt.

qid & accept id: (2939050, 3747337) query: Scrapy - Follow RSS links soup:

soup wrap:

CrawlSpider rules don't work that way. You'll probably need to subclass BaseSpider and implement your own link extraction in your spider callback. For example:

from scrapy.spider import BaseSpider
from scrapy.http import Request
from scrapy.selector import XmlXPathSelector

class MySpider(BaseSpider):
    name = 'myspider'

    def parse(self, response):
        xxs = XmlXPathSelector(response)
        links = xxs.select("//link/text()").extract()
        return [Request(x, callback=self.parse_link) for x in links]

    def parse_link(self, response):
        # handle each followed link here
        pass

You can also try the XPath in the shell, by running for example:

scrapy shell http://blog.scrapy.org/rss.xml

And then typing in the shell:

>>> xxs.select("//link/text()").extract()
[u'http://blog.scrapy.org',
 u'http://blog.scrapy.org/new-bugfix-release-0101',
 u'http://blog.scrapy.org/new-scrapy-blog-and-scrapy-010-release']
qid & accept id: (2951701, 2951722) query: Is it possible to use 'else' in a python list comprehension? soup:

soup wrap:

The syntax a if b else c is a ternary operator in Python that evaluates to a if the condition b is true - otherwise, it evaluates to c. It can be used in comprehension statements:

>>> [a if a else 2 for a in [0,1,0,3]]
[2, 1, 2, 3]

So for your example,

table = ''.join(chr(index) if index in ords_to_keep else replace_with
                for index in xrange(15))
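Note that placement matters: the a if b else c form must come before the for and maps every item, while a trailing if (which cannot take an else) filters items out. A quick comparison:

```python
values = [0, 1, 0, 3]

mapped = [v if v else 2 for v in values]    # ternary: transforms each item
filtered = [v for v in values if v]         # trailing if: drops items instead
```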
qid & accept id: (2964751, 2975194) query: How to convert a GEOS MultiLineString to Polygon using Python? soup:

soup wrap:

Hehe, at first I wrote this:

def close_geometry(self, geometry):
   if geometry.empty or geometry[0].empty:
       return geometry # empty

   if geometry[-1][-1] == geometry[0][0]:
       return geometry  # already closed

   resultstring = None
   for linestring in geometry:
      if resultstring is None:
          resultstring = linestring.clone()
      else:
          resultstring.extend(linestring.coords)

   return Polygon(resultstring)

but then I discovered that there is a nifty little method called convex_hull that does the polygon conversion for you automatically (note that, as the name says, it returns the convex hull, so any concave parts of the outline will be filled in).

>>> s1 = LineString((0, 0), (1, 1), (1, 2), (0, 1))
>>> s1.convex_hull

>>> s1.convex_hull.coords
(((0.0, 0.0), (0.0, 1.0), (1.0, 2.0), (1.0, 1.0), (0.0, 0.0)),)

>>> m1=MultiLineString(s1)
>>> m1.convex_hull

>>> m1.convex_hull.coords
(((0.0, 0.0), (0.0, 1.0), (1.0, 2.0), (1.0, 1.0), (0.0, 0.0)),)
qid & accept id: (2983959, 2983967) query: Splitting a list in python soup:

soup wrap:

You can write your own split function for lists quite easily by using yield:

def split_list(l, sep):
    current = []
    for x in l:
        if x == sep:
            yield current
            current = []
        else:
            current.append(x)
    yield current

An alternative way is to use list.index and catch the exception:

def split_list(l, sep):
    i = 0
    try:
        while True:
            j = l.index(sep, i)
            yield l[i:j]
            i = j + 1
    except ValueError:
        yield l[i:]

Either way you can call it like this:

l = ['(', '2', '.', 'x', '.', '(', '3', '-', '1', ')', '+', '4', ')',
     '/', '3', '.', 'x', '^', '2']

for r in split_list(l, '+'):
    print r

Result:

['(', '2', '.', 'x', '.', '(', '3', '-', '1', ')']
['4', ')', '/', '3', '.', 'x', '^', '2']
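For comparison, the same split can be written with itertools.groupby, though unlike the generators above this version drops empty segments between adjacent separators (a sketch under that caveat):

```python
from itertools import groupby

def split_list_groupby(l, sep):
    # Group consecutive items by whether they equal the separator,
    # then keep only the non-separator runs.
    return [list(g) for is_sep, g in groupby(l, lambda x: x == sep) if not is_sep]
```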

For parsing in Python you might also want to look at something like pyparsing.

qid & accept id: (3038661, 3038691) query: Efficiently finding the shortest path in large graphs soup:

soup wrap:

python-graph

added:

The comments made me curious how pygraph would perform on a problem of the size the OP describes, so I made a toy program to find out. Here's the output for a slightly smaller version of the problem:

$ python2.6 biggraph.py 4 6
biggraph generate 10000 nodes     00:00:00
biggraph generate 1000000 edges   00:00:00
biggraph add edges                00:00:05
biggraph Dijkstra                 00:01:32
biggraph shortest_path done       00:04:15
step: 1915 2
step: 0 1
biggraph walk done                00:04:15
path: [9999, 1915, 0]

Not too bad for 10k nodes and 1M edges. It is important to note that the way Dijkstra's is computed by pygraph yields a dictionary of all spanning trees for each node relative to one target (which was arbitrarily node 0, and holds no privileged position in the graph). Therefore, the solution that took 3.75 minutes to compute actually yielded the answer to "what is the shortest path from all nodes to the target?". Indeed once shortest_path was done, walking the answer was mere dictionary lookups and took essentially no time. It is also worth noting that adding the pre-computed edges to the graph was rather expensive at ~1.5 minutes. These timings are consistent across multiple runs.
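Concretely, walking the answer amounts to repeated lookups in the spanning-tree dictionary. With a made-up span mapping in the shape pygraph returns (each node pointing at its next hop toward target 0):

```python
# Hypothetical spanning-tree dict: node -> next hop toward the target (0).
span = {0: None, 1: 0, 2: 0, 3: 1}

def walk(span, start, target=0):
    path = [start]
    while path[-1] != target:
        path.append(span[path[-1]])  # one dictionary lookup per step
    return path
```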

I'd like to say that the process scales well, but I'm still waiting on biggraph 5 6 on an otherwise idle computer (Athlon 64, 4800 BogoMIPS per processor, all in core) which has been running for over a quarter hour. At least the memory use is stable at about 0.5GB. And the results are in:

biggraph generate 100000 nodes    00:00:00
biggraph generate 1000000 edges   00:00:00
biggraph add edges                00:00:07
biggraph Dijkstra                 00:01:27
biggraph shortest_path done       00:23:44
step: 48437 4
step: 66200 3
step: 83824 2
step: 0 1
biggraph walk done                00:23:44
path: [99999, 48437, 66200, 83824, 0]

That's a long time, but it was also a heavy computation (and I really wish I'd pickled the result). Here's the code for the curious:

#!/usr/bin/python

import pygraph.classes.graph
import pygraph.algorithms
import pygraph.algorithms.minmax
import time
import random
import sys

if len(sys.argv) != 3:
    print ('usage %s: node_exponent edge_exponent' % sys.argv[0])
    sys.exit(1)

nnodes = 10**int(sys.argv[1])
nedges = 10**int(sys.argv[2])

start_time = time.clock()
def timestamp(s):
    t = time.gmtime(time.clock() - start_time)
    print 'biggraph', s.ljust(24), time.strftime('%H:%M:%S', t)

timestamp('generate %d nodes' % nnodes)
bg = pygraph.classes.graph.graph()
bg.add_nodes(xrange(nnodes))

timestamp('generate %d edges' % nedges)
edges = set()
while len(edges) < nedges:
    left, right = random.randrange(nnodes), random.randrange(nnodes)
    if left == right:
        continue
    elif left > right:
        left, right = right, left
    edges.add((left, right))

timestamp('add edges')
for edge in edges:
    bg.add_edge(edge)

timestamp("Dijkstra")
target = 0
span, dist = pygraph.algorithms.minmax.shortest_path(bg, target)
timestamp('shortest_path done')

# the paths from any node to target is in dict span, let's
# pick any arbitrary node (the last one) and walk to the
# target from there, the associated distance will decrease
# monotonically
lastnode = nnodes - 1
path = []
while lastnode != target:
    nextnode = span[lastnode]
    print 'step:', nextnode, dist[lastnode]
    assert nextnode in bg.neighbors(lastnode)
    path.append(lastnode)
    lastnode = nextnode
path.append(target)
timestamp('walk done')
print 'path:', path
qid & accept id: (3065624, 3066821) query: How to speed-up python nested loop? soup:

soup wrap:

Update: (almost) completely vectorized version below in "new_function2"...

I'll add comments to explain things in a bit.

It gives a ~50x speedup, and a larger speedup is possible if you're okay with the output being numpy arrays instead of lists. As is:

In [86]: %timeit new_function2(close, volume, INTERVAL_LENGTH)
1 loops, best of 3: 1.15 s per loop

You can replace your inner loop with a call to np.cumsum()... See my "new_function" function below. This gives a considerable speedup...

In [61]: %timeit new_function(close, volume, INTERVAL_LENGTH)
1 loops, best of 3: 15.7 s per loop

vs

In [62]: %timeit old_function(close, volume, INTERVAL_LENGTH)
1 loops, best of 3: 53.1 s per loop
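The reason the cumsum trick works: a single cumsum over the window yields every running total sum(volume[i+1:j+1]) at once, one entry per candidate j, instead of recomputing each sum from scratch. A tiny illustration (assuming NumPy is available):

```python
import numpy as np

volume = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
i = 0
window = volume[i+1:i+4].cumsum()
# window[k] equals volume[i+1 : i+2+k].sum(), i.e. the running total
# that the original inner loop rebuilt for each j.
```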

It should be possible to vectorize the entire thing and avoid for loops entirely, though... Give me an minute, and I'll see what I can do...
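As a minimal sanity check of that cumsum substitution (illustrative numbers): the inner-loop sums volume[i+1:j+1] across a window are exactly one cumulative sum:

```python
import numpy as np

# Minimal sketch of the cumsum replacement for the inner loop
volume = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
i, length = 0, 4
window = volume[i+1:i+length].cumsum()   # sums for j = i+1 .. i+length-1
expected = [volume[i+1:j+1].sum() for j in range(i+1, i+length)]
assert np.allclose(window, expected)
```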

import numpy as np

ARRAY_LENGTH = 500000
INTERVAL_LENGTH = 15
close = np.arange(ARRAY_LENGTH, dtype=np.float)
volume = np.arange(ARRAY_LENGTH, dtype=np.float)

def old_function(close, volume, INTERVAL_LENGTH):
    results = []
    for i in xrange(len(close) - INTERVAL_LENGTH):
        for j in xrange(i+1, i+INTERVAL_LENGTH):
            ret = close[j] / close[i]
            vol = sum( volume[i+1:j+1] )
            if (ret > 1.0001) and (ret < 1.5) and (vol > 100):
                results.append( (i, j, ret, vol) )
    return results


def new_function(close, volume, INTERVAL_LENGTH):
    results = []
    for i in xrange(close.size - INTERVAL_LENGTH):
        vol = volume[i+1:i+INTERVAL_LENGTH].cumsum()
        ret = close[i+1:i+INTERVAL_LENGTH] / close[i]

        filter = (ret > 1.0001) & (ret < 1.5) & (vol > 100)
        j = np.arange(i+1, i+INTERVAL_LENGTH)[filter]

        tmp_results = zip(j.size * [i], j, ret[filter], vol[filter])
        results.extend(tmp_results)
    return results

def new_function2(close, volume, INTERVAL_LENGTH):
    vol, ret = [], []
    I, J = [], []
    for k in xrange(1, INTERVAL_LENGTH):
        start = k
        end = volume.size - INTERVAL_LENGTH + k
        vol.append(volume[start:end])
        ret.append(close[start:end])
        J.append(np.arange(start, end))
        I.append(np.arange(volume.size - INTERVAL_LENGTH))

    vol = np.vstack(vol)
    ret = np.vstack(ret)
    J = np.vstack(J)
    I = np.vstack(I)

    vol = vol.cumsum(axis=0)
    ret = ret / close[:-INTERVAL_LENGTH]

    filter = (ret > 1.0001) & (ret < 1.5) & (vol > 100)

    vol = vol[filter]
    ret = ret[filter]
    I = I[filter]
    J = J[filter]

    output = zip(I.flat,J.flat,ret.flat,vol.flat)
    return output

results = old_function(close, volume, INTERVAL_LENGTH)
results2 = new_function(close, volume, INTERVAL_LENGTH)
results3 = new_function2(close, volume, INTERVAL_LENGTH)

# Using sets to compare, as the output 
# is in a different order than the original function
print set(results) == set(results2)
print set(results) == set(results3)
qid & accept id: (3068839, 3143120) query: PyGTK/GIO: monitor directory for changes recursively soup:

soup wrap:

I'm not sure if GIO allows you to have more than one monitor at once, but if it does there's no* reason you can't do something like this:

import gio
import os

def directory_changed(monitor, file1, file2, evt_type):
    if os.path.isdir(file2):    #maybe this needs to be file1?
        add_monitor(file2) 
    print "Changed:", file1, file2, evt_type

def add_monitor(dir):
    gfile = gio.File(dir)
    monitor = gfile.monitor_directory(gio.FILE_MONITOR_NONE, None)
    monitor.connect("changed", directory_changed) 

add_monitor('.')

import glib
ml = glib.MainLoop()
ml.run()

*when I say no reason, there's the possibility that this could become a resource hog, though with nearly zero knowledge about GIO I couldn't really say. It's also entirely possible to roll your own in Python with a few commands (os.listdir among others). It might look something like this

import time
import os

class Watcher(object):
    def __init__(self):
        self.dirs = []
        self.snapshots = {}

    def add_dir(self, dir):
        self.dirs.append(dir)

    def check_for_changes(self, dir):
        snapshot = self.snapshots.get(dir)
        curstate = os.listdir(dir)
        if not snapshot:
            self.snapshots[dir] = curstate
        else:
            if not snapshot == curstate:
                print 'Changes: ',
                for change in set(curstate).symmetric_difference(set(snapshot)):
                    if os.path.isdir(change):
                        print "isdir"
                        self.add_dir(change)
                    print change,

                self.snapshots[dir] = curstate
                print

    def mainloop(self):
        if len(self.dirs) < 1:
            print "ERROR: Please add a directory with add_dir()"
            return

        while True:
            for dir in self.dirs:
                self.check_for_changes(dir)
            time.sleep(4) # Don't want to be a resource hog

w = Watcher()
w.add_dir('.')


w.mainloop()
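The snapshot-diff idea Watcher relies on can be checked in isolation (sketch, using a throwaway temp directory):

```python
import os
import tempfile

# Take a directory listing, change the directory, diff the listings
d = tempfile.mkdtemp()
before = os.listdir(d)
open(os.path.join(d, 'new.txt'), 'w').close()
after = os.listdir(d)
changes = set(after).symmetric_difference(set(before))
assert changes == {'new.txt'}
```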
qid & accept id: (3083583, 3083673) query: python help django navigation soup:

soup wrap:

You have to run

python manage.py --help

Here is the django-admin.py/manage.py documentation: http://docs.djangoproject.com/en/dev/ref/django-admin/

To get help in general in Python, you can use the built-in help function, e.g.

>>> help('help')

Welcome to Python 2.5!  This is the online help utility.
....
qid & accept id: (3102098, 3102887) query: sound way to feed commands to twisted ssh after reactor.run() soup:

soup wrap:

joefis' answer is basically sound, but I bet some examples would be helpful. First, there are a few ways you can have some code run right after the reactor starts.

This one is pretty straightforward:

def f():
    print "the reactor is running now"

reactor.callWhenRunning(f)

Another way is to use timed events, although there's probably no reason to do it this way instead of using callWhenRunning:

reactor.callLater(0, f)

You can also use the underlying API which callWhenRunning is implemented in terms of:

reactor.addSystemEventTrigger('after', 'startup', f)

You can also use services. This is a bit more involved, since it involves using twistd(1) (or something else that's going to hook the service system up to the reactor). But you can write a class like this:

from twisted.application.service import Service

class ThingDoer(Service):
    def startService(self):
        print "The reactor is running now."

And then write a .tac file like this:

from twisted.application.service import Application

from thatmodule import ThingDoer

application = Application("Do Things")
ThingDoer().setServiceParent(application)

And finally, you can run this .tac file using twistd(1):

$ twistd -ny thatfile.tac

Of course, this only tells you how to do one thing after the reactor is running, which isn't exactly what you're asking. It's the same idea, though - you define some event handler and ask to receive an event by having that handler called; when it is called, you get to do stuff. The same idea applies to anything you do with Conch.

You can see this in the Conch examples, for example in sshsimpleclient.py we have:

class CatChannel(channel.SSHChannel):
    name = 'session'

    def openFailed(self, reason):
        print 'echo failed', reason

    def channelOpen(self, ignoredData):
        self.data = ''
        d = self.conn.sendRequest(self, 'exec', common.NS('cat'), wantReply = 1)
        d.addCallback(self._cbRequest) 

    def _cbRequest(self, ignored):
        self.write('hello conch\n')
        self.conn.sendEOF(self)

    def dataReceived(self, data):
        self.data += data

    def closed(self):
        print 'got data from cat: %s' % repr(self.data)
        self.loseConnection()
        reactor.stop()

In this example, channelOpen is the event handler called when a new channel is opened. It sends a request to the server. It gets back a Deferred, to which it attaches a callback. That callback is an event handler which will be called when the request succeeds (in this case, when cat has been executed). _cbRequest is the callback it attaches, and that method takes the next step - writing some bytes to the channel and then closing it. Then there's the dataReceived event handler, which is called when bytes are received over the channel, and the closed event handler, called when the channel is closed.

So you can see four different event handlers here, some of which are starting operations that will eventually trigger a later event handler.

So to get back to your question about doing one thing after another, if you wanted to open two cat channels, one after the other, then the closed event handler could open a new channel (instead of stopping the reactor as it does in this example).

qid & accept id: (3121979, 3121985) query: How to sort (list/tuple) of lists/tuples? soup:
soup wrap:
sorted_by_second = sorted(data, key=lambda tup: tup[1])

or:

data.sort(key=lambda tup: tup[1])  # sorts in place
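For example, with made-up data (operator.itemgetter is a drop-in alternative to the lambda):

```python
from operator import itemgetter

data = [(1, 'b'), (3, 'a'), (2, 'c')]
# Sort by the second element of each tuple
sorted_by_second = sorted(data, key=itemgetter(1))
assert sorted_by_second == [(3, 'a'), (1, 'b'), (2, 'c')]
```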
qid & accept id: (3145246, 3145496) query: How can I group objects by their date in Django? soup:

soup wrap:

Here's a working example of ignacio's suggestion to use itertools.groupby.

class Article(object):
    def __init__(self, pub_date):
        self.pub_date = pub_date


if __name__ == '__main__':
    from datetime import date
    import itertools
    import operator

    # You'll use your Article query here instead:
    # a_list = Article.objects.filter(pub_date__lte = date.today())
    a_list = [
        Article(date(2010, 1, 2)),
        Article(date(2010, 2, 3)),
        Article(date(2010, 1, 2)),
        Article(date(2011, 3, 2)),
    ]


    keyfunc = operator.attrgetter('pub_date')

    a_list = sorted(a_list, key = keyfunc)
    group_list = [{ k.strftime('%Y-%m-%d') : list(g)} 
                  for k, g in itertools.groupby(a_list, keyfunc)]

    print group_list

Output:

[{'2010-01-02': [<__main__.Article object at 0xb76c4fec>, <__main__.Article object at 0xb76c604c>]}, {'2010-02-03': [<__main__.Article object at 0xb76c602c>]}, {'2011-03-02': [<__main__.Article object at 0xb76c606c>]}]
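If you don't need the groups emitted in sorted order, a plain dict of lists sidesteps the pre-sort that itertools.groupby requires (sketch, using bare dates instead of Article objects):

```python
from collections import defaultdict
from datetime import date

dates = [date(2010, 1, 2), date(2010, 2, 3), date(2010, 1, 2)]
groups = defaultdict(list)
for d in dates:
    groups[d.strftime('%Y-%m-%d')].append(d)   # bucket by formatted date

assert len(groups['2010-01-02']) == 2
assert len(groups['2010-02-03']) == 1
```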
qid & accept id: (3187961, 3188040) query: Split field to array when accessed soup:

soup wrap:

You can easily add an instance method to your Categories class like this:

class Categories(models.Model):
   # ... rest of your definition ...

   def get_spamwords_as_list(self):
       return self.spamwords.split(',')

You could use it like this:

cat = Categories.objects.get(id=1)
print cat.get_spamwords_as_list()

But I'm curious about your underlying data model -- why aren't you using a ManyToManyField to model your categories?

UPDATE: Adding an alternative generic version:

def get_word_list(self, name):
    if name in ['keywords', 'spamwords', 'translations']:
        return getattr(self, name).split(',')

# or even
def __getattr__(self, name):
    if name[-5:] == '_list' and name[:-5] in ['keywords', 'spamwords', 'translations']:
        return getattr(self, name[:-5]).split(',')
    else:
        raise AttributeError

cat = Categories.objects.get(pk=1)
cat.get_word_list('keywords')  # ['word 1', 'word 2', ...]
cat.keywords_list              # ['word 1', 'word 2', ...] with 2nd approach
cat.keywords                   # 'word 1, word 2' -- remains CSV
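Here is a standalone sketch of that __getattr__ approach with Django stripped out so it can be run directly; the class and field names are just illustrative:

```python
class Categories(object):
    CSV_FIELDS = ('keywords', 'spamwords', 'translations')

    def __init__(self, **fields):
        self.__dict__.update(fields)

    def __getattr__(self, name):
        # Only called when normal lookup fails, e.g. for 'keywords_list'
        if name.endswith('_list') and name[:-5] in self.CSV_FIELDS:
            return getattr(self, name[:-5]).split(',')
        raise AttributeError(name)

cat = Categories(keywords='word 1,word 2')
assert cat.keywords_list == ['word 1', 'word 2']
assert cat.keywords == 'word 1,word 2'   # the stored value remains CSV
```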
qid & accept id: (3208076, 3208107) query: python: access multiple values in the value portion of a key:value pair soup:

soup wrap:

Where pairs is your list of pairs:

averages = [float(sum(values)) / len(values) for key, values in pairs]

will give you a list of average values.

If your numbers are strings, as in your example, replace sum(values) above with sum([int(i) for i in values]).

EDIT: And if you would rather have a dictionary than a list of averages:

averages = dict([(key, float(sum(values)) / len(values)) for key, values in pairs])
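A quick runnable check with made-up pairs:

```python
pairs = [('a', [1, 2, 3]), ('b', [4, 5])]
# Average the value list attached to each key
averages = dict((key, float(sum(values)) / len(values)) for key, values in pairs)
assert averages == {'a': 2.0, 'b': 4.5}
```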
qid & accept id: (3234114, 3234954) query: Python : match string inside double quotes and bracket soup:

soup wrap:

If you want to get both Chinese phrases when there are two of them (as in adult and aircraft), you'll need to work harder. The code below is for Python 3.x.

#coding: utf8
import re
s = """“作為”(act) ,用於罪行或民事過失時,包括一連串作為、任何違法的不作為和一連串違法的不作為;
    “行政上訴委員會”(Administrative Appeals Board) 指根據《行政上訴委員會條例》(第442章)設立的行政上訴委員會;(由1994年第6號第32條增補)
    “成人”、“成年人”(adult)* 指年滿18歲的人;  (由1990年第32號第6條修訂)
    “飛機”、“航空器”(aircraft) 指任何可憑空氣的反作用而在大氣中獲得支承力的機器;
    “外籍人士”(alien) 指並非中國公民的人;  (由1998年第26號第4條增補)
    “修訂”(amend) 包括廢除、增補或更改,亦指同時進行,或以同一條例或文書進行上述全部或其中任何事項;  (由1993年第89號第3條修訂)
    “可逮捕的罪行”(arrestable offence) 指由法律規限固定刑罰的罪行,或根據、憑藉法例對犯者可處超過12個月監禁的罪行,亦指犯任何這類罪行的企圖;  (由1971年第30號第2條增補)
    “《基本法》”(Basic Law) 指《中華人民共和國香港特別行政區基本法》;  (由1998年第26號第4條增補)
    “行政長官”(Chief Executive) 指─"""
for zh1, zh2, en in re.findall(r"“([^”]*)”(?:、“([^”]*)”)?\((.*?)\)",s):
    print(ascii((zh1, zh2, en)))

resulting in:

('\u4f5c\u70ba', '', 'act')
('\u884c\u653f\u4e0a\u8a34\u59d4\u54e1\u6703', '', 'Administrative Appeals Board')
('\u6210\u4eba', '\u6210\u5e74\u4eba', 'adult')
('\u98db\u6a5f', '\u822a\u7a7a\u5668', 'aircraft')
('\u5916\u7c4d\u4eba\u58eb', '', 'alien')
('\u4fee\u8a02', '', 'amend')
('\u53ef\u902e\u6355\u7684\u7f6a\u884c', '', 'arrestable offence')
('\u300a\u57fa\u672c\u6cd5\u300b', '', 'Basic Law')
('\u884c\u653f\u9577\u5b98', '', 'Chief Executive')
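The optional second group is the key trick; here it is on a trimmed-down two-entry input (same pattern as above):

```python
import re

# One entry with two quoted phrases, one with a single phrase
s = '“成人”、“成年人”(adult) x;“修訂”(amend) y'
pairs = re.findall(r"“([^”]*)”(?:、“([^”]*)”)?\((.*?)\)", s)
# An unmatched optional group comes back as '' from findall
assert pairs == [('成人', '成年人', 'adult'), ('修訂', '', 'amend')]
```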
qid & accept id: (3254713, 3254750) query: Generating passwords in Python 3.1.1 soup:

soup wrap:

There is no "sha" algorithm. The sha1 algorithm is much stronger than md5, since md5 is completely broken. I believe there is an algorithm that takes microseconds to generate a collision.

Sha1 has been considerably weakened by cryptanalysts, and the search is on for the next big thing, but it is still currently suitable for all but the most paranoid.

With regard to their use in passwords, the purpose is to prevent discovery of the original password. So it doesn't really matter much that md5 collisions are trivial to generate, since a collision simply yields an alternate password that has the same md5 hash as the original password, it doesn't reveal the original password.

Important note:

Your version is missing an important component: the salt. This is a random string that is concatenated to the original password in order to generate the hash, and then concatenated to the hash itself for storage. The purpose is to ensure that users with the same password don't end up with the same stored hash.

import hashlib
import os
import random
from getpass import getpass

print('Username: ' + os.environ['USER'])
passwd = getpass('Password: ')
salt = ''.join(random.choice('BCDFGHJKLMNPQRSTVWXYZ') for _ in range(4))
h = hashlib.md5()
h.update(salt.encode())  # hashlib needs bytes in Python 3
h.update(passwd.encode())
passwd_encrypt = salt + h.hexdigest()

You then verify the password by reusing the stored salt:

passwd = getpass('Password: ')
salt = passwd_encrypt[:4]
h = hashlib.md5()
h.update(salt.encode())
h.update(passwd.encode())
if passwd_encrypt != salt + h.hexdigest():
    raise LoginFailed()
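To sanity-check the round trip, the two snippets can be folded into one helper (a sketch; Python 3 syntax, with the .encode() calls hashlib requires):

```python
import hashlib
import random

def encrypt(passwd, salt=None):
    # With no salt given, generate one; with a salt given, re-hash for verification
    if salt is None:
        salt = ''.join(random.choice('BCDFGHJKLMNPQRSTVWXYZ') for _ in range(4))
    h = hashlib.md5()
    h.update(salt.encode())
    h.update(passwd.encode())
    return salt + h.hexdigest()

stored = encrypt('secret')
# Verification reuses the first four characters as the salt
assert encrypt('secret', stored[:4]) == stored
assert encrypt('wrong password', stored[:4]) != stored
```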
qid & accept id: (3257619, 3259971) query: Numpy interconversion between multidimensional and linear indexing soup:

soup wrap:

Although I very much like EOL's answer, I wanted to generalize it a bit for non-uniform numbers of bins along each direction, and also to highlight the differences between C and F styles of ordering. Here is an example solution:

ndims = 5
N = 10

# Define bin boundaries 
binbnds = ndims*[None]
nbins = []
for idim in xrange(ndims):
    binbnds[idim] = numpy.linspace(-10.0,10.0,numpy.random.randint(2,15))
    binbnds[idim][0] = -float('inf')
    binbnds[idim][-1] = float('inf')
    nbins.append(binbnds[idim].shape[0]-1)

nstates = numpy.cumprod(nbins)[-1]

# Define variable values for N particles in ndims dimensions
p = numpy.random.normal(size=(N,ndims))

# Assign to bins along each dimension
binassign = ndims*[None]
for idim in xrange(ndims):
    binassign[idim] = numpy.digitize(p[:,idim],binbnds[idim]) - 1

binassign = numpy.array(binassign)

# multidimensional array with elements mapping from multidim to linear index
# Two different arrays for C vs F ordering
linind_C = numpy.arange(nstates).reshape(nbins,order='C')
linind_F = numpy.arange(nstates).reshape(nbins,order='F')

and now make the conversion

# Fast conversion to linear index
b_F = numpy.cumprod([1] + nbins)[:-1]
b_C = numpy.cumprod([1] + nbins[::-1])[:-1][::-1]

box_index_F = numpy.dot(b_F,binassign)
box_index_C = numpy.dot(b_C,binassign)

and to check for correctness:

# Check
print 'Checking correct mapping for each particle F order'
for k in xrange(N):
    ii = box_index_F[k]
    jj = linind_F[tuple(binassign[:,k])]
    print 'particle %d %s (%d %d)' % (k,ii == jj,ii,jj)

print 'Checking correct mapping for each particle C order'
for k in xrange(N):
    ii = box_index_C[k]
    jj = linind_C[tuple(binassign[:,k])]
    print 'particle %d %s (%d %d)' % (k,ii == jj,ii,jj)

And for completeness, if you want to go back from the 1d index to the multidimensional index in a fast, vectorized-style way:

print 'Convert C-style from linear to multi'
x = box_index_C.reshape(-1,1)
bassign_rev_C = x / b_C % nbins 

print 'Convert F-style from linear to multi'
x = box_index_F.reshape(-1,1)
bassign_rev_F = x / b_F % nbins

and again to check:

print 'Check C-order'
for k in xrange(N):
    ii = tuple(binassign[:,k])
    jj = tuple(bassign_rev_C[k,:])
    print ii==jj,ii,jj

print 'Check F-order'
for k in xrange(N):
    ii = tuple(binassign[:,k])
    jj = tuple(bassign_rev_F[k,:])
    print ii==jj,ii,jj 
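For reference, NumPy's built-in np.ravel_multi_index and np.unravel_index perform the same C- and F-order conversions as the hand-rolled dot products above; a minimal sketch with made-up bin counts:

```python
import numpy as np

nbins = [3, 4, 5]
multi = np.array([[2, 1],   # one column per particle, like binassign
                  [3, 0],
                  [4, 2]])
lin_C = np.ravel_multi_index(multi, nbins, order='C')
lin_F = np.ravel_multi_index(multi, nbins, order='F')
assert list(lin_C) == [59, 22]        # C strides are [20, 5, 1]
back = np.unravel_index(lin_C, nbins, order='C')
assert (np.vstack(back) == multi).all()
```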
qid & accept id: (3277047, 3277336) query: Implementing class descriptors by subclassing the `type` class soup:

soup wrap:

It is the convention (usually) for a descriptor, when accessed on a class, to return the descriptor object itself. This is what property does; if you access a property object on a class, you get the property object back (because that's what its __get__ method chooses to do). But that's a convention; you don't have to do it that way.

So, if you only need to have a getter descriptor on your class, and you don't mind that an attempt to set will overwrite the descriptor, you can do something like this with no metaclass programming:

def classproperty_getter_only(f):
    class NonDataDescriptor(object):
        def __get__(self, instance, icls):
            return f(icls)
    return NonDataDescriptor()

class Foo(object):

    @classproperty_getter_only
    def flup(cls):
        return 'hello from', cls

print Foo.flup
print Foo().flup

which prints:

('hello from', <class '__main__.Foo'>)
('hello from', <class '__main__.Foo'>)

If you want a full-fledged data descriptor, or want to use the built-in property object, then you're right that you can use a metaclass and put it there (realizing that this attribute will be totally invisible from instances of your class; metaclasses are not examined when doing attribute lookup on an instance of a class).
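
For illustration, here is a minimal sketch of that metaclass route (written in Python 3 syntax; under Python 2 you would set __metaclass__ instead). The names Meta and flup are just examples:

```python
class Meta(type):
    @property
    def flup(cls):
        # A real property, but living on the metaclass, so it only
        # triggers for attribute lookup on the class itself.
        return ('hello from', cls)

class Foo(metaclass=Meta):
    pass

print(Foo.flup)   # the getter runs and receives the class

try:
    Foo().flup    # instances never consult the metaclass
except AttributeError as e:
    print('AttributeError:', e)
```

Note that the instance access fails with AttributeError, which is exactly the "invisible from instances" behavior described above.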

Is it advisable? I don't think so. I wouldn't do what you're describing casually in production code; I would only consider it if I had a very compelling reason to do so (and I can't think of such a scenario off the top of my head). Metaclasses are very powerful, but they aren't well understood by all programmers, and are somewhat harder to reason about, so their use makes your code harder to maintain. I think this sort of design would be frowned upon by the Python community at large.

qid & accept id: (3306189, 7077371) query: Using TCL extensions to set native window style in Tkinter soup:

You can do this using a combination of the Python win32 api packages and Tkinter. What you need to know is that a Tk window is the client section of a Win32 window. The window manager interactions are handled using a wrapper that is the parent of Tk window itself. If you have a Tkinter window 'w' then you can create a PyWin32 window for the frame or just manipulate it directly. You can get the frame hwnd using w.wm_frame() and parsing the hex string returned or by using GetParent on the winfo_id value from the Tk window (although wm_frame is likely to be more reliable).

\n
import string, win32ui, win32con\nfrom Tkinter import *\nw = Tk()\nframe = win32ui.CreateWindowFromHandle(string.atoi(w.wm_frame(), 0))\nframe.ModifyStyle(win32con.WS_CAPTION, 0, win32con.SWP_FRAMECHANGED)\n
\n

This removes the WS_CAPTION style and notifies the window that its frame is modified which forces a geometry recalculation so that the change propagates to the Tk child window.

\n

EDIT ---\nThe following arranges to ensure we modify the window style after the window has been fully created and mapped to the display.

\n
import string, win32ui, win32con\nfrom Tkinter import *\n\ndef decaption(event):\n    w = event.widget\n    frame = win32ui.CreateWindowFromHandle(string.atoi(w.wm_frame(), 0))\n    frame.ModifyStyle(win32con.WS_CAPTION, 0, win32con.SWP_FRAMECHANGED)\n    w.bind("", None)\n\nroot = Tk()\nroot.bind("", decaption)\nroot.mainloop()\n
\n soup wrap:

You can do this using a combination of the Python win32 API packages and Tkinter. What you need to know is that a Tk window is the client section of a Win32 window. The window manager interactions are handled using a wrapper that is the parent of the Tk window itself. If you have a Tkinter window 'w' then you can create a PyWin32 window for the frame or just manipulate it directly. You can get the frame hwnd using w.wm_frame() and parsing the hex string returned, or by using GetParent on the winfo_id value from the Tk window (although wm_frame is likely to be more reliable).

import string, win32ui, win32con
from Tkinter import *
w = Tk()
frame = win32ui.CreateWindowFromHandle(string.atoi(w.wm_frame(), 0))
frame.ModifyStyle(win32con.WS_CAPTION, 0, win32con.SWP_FRAMECHANGED)

This removes the WS_CAPTION style and notifies the window that its frame is modified which forces a geometry recalculation so that the change propagates to the Tk child window.

EDIT --- The following arranges to ensure we modify the window style after the window has been fully created and mapped to the display.

import string, win32ui, win32con
from Tkinter import *

def decaption(event):
    w = event.widget
    frame = win32ui.CreateWindowFromHandle(string.atoi(w.wm_frame(), 0))
    frame.ModifyStyle(win32con.WS_CAPTION, 0, win32con.SWP_FRAMECHANGED)
    w.bind("<Map>", None)

root = Tk()
root.bind("<Map>", decaption)
root.mainloop()
qid & accept id: (3337512, 3337531) query: setDefault for Nested dictionary in python soup:

Assuming self.table is a dict, you could use

\n
self.table.setdefault(field,0)\n
\n

The rest are all similar. Note that if self.table already has a key field, then the value associated with that key is returned. Only if there is no key field is self.table[field] set to 0.

\n

Edit: Perhaps this is closer to what you want:

\n
import collections\nclass Foo(object):\n    def __init__(self):\n        self.CompleteAnalysis=collections.defaultdict(\n            lambda: collections.defaultdict(list))\n\n    def getFilledFields(self,sentence):\n        field, field_value, field_date = sentence.split('|')\n        field_value = field_value.strip('\n')\n        field_date = field_date.strip('\n')\n        self.CompleteAnalysis[field]['date'].append(field_date)\n        self.CompleteAnalysis[field]['value'].append(field_value) \n\nfoo=Foo()\nfoo.getFilledFields('A|1|2000-1-1')\nfoo.getFilledFields('A|2|2000-1-2')\nprint(foo.CompleteAnalysis['A']['date'])\n# ['2000-1-1', '2000-1-2']\n\nprint(foo.CompleteAnalysis['A']['value'])\n# ['1', '2']\n
\n

Instead of keeping track of the count, perhaps just take the length of the list:

\n
print(len(foo.CompleteAnalysis['A']['value']))\n# 2\n
\n soup wrap:

Assuming self.table is a dict, you could use

self.table.setdefault(field,0)

The rest are all similar. Note that if self.table already has a key field, then the value associated with that key is returned. Only if there is no key field is self.table[field] set to 0.
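
A quick interactive check of that behavior (dict contents invented for the demo):

```python
table = {'apples': 3}

# Existing key: setdefault returns the stored value and changes nothing.
print(table.setdefault('apples', 0))   # 3

# Missing key: setdefault inserts the default, then returns it.
print(table.setdefault('pears', 0))    # 0

print(table)                           # {'apples': 3, 'pears': 0}
```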

Edit: Perhaps this is closer to what you want:

import collections
class Foo(object):
    def __init__(self):
        self.CompleteAnalysis=collections.defaultdict(
            lambda: collections.defaultdict(list))

    def getFilledFields(self,sentence):
        field, field_value, field_date = sentence.split('|')
        field_value = field_value.strip('\n')
        field_date = field_date.strip('\n')
        self.CompleteAnalysis[field]['date'].append(field_date)
        self.CompleteAnalysis[field]['value'].append(field_value) 

foo=Foo()
foo.getFilledFields('A|1|2000-1-1')
foo.getFilledFields('A|2|2000-1-2')
print(foo.CompleteAnalysis['A']['date'])
# ['2000-1-1', '2000-1-2']

print(foo.CompleteAnalysis['A']['value'])
# ['1', '2']

Instead of keeping track of the count, perhaps just take the length of the list:

print(len(foo.CompleteAnalysis['A']['value']))
# 2
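
If you'd rather not use defaultdict at all, the same nested structure can be built with chained setdefault calls on plain dicts; this is a sketch of the parsing loop above, minus the class:

```python
analysis = {}

for sentence in ('A|1|2000-1-1', 'A|2|2000-1-2'):
    field, value, date = sentence.split('|')
    entry = analysis.setdefault(field, {})   # inner dict created on first use
    entry.setdefault('date', []).append(date)
    entry.setdefault('value', []).append(value)

print(analysis['A']['date'])    # ['2000-1-1', '2000-1-2']
print(analysis['A']['value'])   # ['1', '2']
```
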
qid & accept id: (3375374, 3381491) query: How do I delete an object in a django relation (While keeping all related objects)? soup:

The code given is correct. My problem when asking the question was a typo in my implementation.

\n

shame on me

\n

well... there is still a bit that could be improved on:

\n
more=Many.objects.filter(one=one)\nfor m in more:\n    m.one=None\n    m.save()\n#and finally:\none.delete()\n
\n

can be written as:

\n
for m in one.many_set.all():\n    m.one=None\n    m.save()\none.delete()\n
\n

which is equivalent to:

\n
one.many_set.clear()\none.delete()\n
\n soup wrap:

The code given is correct. My problem when asking the question was a typo in my implementation.

shame on me

well... there is still a bit that could be improved on:

more=Many.objects.filter(one=one)
for m in more:
    m.one=None
    m.save()
#and finally:
one.delete()

can be written as:

for m in one.many_set.all():
    m.one=None
    m.save()
one.delete()

which is equivalent to:

one.many_set.clear()
one.delete()
qid & accept id: (3387691, 3387975) query: How to "perfectly" override a dict? soup:

You can write an object that behaves like a dict quite easily with ABCs\n(Abstract Base Classes) from the collections module. It even tells you\nif you missed a method, so below is the minimal version that shuts the ABC up.

\n
import collections\n\n\nclass TransformedDict(collections.MutableMapping):\n    """A dictionary that applies an arbitrary key-altering\n       function before accessing the keys"""\n\n    def __init__(self, *args, **kwargs):\n        self.store = dict()\n        self.update(dict(*args, **kwargs))  # use the free update to set keys\n\n    def __getitem__(self, key):\n        return self.store[self.__keytransform__(key)]\n\n    def __setitem__(self, key, value):\n        self.store[self.__keytransform__(key)] = value\n\n    def __delitem__(self, key):\n        del self.store[self.__keytransform__(key)]\n\n    def __iter__(self):\n        return iter(self.store)\n\n    def __len__(self):\n        return len(self.store)\n\n    def __keytransform__(self, key):\n        return key\n
\n

You get a few free methods from the ABC:

\n
class MyTransformedDict(TransformedDict):\n\n    def __keytransform__(self, key):\n        return key.lower()\n\n\ns = MyTransformedDict([('Test', 'test')])\n\nassert s.get('TEST') is s['test']   # free get\nassert 'TeSt' in s                  # free __contains__\n                                    # free setdefault, __eq__, and so on\n\nimport pickle\nassert pickle.loads(pickle.dumps(s)) == s\n                                    # works too since we just use a normal dict\n
\n

I wouldn't subclass dict (or other builtins) directly. It often makes no sense, because what you actually want to do is implement the interface of a dict. And that is exactly what ABCs are for.

\n soup wrap:

You can write an object that behaves like a dict quite easily with ABCs (Abstract Base Classes) from the collections module. It even tells you if you missed a method, so below is the minimal version that shuts the ABC up.

import collections


class TransformedDict(collections.MutableMapping):
    """A dictionary that applies an arbitrary key-altering
       function before accessing the keys"""

    def __init__(self, *args, **kwargs):
        self.store = dict()
        self.update(dict(*args, **kwargs))  # use the free update to set keys

    def __getitem__(self, key):
        return self.store[self.__keytransform__(key)]

    def __setitem__(self, key, value):
        self.store[self.__keytransform__(key)] = value

    def __delitem__(self, key):
        del self.store[self.__keytransform__(key)]

    def __iter__(self):
        return iter(self.store)

    def __len__(self):
        return len(self.store)

    def __keytransform__(self, key):
        return key

You get a few free methods from the ABC:

class MyTransformedDict(TransformedDict):

    def __keytransform__(self, key):
        return key.lower()


s = MyTransformedDict([('Test', 'test')])

assert s.get('TEST') is s['test']   # free get
assert 'TeSt' in s                  # free __contains__
                                    # free setdefault, __eq__, and so on

import pickle
assert pickle.loads(pickle.dumps(s)) == s
                                    # works too since we just use a normal dict

I wouldn't subclass dict (or other builtins) directly. It often makes no sense, because what you actually want to do is implement the interface of a dict. And that is exactly what ABCs are for.
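
Note that on Python 3 the ABCs live in collections.abc (importing MutableMapping from plain collections was removed in 3.10), but the pattern translates directly. A minimal sketch with the key transform inlined as lower-casing:

```python
from collections.abc import MutableMapping

class LowercaseDict(MutableMapping):
    """Same idea on Python 3: keys are lower-cased before every access."""

    def __init__(self, *args, **kwargs):
        self._store = {}
        self.update(dict(*args, **kwargs))   # the ABC's update() goes through __setitem__

    def __getitem__(self, key):
        return self._store[key.lower()]

    def __setitem__(self, key, value):
        self._store[key.lower()] = value

    def __delitem__(self, key):
        del self._store[key.lower()]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)

d = LowercaseDict([('Test', 'test')])
assert d.get('TEST') == 'test'   # free get()
assert 'TeSt' in d               # free __contains__
```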

qid & accept id: (3458542, 3459948) query: Multiple drag and drop in PyQt4 soup:

Here's a full working example:

\n
from PyQt4 import QtCore, QtGui, Qt\nimport cPickle\nimport pickle\n
\n

Why are you using cPickle as well as pickle?

\n
class DragTable(QtGui.QTableView):\n    def __init__(self, parent = None):\n        super(DragTable, self).__init__(parent)\n        self.setDragEnabled(True)\n        self.setSelectionBehavior(QtGui.QAbstractItemView.SelectRows)\n
\n

You probably want to set the selection behavior here, because I'm assuming row-based data presentation. You may of course change that.

\n
    def dragEnterEvent(self, event):\n        if event.mimeData().hasFormat("application/pubmedrecord"):\n            event.setDropAction(Qt.MoveAction)\n            event.accept()\n        else:\n            event.ignore()\n\n    def startDrag(self, event):\n
\n

Your code assumes only one index here, based on the event position. For a QTableView, this is unnecessary, as it already handles the mouse click itself. Instead, it's better to depend on Qt to provide you with the information that you actually need, as always. Here, I've chosen to use selectedIndexes().

\n
        indices = self.selectedIndexes()\n
\n

Indices is now a list of QModelIndex instances, that I chose to convert to a set of row numbers. It's also possible to convert these to a list of QPersistentModelIndexes, depending on your needs.

\n

One thing that may surprise you here, is that indices contains indexes for all cells in the table, not all rows, regardless of the selection behavior. That's why I chose to use a set instead of a list.

\n
        selected = set()\n        for index in indices:\n            selected.add(index.row())\n
\n

I left the rest untouched, assuming that you know what you're doing there.

\n
        bstream = cPickle.dumps(selected)\n        mimeData = QtCore.QMimeData()\n        mimeData.setData("application/pubmedrecord", bstream)\n        drag = QtGui.QDrag(self)\n        drag.setMimeData(mimeData)\n        pixmap = QtGui.QPixmap(":/drag.png")\n\n        drag.setHotSpot(QtCore.QPoint(pixmap.width()/3, pixmap.height()/3))\n        drag.setPixmap(pixmap)\n        result = drag.start(QtCore.Qt.MoveAction)\n\n    def mouseMoveEvent(self, event):\n        self.startDrag(event)\n\n\nclass TagLabel(QtGui.QLabel):\n    def __init__(self, text, color, parent = None):\n        super(TagLabel, self).__init__(parent)\n        self.tagColor = color\n        self.setText(text)\n        self.setStyleSheet("QLabel { background-color: %s; font-size: 14pt; }" % self.tagColor)\n        self.defaultStyle = self.styleSheet()\n        self.setAlignment(QtCore.Qt.AlignHCenter|QtCore.Qt.AlignVCenter)\n        self.setAcceptDrops(True)\n\n    def dragEnterEvent(self, event):\n        if event.mimeData().hasFormat("application/pubmedrecord"):\n            self.set_bg(True)\n            event.accept()\n        else:\n            event.reject()\n\n    def dragLeaveEvent(self, event):\n        self.set_bg(False)\n        event.accept()\n\n    def dropEvent(self, event):\n        self.set_bg(False)\n        data = event.mimeData()\n        bstream = data.retrieveData("application/pubmedrecord", QtCore.QVariant.ByteArray)\n        selected = pickle.loads(bstream.toByteArray())\n        event.accept()\n        self.emit(QtCore.SIGNAL("dropAccepted(PyQt_PyObject)"), (selected, str(self.text()), str(self.tagColor)))\n
\n

Unless you are interfacing with C++-code with this signal, it's not necessary to add a signal argument here, you may also use dropAccepted without parentheses and PyQt4 will do the right thing.

\n
    def set_bg(self, active = False):\n        if active:\n            style = "QLabel {background: yellow; font-size: 14pt;}"\n            self.setStyleSheet(style)\n        else:\n            self.setStyleSheet(self.defaultStyle)\n\n\n\napp = QtGui.QApplication([])\n\nl = TagLabel("bla bla bla bla bla bla bla", "red")\nl.show()\n\nm = QtGui.QStandardItemModel()\nfor _ in xrange(4):\n    m.appendRow([QtGui.QStandardItem(x) for x in ["aap", "noot", "mies"]])\n\nt = DragTable()\nt.setModel(m)\nt.show()\n\ndef h(o):\n    print "signal handled", o\nl.connect(l, QtCore.SIGNAL("dropAccepted(PyQt_PyObject)"), h)\n\napp.exec_()\n
\n soup wrap:

Here's a full working example:

from PyQt4 import QtCore, QtGui, Qt
import cPickle
import pickle

Why are you using cPickle as well as pickle?

class DragTable(QtGui.QTableView):
    def __init__(self, parent = None):
        super(DragTable, self).__init__(parent)
        self.setDragEnabled(True)
        self.setSelectionBehavior(QtGui.QAbstractItemView.SelectRows)

You probably want to set the selection behavior here, because I'm assuming row-based data presentation. You may of course change that.

    def dragEnterEvent(self, event):
        if event.mimeData().hasFormat("application/pubmedrecord"):
            event.setDropAction(Qt.MoveAction)
            event.accept()
        else:
            event.ignore()

    def startDrag(self, event):

Your code assumes only one index here, based on the event position. For a QTableView, this is unnecessary, as it already handles the mouse click itself. Instead, it's better to depend on Qt to provide you with the information that you actually need, as always. Here, I've chosen to use selectedIndexes().

        indices = self.selectedIndexes()

Indices is now a list of QModelIndex instances, that I chose to convert to a set of row numbers. It's also possible to convert these to a list of QPersistentModelIndexes, depending on your needs.

One thing that may surprise you here is that indices contains an index for every selected cell, not one per row, regardless of the selection behavior. That's why I chose to use a set instead of a list.

        selected = set()
        for index in indices:
            selected.add(index.row())

I left the rest untouched, assuming that you know what you're doing there.

        bstream = cPickle.dumps(selected)
        mimeData = QtCore.QMimeData()
        mimeData.setData("application/pubmedrecord", bstream)
        drag = QtGui.QDrag(self)
        drag.setMimeData(mimeData)
        pixmap = QtGui.QPixmap(":/drag.png")

        drag.setHotSpot(QtCore.QPoint(pixmap.width()/3, pixmap.height()/3))
        drag.setPixmap(pixmap)
        result = drag.start(QtCore.Qt.MoveAction)

    def mouseMoveEvent(self, event):
        self.startDrag(event)


class TagLabel(QtGui.QLabel):
    def __init__(self, text, color, parent = None):
        super(TagLabel, self).__init__(parent)
        self.tagColor = color
        self.setText(text)
        self.setStyleSheet("QLabel { background-color: %s; font-size: 14pt; }" % self.tagColor)
        self.defaultStyle = self.styleSheet()
        self.setAlignment(QtCore.Qt.AlignHCenter|QtCore.Qt.AlignVCenter)
        self.setAcceptDrops(True)

    def dragEnterEvent(self, event):
        if event.mimeData().hasFormat("application/pubmedrecord"):
            self.set_bg(True)
            event.accept()
        else:
            event.reject()

    def dragLeaveEvent(self, event):
        self.set_bg(False)
        event.accept()

    def dropEvent(self, event):
        self.set_bg(False)
        data = event.mimeData()
        bstream = data.retrieveData("application/pubmedrecord", QtCore.QVariant.ByteArray)
        selected = pickle.loads(bstream.toByteArray())
        event.accept()
        self.emit(QtCore.SIGNAL("dropAccepted(PyQt_PyObject)"), (selected, str(self.text()), str(self.tagColor)))

Unless you are interfacing with C++ code with this signal, it's not necessary to add a signal argument here; you may also use dropAccepted without parentheses and PyQt4 will do the right thing.

    def set_bg(self, active = False):
        if active:
            style = "QLabel {background: yellow; font-size: 14pt;}"
            self.setStyleSheet(style)
        else:
            self.setStyleSheet(self.defaultStyle)



app = QtGui.QApplication([])

l = TagLabel("bla bla bla bla bla bla bla", "red")
l.show()

m = QtGui.QStandardItemModel()
for _ in xrange(4):
    m.appendRow([QtGui.QStandardItem(x) for x in ["aap", "noot", "mies"]])

t = DragTable()
t.setModel(m)
t.show()

def h(o):
    print "signal handled", o
l.connect(l, QtCore.SIGNAL("dropAccepted(PyQt_PyObject)"), h)

app.exec_()
qid & accept id: (3495524, 3495654) query: sqlite SQL query for unprocessed rows soup:

I don't understand if you consider a match based on value1 columns matching, or a combination of all three columns...

\n

Using EXISTS to find those that are already present:

\n
SELECT *\n  FROM TABLE_A a\n WHERE EXISTS(SELECT NULL\n                FROM TABLE_A$foo f\n               WHERE a.id = f.id\n                 AND a.value1 = f.value1\n                 AND a.value2 = f.value2)\n
\n

Using EXISTS to find those that are not present:

\n
SELECT *\n  FROM TABLE_A a\n WHERE NOT EXISTS(SELECT NULL\n                    FROM TABLE_A$foo f\n                   WHERE a.id = f.id\n                     AND a.value1 = f.value1\n                     AND a.value2 = f.value2)\n
\n soup wrap:

It isn't clear whether you consider a match to be based on the value1 columns alone, or on all three columns matching...

Using EXISTS to find those that are already present:

SELECT *
  FROM TABLE_A a
 WHERE EXISTS(SELECT NULL
                FROM TABLE_A$foo f
               WHERE a.id = f.id
                 AND a.value1 = f.value1
                 AND a.value2 = f.value2)

Using EXISTS to find those that are not present:

SELECT *
  FROM TABLE_A a
 WHERE NOT EXISTS(SELECT NULL
                    FROM TABLE_A$foo f
                   WHERE a.id = f.id
                     AND a.value1 = f.value1
                     AND a.value2 = f.value2)
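
You can try the NOT EXISTS form directly with Python's built-in sqlite3 module. The table and column contents below are invented for the demo, and the names are simplified to table_a and table_a_foo:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE table_a     (id INTEGER, value1 TEXT, value2 TEXT);
    CREATE TABLE table_a_foo (id INTEGER, value1 TEXT, value2 TEXT);
    INSERT INTO table_a     VALUES (1, 'x', 'y'), (2, 'p', 'q');
    INSERT INTO table_a_foo VALUES (1, 'x', 'y');   -- row 1 already processed
""")

# Unprocessed rows: those in table_a with no matching row in table_a_foo.
rows = conn.execute("""
    SELECT * FROM table_a a
     WHERE NOT EXISTS(SELECT NULL FROM table_a_foo f
                       WHERE a.id = f.id
                         AND a.value1 = f.value1
                         AND a.value2 = f.value2)
""").fetchall()
print(rows)   # [(2, 'p', 'q')]
```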
qid & accept id: (3575359, 3575510) query: Extracting Text from Parsed HTML with Python soup:

BeautifulSoup could also extract node values from your html.

\n
from BeautifulSoup import BeautifulSoup\n\nhtml = ('Page title'\n       ''\n       ''\n       ''\n       ''\n       ''\n       '
Slackware Linux 13.0 [x86 DVD ISO]Slackware Linux 14.0 [x86 DVD ISO]Slackware Linux 15.0 [x86 DVD ISO]
'\n 'body'\n '')\nsoup = BeautifulSoup(html)\nlinks = [td.find('a') for td in soup.findAll('td', { "class" : "name" })]\nfor link in links:\n print link.string\n
\n

Output:

\n
Slackware Linux 13.0 [x86 DVD ISO]  \nSlackware Linux 14.0 [x86 DVD ISO]  \nSlackware Linux 15.0 [x86 DVD ISO]  \n
\n soup wrap:

BeautifulSoup could also extract node values from your html.

from BeautifulSoup import BeautifulSoup

html = ('<html><head><title>Page title</title></head>'
        '<body>'
        '<table>'
        '<tr><td class="name"><a>Slackware Linux 13.0 [x86 DVD ISO]</a></td></tr>'
        '<tr><td class="name"><a>Slackware Linux 14.0 [x86 DVD ISO]</a></td></tr>'
        '<tr><td class="name"><a>Slackware Linux 15.0 [x86 DVD ISO]</a></td></tr>'
        '</table>'
        '</body>'
        '</html>')
soup = BeautifulSoup(html)
links = [td.find('a') for td in soup.findAll('td', { "class" : "name" })]
for link in links:
    print link.string

Output:

Slackware Linux 13.0 [x86 DVD ISO]  
Slackware Linux 14.0 [x86 DVD ISO]  
Slackware Linux 15.0 [x86 DVD ISO]  
qid & accept id: (3576512, 3607397) query: Abort a running task in Celery within django soup:

apply_async returns an AsyncResult instance, or in this case an AbortableAsyncResult. Save the task_id and use that to instantiate a new AbortableAsyncResult later, making sure you supply the backend optional argument if you're not using the default_backend.

\n
abortable_async_result = AsyncBoot.apply_async(args=[name], name=name, connect_timeout=3)\nmyTaskId = abortable_async_result.task_id\n
\n

Later:

\n
abortable_async_result = AbortableAsyncResult(myTaskId)\nabortable_async_result.abort()\n
\n soup wrap:

apply_async returns an AsyncResult instance, or in this case an AbortableAsyncResult. Save the task_id and use that to instantiate a new AbortableAsyncResult later, making sure you supply the backend optional argument if you're not using the default_backend.

abortable_async_result = AsyncBoot.apply_async(args=[name], name=name, connect_timeout=3)
myTaskId = abortable_async_result.task_id

Later:

abortable_async_result = AbortableAsyncResult(myTaskId)
abortable_async_result.abort()
qid & accept id: (3637419, 3638091) query: Multiple Database Config in Django 1.2 soup:

Yeah, it is a little bit complicated.

\n

There are a number of ways you could implement it. Basically, you need some way of indicating which models are associated with which database.

\n

First option

\n

Here's the code that I use; hope it helps.

\n
from django.db import connections\n\nclass DBRouter(object):\n    """A router to control all database operations on models in\n    the contrib.auth application"""\n\n    def db_for_read(self, model, **hints):\n        m = model.__module__.split('.')\n        try:\n            d = m[-1]\n            if d in connections:\n                return d\n        except IndexError:\n            pass\n        return None\n\n    def db_for_write(self, model, **hints):\n        m = model.__module__.split('.')\n        try:\n            d = m[-1]\n            if d in connections:\n                return d\n        except IndexError:\n            pass\n        return None\n\n    def allow_syncdb(self, db, model):\n        "Make sure syncdb doesn't run on anything but default"\n        if model._meta.app_label == 'myapp':\n            return False\n        elif db == 'default':\n            return True\n        return None\n
\n

The way this works is I create a file with the name of the database to use that holds my models. In your case, you'd create a separate models-style file called asterisk.py that was in the same folder as the models for your app.

\n

In your models.py file, you'd add

\n
from asterisk import *\n
\n

Then when you actually request a record from that model, it works something like this:

\n
    \n
  1. records = MyModel.object.all()
  2. \n
  3. module for MyModel is myapp.asterisk
  4. \n
  5. there's a connection called "asterisk" so use\nit instead of "default"
  6. \n
\n

Second Option

\n

If you want to have per-model control of database choice, something like this would work:

\n
from django.db import connections\n\nclass DBRouter(object):\n    """A router to control all database operations on models in\n    the contrib.auth application"""\n\n    def db_for_read(self, model, **hints):\n        if hasattr(model,'connection_name'):\n            return model.connection_name\n        return None\n\n    def db_for_write(self, model, **hints):\n        if hasattr(model,'connection_name'):\n            return model.connection_name\n        return None\n\n    def allow_syncdb(self, db, model):\n        if hasattr(model,'connection_name'):\n            return model.connection_name\n        return None\n
\n

Then for each model:

\n
class MyModel(models.Model):\n    connection_name="asterisk"\n    #etc...\n
\n

Note that I have not tested this second option.

\n soup wrap:

Yeah, it is a little bit complicated.

There are a number of ways you could implement it. Basically, you need some way of indicating which models are associated with which database.

First option

Here's the code that I use; hope it helps.

from django.db import connections

class DBRouter(object):
    """A router to control all database operations on models in
    the contrib.auth application"""

    def db_for_read(self, model, **hints):
        m = model.__module__.split('.')
        try:
            d = m[-1]
            if d in connections:
                return d
        except IndexError:
            pass
        return None

    def db_for_write(self, model, **hints):
        m = model.__module__.split('.')
        try:
            d = m[-1]
            if d in connections:
                return d
        except IndexError:
            pass
        return None

    def allow_syncdb(self, db, model):
        "Make sure syncdb doesn't run on anything but default"
        if model._meta.app_label == 'myapp':
            return False
        elif db == 'default':
            return True
        return None

The way this works is that I name each models file after the database its models should use. In your case, you'd create a separate models-style file called asterisk.py in the same folder as the models for your app.

In your models.py file, you'd add

from asterisk import *

Then when you actually request a record from that model, it works something like this:

  1. records = MyModel.object.all()
  2. module for MyModel is myapp.asterisk
  3. there's a connection called "asterisk" so use it instead of "default"
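
Stripped of Django, the lookup rule in db_for_read/db_for_write is just "use the last component of the model's module path if it names a configured connection". A stand-in sketch (FakeModel and the connections dict here are hypothetical substitutes for a real model and django.db.connections):

```python
# Stand-in for django.db.connections: maps connection names to connections.
connections = {'default': None, 'asterisk': None}

class FakeModel:
    __module__ = 'myapp.asterisk'   # i.e. defined in myapp/asterisk.py

def db_for_read(model):
    # Last component of the module path, if it names a connection.
    d = model.__module__.split('.')[-1]
    return d if d in connections else None

print(db_for_read(FakeModel))   # 'asterisk'
print(db_for_read(dict))        # None: 'builtins' is not a configured connection
```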

Second Option

If you want to have per-model control of database choice, something like this would work:

from django.db import connections

class DBRouter(object):
    """A router to control all database operations on models in
    the contrib.auth application"""

    def db_for_read(self, model, **hints):
        if hasattr(model,'connection_name'):
            return model.connection_name
        return None

    def db_for_write(self, model, **hints):
        if hasattr(model,'connection_name'):
            return model.connection_name
        return None

    def allow_syncdb(self, db, model):
        if hasattr(model,'connection_name'):
            return model.connection_name
        return None

Then for each model:

class MyModel(models.Model):
    connection_name="asterisk"
    #etc...

Note that I have not tested this second option.

qid & accept id: (3708418, 3708441) query: Regular Expression (Python) to extract strings of text from inside of < and > - e.g. etc soup:

Since the tag names of Stackoverflow do not have embedded < > you can use the regex:

\n
<(.*?)>\n
\n

or

\n
<([^>]*)>\n
\n

Explanation:

\n
    \n
  • < : A literal <
  • \n
  • (..) : To group and remember the\nmatch.
  • \n
  • .*? : To match anything in\nnon-greedy way.
  • \n
  • > : A literal >
  • \n
  • [^>] : A char class to match\nanything other than a >
  • \n
\n soup wrap:

Since the tag names of Stackoverflow do not have embedded < > you can use the regex:

<(.*?)>

or

<([^>]*)>

Explanation:

  • < : A literal <
  • (..) : To group and remember the match.
  • .*? : To match anything in non-greedy way.
  • > : A literal >
  • [^>] : A char class to match anything other than a >
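
Both patterns behave the same on tag-like input (sample string invented here):

```python
import re

s = 'pick <python> or <regex> from the tag list'

# Non-greedy version and negated-class version extract the same names.
print(re.findall(r'<(.*?)>', s))     # ['python', 'regex']
print(re.findall(r'<([^>]*)>', s))   # ['python', 'regex']
```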
qid & accept id: (3724488, 3724532) query: Django model form with selected rows soup:

This is how I'd go about it if this were a pure Django application (rather than app engine). You may perhaps find it useful.

\n

The key is to override the __init__() method of your ModelForm class to supply the currently logged in user instance.

\n
# forms.py\nclass TicketForm(forms.ModelForm):\n    def __init__(self, current_user, *args, **kwargs):\n        super(TicketForm, self).__init__(*args, **kwargs)\n        self.fields['event'].queryset = Event.objects.filter(creator = \n             current_user)\n
\n

You can then supply the user instance while creating an instance of the form.

\n
ticket_form = TicketForm(request.user)\n
\n soup wrap:

This is how I'd go about it if this were a pure Django application (rather than app engine). You may perhaps find it useful.

The key is to override the __init__() method of your ModelForm class to supply the currently logged in user instance.

# forms.py
class TicketForm(forms.ModelForm):
    def __init__(self, current_user, *args, **kwargs):
        super(TicketForm, self).__init__(*args, **kwargs)
        self.fields['event'].queryset = Event.objects.filter(creator=current_user)

You can then supply the user instance while creating an instance of the form.

ticket_form = TicketForm(request.user)
qid & accept id: (3738269, 3738402) query: How to insert arrays into a database? soup:

You'll probably want to start out with a dogs table containing all the flat (non array) data for each dog, things which each dog has one of, like a name, a sex, and an age:

\n
CREATE TABLE `dogs` (\n  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,\n  `name` VARCHAR(64),\n  `age` INT UNSIGNED,\n  `sex` ENUM('Male','Female')\n);\n
\n

From there, each dog "has many" measurements, so you need a dog_measurements table to store the 24 measurements:

\n
CREATE TABLE `dog_measurements` (\n  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,\n  `dog_id` INT UNSIGNED NOT NULL,\n  `paw` ENUM ('Front Left','Front Right','Rear Left','Rear Right'),\n  `taken_at` DATETIME NOT NULL\n);\n
\n

Then whenever you take a measurement, you INSERT INTO dog_measurements (dog_id,taken_at) VALUES (*?*, NOW()); where * ? * is the dog's ID from the dogs table.

\n

You'll then want tables to store the actual frames for each measurement, something like:

\n
CREATE TABLE `dog_measurement_data` (\n  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,\n  `dog_measurement_id` INT UNSIGNED NOT NULL,\n  `frame` INT UNSIGNED,\n  `sensor_row` INT UNSIGNED,\n  `sensor_col` INT UNSIGNED,\n  `value` NUMBER\n);\n
\n

That way, for each of the 250 frames, you loop through each of the 63 sensors, and store the value for that sensor with the frame number into the database:

\n
INSERT INTO `dog_measurement_data` (`dog_measurement_id`,`frame`,`sensor_row`,`sensor_col`,`value`) VALUES\n(*measurement_id?*, *frame_number?*, *sensor_row?*, *sensor_col?*, *value?*)\n
\n

Obviously replace measurement_id?, frame_number?, sensor_number?, value? with real values :-)

\n

So basically, each dog_measurement_data is a single sensor value for a given frame. That way, to get all the sensor values for all a given frame, you would:

\n
SELECT `sensor_row`,sensor_col`,`value` FROM `dog_measurement_data`\nWHERE `dog_measurement_id`=*some measurement id* AND `frame`=*some frame number*\nORDER BY `sensor_row`,`sensor_col`\n
\n

And this will give you all the rows and cols for that frame.

\n soup wrap:

You'll probably want to start out with a dogs table containing all the flat (non-array) data for each dog, things which each dog has one of, like a name, a sex, and an age:

CREATE TABLE `dogs` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `name` VARCHAR(64),
  `age` INT UNSIGNED,
  `sex` ENUM('Male','Female')
);

From there, each dog "has many" measurements, so you need a dog_measurements table to store the 24 measurements:

CREATE TABLE `dog_measurements` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `dog_id` INT UNSIGNED NOT NULL,
  `paw` ENUM ('Front Left','Front Right','Rear Left','Rear Right'),
  `taken_at` DATETIME NOT NULL
);

Then whenever you take a measurement, you INSERT INTO dog_measurements (dog_id,taken_at) VALUES (?, NOW()); where ? is the dog's ID from the dogs table.

You'll then want tables to store the actual frames for each measurement, something like:

CREATE TABLE `dog_measurement_data` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `dog_measurement_id` INT UNSIGNED NOT NULL,
  `frame` INT UNSIGNED,
  `sensor_row` INT UNSIGNED,
  `sensor_col` INT UNSIGNED,
  `value` FLOAT
);

That way, for each of the 250 frames, you loop through each of the 63 sensors, and store the value for that sensor with the frame number into the database:

INSERT INTO `dog_measurement_data` (`dog_measurement_id`,`frame`,`sensor_row`,`sensor_col`,`value`) VALUES
(*measurement_id?*, *frame_number?*, *sensor_row?*, *sensor_col?*, *value?*)

Obviously replace measurement_id?, frame_number?, sensor_row?, sensor_col?, value? with real values :-)

So basically, each dog_measurement_data row is a single sensor value for a given frame. That way, to get all the sensor values for a given frame, you would:

SELECT `sensor_row`,`sensor_col`,`value` FROM `dog_measurement_data`
WHERE `dog_measurement_id`=*some measurement id* AND `frame`=*some frame number*
ORDER BY `sensor_row`,`sensor_col`

And this will give you all the rows and cols for that frame.

qid & accept id: (3748356, 3749261) query: Summarizing inside a Django template soup:

soup wrap:

From my experience with Django, I would say that these things aren't easily done in the template. I try to do my calculations in the view instead of the template.

My recommendation would be to calculate the two sums you need in the view instead of the template.

That being said, it is possible to do some work in the template using custom filters and tags. Using filters, it might look like this:

{{ documento.cuentasxdocumento_set.all|sum_monto:"pos" }}
{{ documento.cuentasxdocumento_set.all|sum_monto:"neg" }}

Filters take two arguments, the value that you pass to the filter and an argument that you can use to control its behavior. You could use the last argument to tell sum_monto to sum the positive values or the negative values.

This is a quick untested filter implementation off the top of my head:

from django import template

register = template.Library()

@register.filter
def sum_monto(cuentas, op):
    if op == "pos":
        return sum(c.monto for c in cuentas if c.monto > 0)
    else:
        return sum(c.monto for c in cuentas if c.monto < 0)
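The filter body is plain Python, so its logic can be checked without Django at all. A sketch with a made-up Cuenta class standing in for the model instances:

```python
class Cuenta(object):
    """Stand-in for the model instance; only the monto attribute matters."""
    def __init__(self, monto):
        self.monto = monto

def sum_monto(cuentas, op):
    # Same body as the template filter above.
    if op == "pos":
        return sum(c.monto for c in cuentas if c.monto > 0)
    else:
        return sum(c.monto for c in cuentas if c.monto < 0)

cuentas = [Cuenta(100), Cuenta(-40), Cuenta(25), Cuenta(-10)]
print(sum_monto(cuentas, "pos"))  # 125
print(sum_monto(cuentas, "neg"))  # -50
```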
qid & accept id: (3788439, 3789047) query: Python socket send EOF soup:

soup wrap:

Design a protocol (an agreement between client and server) on how to send messages. One simple way is "the first byte is the length of the message, followed by the message". Rough example:

Client

Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from socket import *
>>> s=socket()
>>> s.connect(('localhost',5000))
>>> f=s.makefile()
>>> f.write('\x04abcd')
>>> f.flush()

Server

Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from socket import *
>>> s=socket()
>>> s.bind(('localhost',5000))
>>> s.listen(1)
>>> c,a=s.accept()
>>> f=c.makefile()
>>> length=ord(f.read(1))
>>> f.read(length)
'abcd'
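The framing logic itself can be exercised without opening any sockets. A Python 3 sketch (read_message accepts anything with a read(n) method, such as the file object returned by makefile('rb')):

```python
import io

def frame_message(payload):
    # One length byte, then the payload; payloads are capped at 255 bytes
    # by the single-byte prefix.
    if len(payload) > 255:
        raise ValueError("payload too long for a 1-byte length prefix")
    return bytes([len(payload)]) + payload

def read_message(stream):
    # stream is any object with a read(n) method.
    length = stream.read(1)[0]
    return stream.read(length)

wire = frame_message(b"abcd")
print(wire)                            # b'\x04abcd'
print(read_message(io.BytesIO(wire)))  # b'abcd'
```

For messages longer than 255 bytes you would widen the prefix, e.g. with struct.pack('!I', len(payload)).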
qid & accept id: (3821957, 3824511) query: Compile Python 2.5.5 on OS X 10.6 soup:

soup wrap:

Python 2.5 does not build correctly out of the box on Mac OS X 10.6. (It does build OK as is on 10.5 or 10.4, though.) There is at least one configure fix that needs to be backported from later Pythons. And you need to use gcc-4.0, not -4.2. Once you have extracted the source:

cd ./Python-2.5.5/
cat >patch-configure-for-10-6.patch <

Then there are various less obvious build issues like third-party libraries that are needed for all of the standard library modules to build and work as expected - GNU readline and bsddb come to mind - so there is no guarantee that you won't run into other problems.

$ python2.5
Python 2.5.5 (r255:77872, Sep 29 2010, 10:23:54) 
[GCC 4.0.1 (Apple Inc. build 5494)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Module readline not available.
>>> 

You could try using the installer build script in the source tree (in Mac/BuildScript/) but it will likely need to be patched to work correctly on 10.6.

Even though there is no official python.org installer for 2.5.5 (which just has security fixes), there is an OS X installer for 2.5.4 which works fine on 10.6. Or use the Apple-supplied 2.5.4. Or try MacPorts. It will be nice when GAE is supported on current Python versions.

qid & accept id: (3843017, 3843124) query: Efficiently detect sign-changes in python soup:

soup wrap:

What about:

import numpy
a = [1, 2, 1, 1, -3, -4, 7, 8, 9, 10, -2, 1, -3, 5, 6, 7, -10]
zero_crossings = numpy.where(numpy.diff(numpy.sign(a)))[0]

Output:

> zero_crossings
array([ 3,  5,  9, 10, 11, 12, 15])

i.e. zero_crossings will contain the indices of the elements after which a zero crossing occurs (the last element before each crossing). If you want the indices of the first elements after each crossing instead, just add 1 to that array.
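For a sanity check without NumPy, the same indices can be recovered with a plain-Python loop (sign here is a small stand-in for numpy.sign on nonzero input):

```python
def sign(x):
    # Mimics numpy.sign for nonzero input: -1, or 1.
    return (x > 0) - (x < 0)

a = [1, 2, 1, 1, -3, -4, 7, 8, 9, 10, -2, 1, -3, 5, 6, 7, -10]
crossings = [i for i in range(len(a) - 1) if sign(a[i]) != sign(a[i + 1])]
print(crossings)  # [3, 5, 9, 10, 11, 12, 15]
```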

qid & accept id: (3862310, 3862957) query: How can I find all subclasses of a class given its name? soup:

soup wrap:

New-style classes (i.e. subclassed from object, which is the default in Python 3) have a __subclasses__ method which returns the subclasses:

class Foo(object): pass
class Bar(Foo): pass
class Baz(Foo): pass
class Bing(Bar): pass

Here are the names of the subclasses:

print([cls.__name__ for cls in vars()['Foo'].__subclasses__()])
# ['Bar', 'Baz']

Here are the subclasses themselves:

print(vars()['Foo'].__subclasses__())
# [<class '__main__.Bar'>, <class '__main__.Baz'>]

Confirmation that the subclasses do indeed list Foo as their base:

for cls in vars()['Foo'].__subclasses__():
    print(cls.__base__)
# <class '__main__.Foo'>
# <class '__main__.Foo'>

Note if you want subsubclasses, you'll have to recurse:

def all_subclasses(cls):
    return cls.__subclasses__() + [g for s in cls.__subclasses__()
                                   for g in all_subclasses(s)]

print(all_subclasses(vars()['Foo']))
# [<class '__main__.Bar'>, <class '__main__.Baz'>, <class '__main__.Bing'>]
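One caveat worth knowing: with diamond inheritance, where a class is reachable through two different parents, the list-based recursion above reports that class once per path. A set-based variant avoids the duplicates (the diamond hierarchy below is made up for the demo):

```python
class Foo(object): pass
class Bar(Foo): pass
class Baz(Foo): pass
class Bing(Bar, Baz): pass  # reachable via both Bar and Baz

def all_subclasses(cls):
    # Collect direct subclasses, then recurse; the set removes
    # classes reached along more than one inheritance path.
    subs = set(cls.__subclasses__())
    for s in cls.__subclasses__():
        subs |= all_subclasses(s)
    return subs

names = sorted(c.__name__ for c in all_subclasses(Foo))
print(names)  # ['Bar', 'Baz', 'Bing']
```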
qid & accept id: (3947313, 3947323) query: Python script to loop through all files in directory, delete any that are less than 200 kB in size soup:

soup wrap:

This does directory and all subdirectories:

import os, os.path

for root, _, files in os.walk(dirtocheck):
    for f in files:
        fullpath = os.path.join(root, f)
        if os.path.getsize(fullpath) < 200 * 1024:
            os.remove(fullpath)

Or:

import os, os.path

fileiter = (os.path.join(root, f)
    for root, _, files in os.walk(dirtocheck)
    for f in files)
smallfileiter = (f for f in fileiter if os.path.getsize(f) < 200 * 1024)
for small in smallfileiter:
    os.remove(small)
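On Python 3 the same walk can also be written with pathlib; a sketch, demonstrated here against a throwaway temporary directory rather than a real dirtocheck:

```python
import pathlib
import shutil
import tempfile

def remove_small_files(dirtocheck, min_bytes=200 * 1024):
    # Walk the tree and delete every regular file under the size limit.
    for path in pathlib.Path(dirtocheck).rglob('*'):
        if path.is_file() and path.stat().st_size < min_bytes:
            path.unlink()

# Demo: one file below the 200 kB limit, one above.
tmp = tempfile.mkdtemp()
(pathlib.Path(tmp) / 'small.bin').write_bytes(b'x' * 100)
(pathlib.Path(tmp) / 'big.bin').write_bytes(b'x' * 300 * 1024)
remove_small_files(tmp)
remaining = sorted(p.name for p in pathlib.Path(tmp).iterdir())
print(remaining)  # ['big.bin']
shutil.rmtree(tmp)
```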
qid & accept id: (3947654, 3947666) query: Python - removing items from lists soup:

soup wrap:

Here are some tries:

L4 = [ n for n in L1 if (n not in L2) and (n not in L3) ]  # parens for clarity

tmpset = set( L2 + L3 )
L4 = [ n for n in L1 if n not in tmpset ]

Now that I have had a moment to think, I realize that the L2 + L3 thing creates a temporary list that immediately gets thrown away. So an even better way is:

tmpset = set(L2)
tmpset.update(L3)
L4 = [ n for n in L1 if n not in tmpset ]

Update: I see some extravagant claims being thrown around about performance, and I want to assert that my solution was already as fast as possible. Creating intermediate results, whether they be intermediate lists or intermediate iterators that then have to be called into repeatedly, will be slower, always, than simply giving L2 and L3 for the set to iterate over directly like I have done here.

$ python -m timeit \
  -s 'L1=range(300);L2=range(30,70,2);L3=range(120,220,2)' \
  'ts = set(L2); ts.update(L3); L4 = [ n for n in L1 if n not in ts ]'
10000 loops, best of 3: 39.7 usec per loop

All other alternatives (that I can think of) will necessarily be slower than this. Doing the loops ourselves, for example, rather than letting the set() constructor do them, adds expense:

$ python -m timeit \
  -s 'L1=range(300);L2=range(30,70,2);L3=range(120,220,2)' \
  'unwanted = frozenset(item for lst in (L2, L3) for item in lst); L4 = [ n for n in L1 if n not in unwanted ]'
10000 loops, best of 3: 46.4 usec per loop

Using iterators, with all of the state-saving and callbacks they involve, will obviously be even more expensive:

$ python -m timeit \
  -s 'L1=range(300);L2=range(30,70,2);L3=range(120,220,2);from itertools import ifilterfalse, chain' \
  'L4 = list(ifilterfalse(frozenset(chain(L2, L3)).__contains__, L1))' 
10000 loops, best of 3: 47.1 usec per loop

So I believe that the answer I gave last night is still far and away (for values of "far and away" greater than around 5µsec, obviously) the best, unless the questioner will have duplicates in L1 and wants them removed once each for every time the duplicate appears in one of the other lists.
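If that multiset behavior is actually wanted (each occurrence in L2 or L3 cancels at most one occurrence in L1), a collections.Counter sketch along these lines would do it:

```python
from collections import Counter

def multiset_difference(L1, L2, L3):
    # Each occurrence in L2/L3 removes at most one occurrence from L1,
    # preserving the order of the survivors.
    to_remove = Counter(L2) + Counter(L3)
    result = []
    for n in L1:
        if to_remove[n] > 0:
            to_remove[n] -= 1
        else:
            result.append(n)
    return result

print(multiset_difference([1, 1, 2, 3], [1], [2, 2]))  # [1, 3]
```

Note this is slower than the plain set approach and only worth it when duplicates in L1 must survive partial removal.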

qid & accept id: (3955571, 3955630) query: How to pass variable arguments from bash script to python script soup:

soup wrap:

Edit, since code has been posted

Your code is doing the correct thing - except that the output from your bar.py script is being captured into the array joined. Since it looks like you're not printing out the contents of joined, you never see any output.

Here's a demonstration:

File pybash.sh

#!/bin/bash

declare -a list1
declare -a list2

list1=("Hello" "there" "honey")
list2=("More" "strings" "here")

declare -a joined

joined=($(./pytest.py ${list1[@]} ${list2[@]}))
echo ${joined[@]}

File pytest.py

#!/usr/bin/python

import sys

for i in sys.argv:
    print "hi"

This will print out a bunch of 'hi' strings if you run the bash script.

qid & accept id: (3966201, 3966225) query: how to use python list comprehensions replace the function invoke inside of "for" stmt? soup:

soup wrap:

For the second question

List comprehensions are used for generating another list as output of iterating over one or more lists. Since you want to run foo a number of times, it is more elegant and less confusing to use a for .. in range(..) loop.

If you are interested in collecting the return values of foo, then use a list comprehension; otherwise a plain for loop is fine. At least I would write it that way.

See the example below:

>>> [x for x in range(10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> def foo(): print 'foo'
... 
>>> 
>>> [foo() for x in range(10)]
foo
foo
foo
foo
foo
foo
foo
foo
foo
foo
[None, None, None, None, None, None, None, None, None, None]
>>> 

[Edit: As per request]

The generator version that was provided by eumiro:

>>> results = ( foo() for _ in xrange(10) )
>>> results
<generator object <genexpr> at 0x10041f960>
>>> list(results)
foo
foo
foo
foo
foo
foo
foo
foo
foo
foo
[None, None, None, None, None, None, None, None, None, None]
>>> 
qid & accept id: (3984539, 3984615) query: Python: use regular expression to remove the white space from all lines soup:

soup wrap:

Python's re module does not apply multi-line ^ matching by default, so you need to specify that flag explicitly.

r = re.compile(r"^\s+", re.MULTILINE)
r.sub("", "a\n b\n c") # "a\nb\nc"

# or without compiling (only possible for Python 2.7+ because the flags option
# didn't exist in earlier versions of re.sub)

re.sub(r"^\s+", "", "a\n b\n c", flags = re.MULTILINE)

# but mind that \s includes newlines:
r.sub("", "a\n\n\n\n b\n c") # "a\nb\nc"

It's also possible to include the flag inline to the pattern:

re.sub(r"(?m)^\s+", "", "a\n b\n c")

An easier solution is to avoid regular expressions because the original problem is very simple:

content = 'a\n b\n\n c'
stripped_content = ''.join(line.lstrip(' \t') for line in content.splitlines(True))
# stripped_content == 'a\nb\n\nc'
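The two approaches are not quite equivalent, and the difference is exactly the \s-includes-newlines point above: the regex collapses blank lines, while the lstrip version preserves them. A quick comparison:

```python
import re

content = 'a\n b\n\n c'

# Regex version: ^\s+ at the start of a line can swallow the newlines
# of any blank lines that follow, so blank lines disappear.
regex_version = re.sub(r"(?m)^\s+", "", content)

# lstrip version: only spaces and tabs are stripped, so the blank
# line survives.
lstrip_version = ''.join(line.lstrip(' \t') for line in content.splitlines(True))

print(repr(regex_version))   # 'a\nb\nc'   -- blank line collapsed
print(repr(lstrip_version))  # 'a\nb\n\nc' -- blank line preserved
```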
qid & accept id: (3986345, 3986876) query: How to find the local minima of a smooth multidimensional array in NumPy efficiently? soup:

soup wrap:

The location of the local minima can be found for an array of arbitrary dimension using Ivan's detect_peaks function, with minor modifications:

import numpy as np
import scipy.ndimage.filters as filters
import scipy.ndimage.morphology as morphology

def detect_local_minima(arr):
    # https://stackoverflow.com/questions/3684484/peak-detection-in-a-2d-array/3689710#3689710
    """
    Takes an array and detects the troughs using a local minimum filter.
    Returns a boolean mask of the troughs (i.e. 1 when
    the pixel's value is the neighborhood minimum, 0 otherwise)
    """
    # define a connected neighborhood
    # http://www.scipy.org/doc/api_docs/SciPy.ndimage.morphology.html#generate_binary_structure
    neighborhood = morphology.generate_binary_structure(len(arr.shape),2)
    # apply the local minimum filter; all locations of minimum value 
    # in their neighborhood are set to 1
    # http://www.scipy.org/doc/api_docs/SciPy.ndimage.filters.html#minimum_filter
    local_min = (filters.minimum_filter(arr, footprint=neighborhood)==arr)
    # local_min is a mask that contains the peaks we are 
    # looking for, but also the background.
    # In order to isolate the peaks we must remove the background from the mask.
    # 
    # we create the mask of the background
    background = (arr==0)
    # 
    # a little technicality: we must erode the background in order to 
    # successfully subtract it from local_min, otherwise a line will 
    # appear along the background border (artifact of the local minimum filter)
    # http://www.scipy.org/doc/api_docs/SciPy.ndimage.morphology.html#binary_erosion
    eroded_background = morphology.binary_erosion(
        background, structure=neighborhood, border_value=1)
    # 
    # we obtain the final mask, containing only peaks, 
    # by removing the background from the local_min mask
    detected_minima = local_min & ~eroded_background
    return np.where(detected_minima)

which you can use like this:

arr=np.array([[[0,0,0,-1],[0,0,0,0],[0,0,0,0],[0,0,0,0],[-1,0,0,0]],
              [[0,0,0,0],[0,-1,0,0],[0,0,0,0],[0,0,0,-1],[0,0,0,0]]])
local_minima_locations = detect_local_minima(arr)
print(arr)
# [[[ 0  0  0 -1]
#   [ 0  0  0  0]
#   [ 0  0  0  0]
#   [ 0  0  0  0]
#   [-1  0  0  0]]

#  [[ 0  0  0  0]
#   [ 0 -1  0  0]
#   [ 0  0  0  0]
#   [ 0  0  0 -1]
#   [ 0  0  0  0]]]

This says the minima occur at indices [0,0,3], [0,4,0], [1,1,1] and [1,3,3]:

print(local_minima_locations)
# (array([0, 0, 1, 1]), array([0, 4, 1, 3]), array([3, 0, 1, 3]))
print(arr[local_minima_locations])
# [-1 -1 -1 -1]
qid & accept id: (4006970, 4007052) query: Monitor ZIP File Extraction Python soup:

soup wrap:

Here is an example you can start with; it's not optimized:

import zipfile

zf = zipfile.ZipFile('test.zip')

uncompress_size = sum(info.file_size for info in zf.infolist())

extracted_size = 0

for info in zf.infolist():
    extracted_size += info.file_size
    print "%s %%" % (extracted_size * 100 / uncompress_size)
    zf.extract(info)

To make it prettier, print a carriage return (and a trailing comma to suppress the newline) so the line updates in place:

 print "%s %%\r" % (extracted_size * 100/uncompress_size),
qid & accept id: (4046986, 4047415) query: python - how to get the numebr of active threads started by specific class? soup:

soup wrap:

This is a minor modification of Doug Hellmann's multiprocessing ActivePool example code, adapted to use threading. The idea is to have your workers register themselves in a pool and unregister themselves when they finish, using a threading.Lock to coordinate modification of the pool's active list:

import threading
import time
import random

class ActivePool(object):
    def __init__(self):
        super(ActivePool, self).__init__()
        self.active=[]
        self.lock=threading.Lock()
    def makeActive(self, name):
        with self.lock:
            self.active.append(name)
    def makeInactive(self, name):
        with self.lock:
            self.active.remove(name)
    def numActive(self):
        with self.lock:
            return len(self.active)
    def __str__(self):
        with self.lock:
            return str(self.active)
def worker(pool):
    name=threading.current_thread().name
    pool.makeActive(name)
    print 'Now running: %s' % str(pool)
    time.sleep(random.randint(1,3))
    pool.makeInactive(name)

if __name__=='__main__':
    poolA=ActivePool()
    poolB=ActivePool()    
    jobs=[]
    for i in range(5):
        jobs.append(
            threading.Thread(target=worker, name='A{0}'.format(i),
                             args=(poolA,)))
        jobs.append(
            threading.Thread(target=worker, name='B{0}'.format(i),
                             args=(poolB,)))
    for j in jobs:
        j.daemon=True
        j.start()
    while threading.activeCount()>1:
        for j in jobs:
            j.join(1)
            print 'A-threads active: {0}, B-threads active: {1}'.format(
                poolA.numActive(),poolB.numActive())

yields

Now running: ['A0']
Now running: ['B0']
Now running: ['A0', 'A1']
Now running: ['B0', 'B1']
Now running: ['A0', 'A1', 'A2']
Now running: ['B0', 'B1', 'B2']
Now running: ['A0', 'A1', 'A2', 'A3']
Now running: ['B0', 'B1', 'B2', 'B3']
Now running: ['A0', 'A1', 'A2', 'A3', 'A4']
Now running: ['B0', 'B1', 'B2', 'B3', 'B4']
A-threads active: 4, B-threads active: 5
A-threads active: 2, B-threads active: 5
A-threads active: 0, B-threads active: 3
A-threads active: 0, B-threads active: 3
A-threads active: 0, B-threads active: 3
A-threads active: 0, B-threads active: 3
A-threads active: 0, B-threads active: 3
A-threads active: 0, B-threads active: 0
A-threads active: 0, B-threads active: 0
A-threads active: 0, B-threads active: 0
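The same register/unregister pattern can be sketched in Python 3 syntax — a minimal version without the random sleeps and printing, keeping only the Lock-guarded list from the example above:

```python
import threading

class ActivePool:
    """Pool of worker names, guarded by a Lock (Python 3 sketch)."""
    def __init__(self):
        self.active = []
        self.lock = threading.Lock()

    def make_active(self, name):
        with self.lock:
            self.active.append(name)

    def make_inactive(self, name):
        with self.lock:
            self.active.remove(name)

def worker(pool):
    name = threading.current_thread().name
    pool.make_active(name)    # register on entry
    pool.make_inactive(name)  # unregister when finished

pool = ActivePool()
jobs = [threading.Thread(target=worker, args=(pool,)) for _ in range(5)]
for j in jobs:
    j.start()
for j in jobs:
    j.join()
print(pool.active)  # every worker has unregistered: []
```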
qid & accept id: (4112561, 4112761) query: Boost.Python: Ownership of pointer variables soup:

soup wrap:

Answering my own question:

I've missed an FAQ entry in the Boost.Python documentation that gave me the right hint:

//The node class should be held by std::auto_ptr
class_<Node, std::auto_ptr<Node> >("Node")

Create a thin wrapper function for the add_child method:

void node_add_child(Node& n, std::auto_ptr<Node> child) {
   n.add_child(child.get());
   child.release();
}

Complete code to expose the node class:

//The node class should be held by std::auto_ptr
class_<Node, std::auto_ptr<Node> >("Node")
//expose the thin wrapper function as node.addChild()
.def("addChild", &node_add_child)
;
qid & accept id: (4201562, 4201718) query: Using lxml to extract data where all elements are not known in advance soup:

soup wrap:

To get all the tags, we iterate through the document like this:

Suppose your XML structure is like this:

<ADDRESS>
  <STREET>One Main Street</STREET>
  <CITY>Gotham City</CITY>
  <ZIP>99999 0123</ZIP>
  <PHONE>555-123-5467</PHONE>
</ADDRESS>

We parse it:

>>> from lxml import etree
>>> f = etree.parse('foo.xml')  # path to XML file
>>> root = f.getroot() # get the root element
>>> for tags in root.iter(): # iter through the root element
...     print tags.tag       # print all the tags
... 
ADDRESS
STREET
CITY
ZIP
PHONE

Now suppose your XML has extra tags as well; tags you are not aware of. Since we are iterating through the XML, the above code will return those tags as well.

<ADDRESS>
  <STREET>One Main Street</STREET>
  <STREET1>One Second Street</STREET1>
  <CITY>Gotham City</CITY>
  <ZIP>99999 0123</ZIP>
  <PHONE>555-123-5467</PHONE>
  <COUNTRY>USA</COUNTRY>
</ADDRESS>

The above code returns:

ADDRESS
STREET
STREET1
CITY
ZIP
PHONE
COUNTRY

Now if we want to get the text of the tags, the procedure is the same. Just print tag.text like this:

>>> for tags in root.iter():
...     print tags.text
... 

One Main Street
One Second Street
Gotham City
99999 0123
555-123-5467
USA
qid & accept id: (4219843, 4225433) query: container where values expire in python soup:

soup wrap:

Here is a thread-safe version of ExpireCounter:

import datetime
import collections
import threading

class ExpireCounter:
    """Tracks how many events were added in the preceding time period
    """

    def __init__(self, timeout=1):
        self.lock=threading.Lock()        
        self.timeout = timeout
        self.events = collections.deque()

    def add(self,item):
        """Add event time
        """
        with self.lock:
            self.events.append(item)
            threading.Timer(self.timeout,self.expire).start()

    def __len__(self):
        """Return number of active events
        """
        with self.lock:
            return len(self.events)

    def expire(self):
        """Remove any expired events
        """
        with self.lock:
            self.events.popleft()

    def __str__(self):
        with self.lock:
            return str(self.events)

which can be used like this:

import time
c = ExpireCounter()
assert(len(c) == 0)
print(c)
# deque([])

c.add(datetime.datetime.now())
time.sleep(0.75)
c.add(datetime.datetime.now())    
assert(len(c) == 2)
print(c)
# deque([datetime.datetime(2010, 11, 19, 8, 50, 0, 91426), datetime.datetime(2010, 11, 19, 8, 50, 0, 842715)])

time.sleep(0.75)
assert(len(c) == 1)
print(c)
# deque([datetime.datetime(2010, 11, 19, 8, 50, 0, 842715)])
qid & accept id: (4339273, 4339327) query: Can I cleanse a numpy array without a loop? soup:
soup wrap:
import numpy as np
deltas=np.diff(data)
deltas[deltas<0]=0
deltas[deltas>100]=0

Also possible, and a bit quicker is

deltas[(deltas<0) | (deltas>100)]=0
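The same masking can also be written with np.where, which builds a new array instead of assigning in place — a small sketch with hypothetical sample data:

```python
import numpy as np

data = np.array([1, 5, 3, 250, 150])  # hypothetical sample data
deltas = np.diff(data)                # [4, -2, 247, -100]

# in-place boolean-mask assignment, as above
masked = deltas.copy()
masked[(masked < 0) | (masked > 100)] = 0

# equivalent, but non-destructive
alt = np.where((deltas < 0) | (deltas > 100), 0, deltas)
print(alt)  # [4 0 0 0]
```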
qid & accept id: (4339736, 4339875) query: Create a dovecot SHA1 digest using bash or python or some other linux command-line tool soup:

soup wrap:

You need to base64 encode the binary digest to get it into their format.

>>> import hashlib
>>> import base64

>>> p = hashlib.sha1('password')
>>> base64.b64encode(p.digest())
'W6ph5Mm5Pz8GgiULbPgzG37mj9g='
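In Python 3 the same thing needs a bytes input and a decode of the base64 result (a quick sketch):

```python
import hashlib
import base64

# hashlib requires bytes in Python 3; b64encode also returns bytes
digest = base64.b64encode(hashlib.sha1(b'password').digest()).decode('ascii')
print(digest)  # W6ph5Mm5Pz8GgiULbPgzG37mj9g=
```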

EDIT: By the way if you'd prefer to do this from a terminal/bash script, you can do

$ echo -n 'password' | openssl sha1 -binary | base64     
W6ph5Mm5Pz8GgiULbPgzG37mj9g=

Also, you can tell dovecotpw didn't give a hex digest of the hash, because the characters aren't all hexadecimal [0-9a-f]. The use of the characters [A-Za-z0-9+/] with the = ending suggests it was a base64 conversion of the hash.

qid & accept id: (4397859, 4411184) query: Smart filter with python soup:

soup wrap:

I'm convinced Zach's answer is on the right track. Out of curiosity, I've implemented another version (incorporating Zach's comments about using a dict instead of bisect) and folded it into a solution that matches your example.

#!/usr/bin/env python
import re
from trieMatch import PrefixMatch # https://gist.github.com/736416

pm = PrefixMatch(['YELLOW', 'GREEN', 'RED', ]) # huge list of 10 000 members
# if list is static, it might be worth pickling "pm" to avoid rebuilding each time

f = open("huge_file.txt", "r") ## file with > 100 000 lines
lines = f.readlines()
f.close()

regexp = re.compile(r'^.*?fruit=([A-Z]+)')
filtered = (line for line in lines if pm.match(regexp.match(line).group(1)))

For brevity, implementation of PrefixMatch is published here.

If your list of necessary prefixes is static or changes infrequently, you can speed up subsequent runs by pickling and reusing the PrefixMatch object instead of rebuilding it each time.
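As a sketch of that idea — using a plain set as a stand-in for the real PrefixMatch object, since trieMatch lives in an external gist:

```python
import pickle

prefixes = {'YELLOW', 'GREEN', 'RED'}  # stand-in for the PrefixMatch object

# build once, serialize to disk (here: to bytes for brevity) ...
blob = pickle.dumps(prefixes)

# ... and on later runs load it back instead of rebuilding
restored = pickle.loads(blob)
print(restored == prefixes)  # True
```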

update (on sorted results)

According to the changelog for Python 2.4:

key should be a single-parameter function that takes a list element and returns a comparison key for the element. The list is then sorted using the comparison keys.

also, in the source code, line 1792:

/* Special wrapper to support stable sorting using the decorate-sort-undecorate
   pattern.  Holds a key which is used for comparisons and the original record
   which is returned during the undecorate phase.  By exposing only the key
   .... */

This means that your regex pattern is only evaluated once for each entry (not once for each compare), hence it should not be too expensive to do:

sorted_generator = sorted(filtered, key=lambda line: regexp.match(line).group(1))
qid & accept id: (4402383, 4402447) query: Split string into array with many char pro items soup:
soup wrap:
>>> s = 'hello world'
>>> [s[i:i+3] for i in range(len(s)) if not i % 3]
['hel', 'lo ', 'wor', 'ld']
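Equivalently, you can step the range directly and drop the modulo test (a minor variation on the same idea):

```python
s = 'hello world'
# range(0, len(s), 3) visits only the chunk start indices
chunks = [s[i:i + 3] for i in range(0, len(s), 3)]
print(chunks)  # ['hel', 'lo ', 'wor', 'ld']
```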

For a more general solution (i.e. custom-defined splits), try this function:

def split_on_parts(s, *parts):
    total = 0
    buildstr = []
    for p in parts:
        buildstr.append(s[total:total+p])
        total += p
    return buildstr

s = 'hello world'
print split_on_parts(s, 3, 3, 3, 3)
print split_on_parts(s, 4, 3, 4)

Which produces the output:

['hel', 'lo ', 'wor', 'ld']
['hell', 'o w', 'orld']

OR if you're really in the mood for a one-liner:

def split_on_parts(s, *parts):
    return [s[sum(parts[:p]):sum(parts[:p+1])] for p in range(len(parts))]
qid & accept id: (4413798, 4413827) query: python restart the program after running a method soup:
soup wrap:
while True:
    #this is the menu
    menu=input("What would you like to do?\ntype 1 for method1 or 2 for method2: ")
    if(menu=="1"):
        method1()
    if(menu=="2"):
        method2()

If the endless loop "doesn't feel right", ask yourself when and why it should end. Should you have a third input option that exits the loop? Then add:

if menu == "3":
    break
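Putting the two pieces together — with hypothetical stand-ins for method1/method2, and canned answers in place of input() so the sketch is self-contained:

```python
def method1():  # hypothetical stand-in
    return 'one'

def method2():  # hypothetical stand-in
    return 'two'

def run_menu(answers):
    """Drive the menu loop from a list of canned answers instead of input()."""
    it = iter(answers)
    results = []
    while True:
        menu = next(it)
        if menu == "1":
            results.append(method1())
        if menu == "2":
            results.append(method2())
        if menu == "3":
            break  # the third option ends the loop
    return results

print(run_menu(["1", "2", "3"]))  # ['one', 'two']
```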
qid & accept id: (4416013, 4416083) query: Beautiful Soup [Python] and the extracting of text in a table soup:

soup wrap:

First find the table (as you are doing). Using find rather than findAll returns the first match directly (rather than returning a list of all matches - in which case we'd have to add an extra [0] to take the first element of the list):

table = soup.find('table' ,attrs={'class':'bp_ergebnis_tab_info'})

Then use find again to find the first td:

first_td = table.find('td')

Then use renderContents() to extract the textual contents:

text = first_td.renderContents()

... and the job is done (though you may also want to use strip() to remove leading and trailing spaces):

trimmed_text = text.strip()

This should give:

>>> print trimmed_text
This is a sample text
>>>

as desired.

qid & accept id: (4484985, 4484992) query: Extract data from HTML in PHP or Python soup:

soup wrap:

A good place to start looking would be the python module BeautifulSoup, which parses the HTML into a tree you can search for the text you want.

Assuming you've loaded the data into a variable called raw:

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(raw)

for x in soup.findAll("html:td"):
   if x.string == "Equity share capital":
       VALS = [y.string for y in x.parent.findAll() if y.has_key("class")]

print VALS

This gives:

[u'30.36', u'17.17', u'15.22', u'9.82', u'9.82']

Which you'll note is a list of unicode strings, make sure to convert them to whatever type you desire before processing.

There are many ways to do this via BeautifulSoup. The nice thing I've found however is the quick hack is often good enough (TM) to get the job done!

qid & accept id: (4534486, 4534526) query: finding the greatest Fibonacci number within limited time in python soup:

soup wrap:

Use the timeit module to time the function:

import timeit

def fib(x):
    if x==0 or x==1: return 1
    else: return fib(x-1)+fib(x-2)

print timeit.Timer('fib(5)', 'from __main__ import fib').timeit()

Output:

3.12172317505

To directly answer the question in the title, you can use time.time() to get the current time since the epoch in seconds and keep calculating the subsequent fibonacci number until the time limit is reached. I've chosen to use an efficient method of computing fibonacci numbers below to give you a better demonstration of this concept.

import time

def fibTimeLimited(limit):
  start = time.time()
  n, f0, f1 = 1, 0, 1
  while time.time() < start + limit:
    n += 1
    f0, f1 = f1, f0+f1
  return (n, f1)

Sample output:

Calculated 1st fibonacci number as 1 in 0.000001 seconds
Calculated 31st fibonacci number as 1346269 in 0.000010 seconds
Calculated 294th fibonacci number as 12384578529797304192493293627316781267732493780359086838016392 in 0.000100 seconds
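The iterative update in fibTimeLimited can be cross-checked against the sample output: under the 1-indexed convention used there (fib(1) == fib(2) == 1), the 31st Fibonacci number is indeed 1346269. A small deterministic sketch:

```python
def fib_iter(n):
    """n-th Fibonacci number, 1-indexed, fib(1) == fib(2) == 1."""
    f0, f1 = 0, 1
    for _ in range(n):
        f0, f1 = f1, f0 + f1  # same pairwise update as fibTimeLimited
    return f0

print(fib_iter(31))  # 1346269
```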
qid & accept id: (4554767, 4565650) query: Terminating subprocess in python soup:

soup wrap:

You can use psutil to find out about child processes, e.g. in pseudo-code:

p = Popen(...)
pp = psutil.Process(p.pid)
for child in pp.get_children():
    child.send_signal(signal.SIGINT)

Note the difference in processes when running without --reload, obtained using ps -ef | grep manage.py | grep -v grep:

vinay 7864 7795  9 22:10 pts/0 00:00:00 python ./manage.py runserver
vinay 7865 7864 16 22:10 pts/0 00:00:00 /usr/bin/python ./manage.py runserver

compared with using the --noreload option:

vinay 7874 7795  7 22:10 pts/0 00:00:00 python ./manage.py runserver --noreload
qid & accept id: (4631601, 4631640) query: Making an object's attributes iterable soup:

soup wrap:

I warn against doing this. There are rare exceptions where it's warranted, but almost all the time it's better to avoid this sort of hackish solution. If you want to though, you could use vars() to get a dictionary of attributes and iterate through it. As @Nick points out below, App Engine uses properties instead of values to define its members so you have to use getattr() to get their values.

results = q.fetch(5)
for p in results:
    for attribute in vars(p).keys():
        print '%s = %s' % (attribute, str(getattr(p, attribute)))

Demonstration of what vars() does:

>>> class A:
...     def __init__(self, a, b):
...         self.a = a
...         self.b = b
... 
>>> a = A(1, 2)
>>> vars(a)
{'a': 1, 'b': 2}
>>> for attribute in vars(a).keys():
...     print '%s = %s' % (attribute, str(getattr(a, attribute)))
... 
a = 1
b = 2
qid & accept id: (4659579, 4660395) query: How to see traceback on xmlrpc server, not client? soup:

soup wrap:

You can do something like this:

from SimpleXMLRPCServer import SimpleXMLRPCServer, SimpleXMLRPCRequestHandler

port = 9999

def func():
    print 'Hi!'
    print x # error!
    print 'Bye!'

class Handler(SimpleXMLRPCRequestHandler):
     def _dispatch(self, method, params):
         try: 
             return self.server.funcs[method](*params)
         except:
             import traceback
             traceback.print_exc()
             raise


if __name__ == '__main__':
    server = SimpleXMLRPCServer(("localhost", port), Handler)
    server.register_function(func)
    print "Listening on port %s..." % port
    server.serve_forever()

Traceback server side:

Listening on port 9999...
Hi!
Traceback (most recent call last):
  File "xml.py", line 13, in _dispatch
    value = self.server.funcs[method](*params)
  File "xml.py", line 7, in func
    print x # error!
NameError: global name 'x' is not defined
localhost - - [11/Jan/2011 17:13:16] "POST /RPC2 HTTP/1.0" 200 
qid & accept id: (4660250, 4660332) query: How to return every 5 items from a list in python? soup:

soup wrap:

Including the padding, this might work. (There are list comprehensions in 2.1, right? Just looked it up -- they were added in 2.0.)

a = the_list
a += [0] * (-len(a) % 5)
result = [a[i:i + 5] for i in range(0, len(a), 5)]

In less ancient Python, I would replace the last line by

result = zip(*[iter(a)] * 5)
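To make the padding arithmetic concrete — a quick sketch with a hypothetical 12-item list; -len(a) % 5 is exactly the number of zeros needed to reach the next multiple of 5:

```python
a = list(range(1, 13))    # hypothetical sample: 12 items
a += [0] * (-len(a) % 5)  # -12 % 5 == 3, so pad with 3 zeros to length 15
result = [a[i:i + 5] for i in range(0, len(a), 5)]
print(result)  # [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 0, 0, 0]]

# the zip() one-liner yields the same groups, as tuples
assert list(zip(*[iter(a)] * 5)) == [tuple(r) for r in result]
```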
qid & accept id: (4696418, 4696492) query: Regex to extract all URLs from a page soup:

soup wrap:
\n soup wrap:

HTML is not a regular language, and thus cannot be parsed by regular expressions.

It's possible to make reasonable guesses using regular expressions, and/or to recognize a restricted subset of URIs, but that way lies madness (lengthy debugging processes, inaccurate results).

That said, if you're willing to go that path, see John Gruber's regex for the purpose:

import re

def extract_urls(your_text):
  url_re = re.compile(r'\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))')
  for match in url_re.finditer(your_text):
    yield match.group(0)

This can be used as follows:

>>> for uri in extract_urls('http://foo.bar/baz irc://freenode.org/bash'):
...   print uri
http://foo.bar/
irc://freenode.org
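Note that Python's re module does not support POSIX character classes like [:punct:], so the pattern above will not behave as written verbatim. As a deliberately simplified sketch of the same idea (not Gruber's full pattern):

```python
import re

# simplified: a scheme:// or www. start, then a run of non-space, non-bracket chars
url_re = re.compile(r'\b(?:[\w-]+://|www\.)[^\s()<>]+')

text = 'http://foo.bar/baz irc://freenode.org/bash'
print(url_re.findall(text))  # ['http://foo.bar/baz', 'irc://freenode.org/bash']
```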
qid & accept id: (4702518, 4705131) query: How to access members of an rdf list with rdflib (or plain sparql) soup:

soup wrap:

RDF containers are a pain to handle in general. I am posting two solutions, one without SPARQL and one with SPARQL. Personally I prefer the second one, the one that uses SPARQL.

Example 1: without SPARQL

To get all the authors for a given article, as in your case, you could do something like the code I am posting below.

I have added comments so that the code explains itself. The most important bit is the use of g.triples(triple_pattern); with this graph method you can filter an rdflib Graph and search for the triple patterns you need.

When an rdf:Seq is parsed, predicates of the form:

http://www.w3.org/1999/02/22-rdf-syntax-ns#_1

http://www.w3.org/1999/02/22-rdf-syntax-ns#_2

http://www.w3.org/1999/02/22-rdf-syntax-ns#_3

are created. rdflib retrieves them in arbitrary order, so you need to sort them to traverse them in the right order.

import rdflib

RDF = rdflib.namespace.RDF

#Parse the file
g = rdflib.Graph()
g.parse("zot.rdf")

#So that we are sure we get something back
print "Number of triples",len(g)

#Couple of handy namespaces to use later
BIB = rdflib.Namespace("http://purl.org/net/biblio#")
FOAF = rdflib.Namespace("http://xmlns.com/foaf/0.1/")

#Author counter to print at the bottom
i=0

#Article for which we want the list of authors
article = rdflib.term.URIRef("http://www.ncbi.nlm.nih.gov/pubmed/18273724")

#First loop filters is equivalent to "get all authors for article x" 
for triple in g.triples((article,BIB["authors"],None)):

    #This expression removes the rdf:type predicate because we only want the bnodes
    # of the form http://www.w3.org/1999/02/22-rdf-syntax-ns#_SEQ_NUMBER
    # where SEQ_NUMBER is the index of the element in the rdf:Seq
    list_triples = filter(lambda y: RDF['type'] != y[1], g.triples((triple[2],None,None)))

    #We sort the authors by the predicate of the triple - order in sequences do matter ;-)
    # so "http://www.w3.org/1999/02/22-rdf-syntax-ns#_435"[44:] returns 435
    # and since we want numeric order we do int(x[1][44:]) - (BTW x[1] is the predicate)
    authors_sorted =  sorted(list_triples,key=lambda x: int(x[1][44:]))

    #We iterate the authors bNodes and we get surname and givenname
    for author_bnode in authors_sorted:
        for x in g.triples((author_bnode[2],FOAF['surname'],None)):
            author_surname = x[2]
        for y in g.triples((author_bnode[2],FOAF['givenname'],None)):
            author_name = y[2]
        print "author(%s): %s %s"%(i,author_name,author_surname)
        i += 1

This example shows how to do this without using SPARQL.

Example 2: With SPARQL

Here is exactly the same example, but using SPARQL.

rdflib.plugin.register('sparql', rdflib.query.Processor,
                       'rdfextras.sparql.processor', 'Processor')
rdflib.plugin.register('sparql', rdflib.query.Result,
                       'rdfextras.sparql.query', 'SPARQLQueryResult')

query = """
SELECT ?seq_index ?name ?surname WHERE {
      <http://www.ncbi.nlm.nih.gov/pubmed/18273724> bib:authors ?seq .
     ?seq ?seq_index ?seq_bnode .
     ?seq_bnode foaf:givenname ?name .
     ?seq_bnode foaf:surname ?surname .
}
"""
for row in sorted(g.query(query, initNs=dict(rdf=RDF,foaf=FOAF,bib=BIB)),
                                                  key=lambda x:int(x[0][44:])):
    print "Author(%s) %s %s"%(row[0][44:],row[1],row[2])

As this shows, we still have to do the sorting ourselves because the library doesn't handle it by itself. In the query, the variable seq_index binds the predicate that carries the sequence-order information, and that is what the lambda function sorts on.
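The index extraction that the lambda performs can be sketched on its own. Here the membership-predicate URIs are built by hand just to show the sort; no rdflib graph is involved:

```python
# Sort rdf:Seq membership predicates (rdf:_1, rdf:_2, ...) numerically.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def seq_index(predicate):
    # Strip the namespace plus the leading '_' (44 characters in total),
    # leaving just the numeric part of rdf:_N.
    return int(predicate[len(RDF_NS) + 1:])

preds = [RDF_NS + "_10", RDF_NS + "_2", RDF_NS + "_1"]
ordered = sorted(preds, key=seq_index)
print([seq_index(p) for p in ordered])  # [1, 2, 10]
```

A plain string sort would put "_10" before "_2", which is why the integer key is needed.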

qid & accept id: (4787291, 4787804) query: Dynamic importing of modules followed by instantiation of objects with a certain baseclass from said modules soup:

soup wrap:

You might do something like this:

for c in candidates:
    modname = os.path.splitext(c)[0]
    try:
        module=__import__(modname)   #<-- You can get the module this way
    except (ImportError,NotImplementedError):
        continue
    for cls in dir(module):          #<-- Loop over all objects in the module's namespace
        cls=getattr(module,cls)
        if (inspect.isclass(cls)                # Make sure it is a class 
            and inspect.getmodule(cls)==module  # Make sure it was defined in module, not just imported
            and issubclass(cls,base)):          # Make sure it is a subclass of base
            # print('found in {f}: {c}'.format(f=module.__name__,c=cls))
            classList.append(cls)

To test the above, I had to modify your code a bit; below is the full script.

import sys
import inspect
import os

class PluginBase(object): pass

def search(base):
    for root, dirs, files in os.walk('.'):
        candidates = [fname for fname in files if fname.endswith('.py') 
                      and not fname.startswith('__')]
        classList=[]
        if candidates:
            for c in candidates:
                modname = os.path.splitext(c)[0]
                try:
                    module=__import__(modname)
                except (ImportError,NotImplementedError):
                    continue
                for cls in dir(module):
                    cls=getattr(module,cls)
                    if (inspect.isclass(cls)
                        and inspect.getmodule(cls)==module
                        and issubclass(cls,base)):
                        # print('found in {f}: {c}'.format(f=module.__name__,c=cls))
                        classList.append(cls)
        print(classList)

search(PluginBase)
qid & accept id: (4791080, 4791169) query: Delete newline / return carriage in file output soup:
soup wrap:
>>> string = "testing\n"
>>> string
'testing\n'
>>> string = string[:-1]
>>> string
'testing'

This basically says "chop off the last item in the string". The : is the "slice" operator; it would be a good idea to read up on how it works, as it is very useful.
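For example, a few more slices on the same string, plus the rstrip alternative that only removes the character when it is actually present:

```python
s = "testing\n"
print(s[:-1])                    # 'testing' (chops the last character, whatever it is)
print(s[0:4])                    # 'test'    (characters 0 up to, not including, 4)
# rstrip is safer when the trailing newline may or may not be there:
print("testing".rstrip("\n"))    # 'testing' (unchanged, nothing to strip)
print("testing\n".rstrip("\n"))  # 'testing'
```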

EDIT

I just read your updated question. I think I understand now. You have a file, like this:

aqua:test$ cat wordlist.txt 
Testing

This

Wordlist

With

Returns

Between

Lines

and you want to get rid of the empty lines. Instead of modifying the file while you're reading from it, create a new file that you can write the non-empty lines from the old file into, like so:

# script    
rf = open("wordlist.txt")
wf = open("newwordlist.txt","w")
for line in rf:
    newline = line.rstrip('\r\n')
    wf.write(newline)
    wf.write('\n')  # remove to leave out line breaks
rf.close()
wf.close()

You should get:

aqua:test$ cat newwordlist.txt 
Testing
This
Wordlist
With
Returns
Between
Lines

If you want something like

TestingThisWordlistWithReturnsBetweenLines

just comment out

wf.write('\n')
qid & accept id: (4797704, 4798493) query: Webpy: how to set http status code to 300 soup:

soup wrap:

The way web.py does this for 301 and other redirect types is by subclassing web.HTTPError (which in turn sets web.ctx.status). For example:

class MultipleChoices(web.HTTPError):
    def __init__(self, choices):
        status = '300 Multiple Choices'
        headers = {'Content-Type': 'text/html'}
        data = '<h1>Multiple Choices</h1>\n<ul>\n'
        data += ''.join('<li>{0}</li>\n'.format(c)
                        for c in choices)
        data += '</ul>'
        web.HTTPError.__init__(self, status, headers, data)

Then to output this status code you raise MultipleChoices in your handler:

class MyHandler:
    def GET(self):
        raise MultipleChoices(['http://example.com/', 'http://www.google.com/'])

It'll need tuning for your particular unAPI application, of course.

See also the source for web.HTTPError in webapi.py.

qid & accept id: (4808753, 4809350) query: Find occurrence using multiple attributes in ElementTree/Python soup:

soup wrap:

This depends on what version you're using. If you have ElementTree 1.3+ (including in Python 2.7 standard library) you can use a basic xpath expression, as described in the docs, like [@attrib=’value’]:

x = ElementTree(file='testdata.xml')
cases = x.findall(".//testcase[@name='VHDL_BUILD_Passthrough'][@classname='TestOne']")

Unfortunately if you're using an earlier version of ElementTree (1.2, included in standard library for python 2.5 and 2.6) you can't use that convenience and need to filter yourself.

x = ElementTree(file='testdata.xml')
allcases = x.findall(".//testcase")
cases = [c for c in allcases if c.get('classname') == 'TestOne' and c.get('name') == 'VHDL_BUILD_Passthrough']
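A self-contained check of the 1.3-style query, using a made-up document of the same general shape as the question's testdata.xml (the element and attribute names are assumptions):

```python
import xml.etree.ElementTree as ET

# Hypothetical JUnit-style report standing in for testdata.xml
doc = """<testsuite>
  <testcase classname="TestOne" name="VHDL_BUILD_Passthrough"/>
  <testcase classname="TestTwo" name="VHDL_BUILD_Passthrough"/>
  <testcase classname="TestOne" name="Other"/>
</testsuite>"""

root = ET.fromstring(doc)
# Chained [@attrib='value'] predicates AND the two conditions together
cases = root.findall(".//testcase[@name='VHDL_BUILD_Passthrough'][@classname='TestOne']")
print(len(cases))  # 1
```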
qid & accept id: (4867037, 4867340) query: Django: css referencing media in static files (django dev / 1.3 / static files) soup:

soup wrap:

You said you had trouble with relative paths, but I don't understand exactly what you meant.

I ran into the same issue, and I've used relative paths to solve it. The only thing to keep in mind is that, when deploying, the images need to (obviously) remain in the same path relative to the CSS files.

My setup in a nutshell:

Note: I'm still using django-staticfiles with Django 1.2, but it should work similarly in Django 1.3.

STATIC_URL = "/site_media/static/"
STATIC_ROOT = os.path.join(PROJECT_ROOT, "site_media", "static")
STATICFILES_DIRS = (
    os.path.join(PROJECT_ROOT, "static_media"),
)

Then I serve the CSS from {{ STATIC_URL }}css/style.css which references images at ../images/logo.png.

and my project looks like this:

project_dir
  ...
  stuff
  static_media
    ...
    css
    images

Let me know if you have any questions, and I'll clarify.

qid & accept id: (4868900, 4869131) query: How do I store multiple copies of the same field in Django? soup:

soup wrap:

It sounds like the best way would be via many to many relationships, like this:

class Author(models.Model):
    # fields?

class Language(models.Model):
    iso_lang_code = models.CharField() # probably need some constraints here

class Resource(models.Model):
    name = models.CharField()
    authors = models.ManyToManyField(Author)
    languages = models.ManyToManyField(Language)

Then when it comes to create a resource, you simply do:

r = Resource(name="")
r.save()  # the resource needs a primary key before M2M links can be added
a1 = Author(name="ninefingers")
a2 = Author(name="jon skeet", type="god")
a1.save()
a2.save()
r.authors.add(a1)
r.authors.add(a2)
english = Language.objects.get(iso_lang_code="en-GB")
r.languages.add(english)

And you can also do some really fancy stuff like:

english = Language.objects.get(iso_lang_code="en-GB")
resourcesinenglish = english.resource_set.all()

for r in resourcesinenglish:
    # do something on r.

So using the ORM this way is really powerful. Yes, you basically end up with an ISO list of languages in an SQL table, but is that a problem? If so, you could always replace it with a string field and use objects.filter(language='en-GB'), which (roughly) translates to the SQL WHERE language='en-GB'. Of course, you are then limited to one language only.

Another approach might be to store all the languages as ISO codes separated by a delimiter, say ;, and then do

r = Resource.objects.get(id=701)
langs = r.languages.split(';')
for l in langs:
    print l

Of course, maintaining said list becomes more difficult that way. I think the ORM is easier by far.

As for more complex types like Authors the ORM is by far the easiest way to go.

Note that if you're concerned about the number of database requests this is creating, you can always use select_related. This does exactly what it sounds like - it follows all foreign keys, so your database gets hit once with a bigger query and the related objects are then in memory (cached).

qid & accept id: (4910789, 4912902) query: Getting the row index for a 2D numPy array when multiple column values are known soup:

soup wrap:

Here are ways to handle conditions on columns or rows, inspired by the Zen of Python.

In []: import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
...

So, following the second piece of advice:
a) conditions on column(s), applied to row(s):

In []: a= arange(12).reshape(3, 4)
In []: a
Out[]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In []: a[2, logical_and(1== a[0, :], 5== a[1, :])]+= 12
In []: a
Out[]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8, 21, 10, 11]])

b) conditions on row(s), applied to column(s):

In []: a= a.T
In []: a
Out[]:
array([[ 0,  4,  8],
       [ 1,  5, 21],
       [ 2,  6, 10],
       [ 3,  7, 11]])
In []: a[logical_and(1== a[:, 0], 5== a[:, 1]), 2]+= 12
In []: a
Out[]:
array([[ 0,  4,  8],
       [ 1,  5, 33],
       [ 2,  6, 10],
       [ 3,  7, 11]])

I hope this makes the case for always being explicit when accessing columns and rows. Code is typically read by people with various backgrounds.
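The same column conditions can also be expressed with numpy.where, which returns the matching row indices directly (the example array and values here are made up for the demo, following the arrays above):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
# Row indices where column 0 == 4 and column 1 == 5:
rows = np.where((a[:, 0] == 4) & (a[:, 1] == 5))[0]
print(rows)  # [1]
```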

qid & accept id: (4936507, 4941976) query: How do I write a logging middleware for pyramid/pylons 2? soup:

soup wrap:

Standard WSGI middleware:

class LoggerMiddleware(object):
    '''WSGI middleware'''

    def __init__(self, application):

        self.app = application

    def __call__(self, environ, start_response):

        # write logs

        try:
            return self.app(environ, start_response)
        except Exception, e:
            # write logs
            pass
        finally:
            # write logs
            pass

And in the Pyramid app-creation code:

from paste.httpserver import serve
from pyramid.response import Response
from pyramid.view import view_config

@view_config()
def hello(request):
    return Response('Hello')

if __name__ == '__main__':
    from pyramid.config import Configurator
    config = Configurator()
    config.scan()
    app = config.make_wsgi_app()

    # Put middleware
    app = LoggerMiddleware(app)

    serve(app, host='0.0.0.0')
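You can sanity-check middleware like this without starting a server by calling it with a stub WSGI app. Below is a Python 3 sketch of the same idea that actually writes logs (the logger name and messages are my own choices, not part of the original answer):

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("wsgi")

class LoggerMiddleware(object):
    '''WSGI middleware that logs each request and any application error.'''

    def __init__(self, application):
        self.app = application

    def __call__(self, environ, start_response):
        log.info("request: %s %s", environ.get("REQUEST_METHOD"),
                 environ.get("PATH_INFO"))
        try:
            return self.app(environ, start_response)
        except Exception:
            log.exception("application error")
            raise

def stub_app(environ, start_response):
    # Minimal WSGI application standing in for the Pyramid app
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]

statuses = []
wrapped = LoggerMiddleware(stub_app)
body = wrapped({"REQUEST_METHOD": "GET", "PATH_INFO": "/"},
               lambda status, headers: statuses.append(status))
print(statuses[0])  # 200 OK
```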
qid & accept id: (4951751, 4952238) query: Creating a new corpus with NLTK soup:

soup wrap:

I think the PlaintextCorpusReader already segments the input with a punkt tokenizer, at least if your input language is English.

PlaintextCorpusReader's constructor:

def __init__(self, root, fileids,
             word_tokenizer=WordPunctTokenizer(),
             sent_tokenizer=nltk.data.LazyLoader(
                 'tokenizers/punkt/english.pickle'),
             para_block_reader=read_blankline_block,
             encoding='utf8'):

You can pass the reader a word and sentence tokenizer, but for the latter the default already is nltk.data.LazyLoader('tokenizers/punkt/english.pickle').

For a single string, a tokenizer would be used as follows (explained here, see section 5 for punkt tokenizer).

>>> import nltk.data
>>> text = """
... Punkt knows that the periods in Mr. Smith and Johann S. Bach
... do not mark sentence boundaries.  And sometimes sentences
... can start with non-capitalized words.  i is a good variable
... name.
... """
>>> tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
>>> tokenizer.tokenize(text.strip())
qid & accept id: (4975563, 4975581) query: Radical Use of Admin's Interface soup:

soup wrap:

It is entirely possible to do this. You can do this with regular views, and then create templates that extend the "admin/base_site.html" template like so:

{% extends "admin/base_site.html" %}

You can also do breadcrumbs like this:

{% block breadcrumbs %}{% if not is_popup %}
    {# your breadcrumb links go here #}
{% endif %}{% endblock %}

And then put whatever content you want inside of the "content" block.

qid & accept id: (4976964, 4976986) query: how to get unique values set from a repeating values list soup:

soup wrap:

I would use a Python dictionary where the keys are column A values and the values are Python's built-in set type holding column B values.

def parse_the_file():
    lower = str.lower
    split = str.split
    with open('f.txt') as f:
        d = {}
        lines = [l for l in f.read().split('\n') if l]  # skip empty lines
        for A,B in [split(l) for l in lines]:
            try:
                d[lower(A)].add(B)
            except KeyError:
                d[lower(A)] = set([B])  # set(B) would split B into characters

        for a in d:
            print "%s - %s" % (a,",".join(list(d[a])))

if __name__ == "__main__":
    parse_the_file()

The advantage of using a dictionary is that you'll have a single dictionary key per column A value. The advantage of using a set is that you'll have a unique set of column B values.

Efficiency notes:

  • The use of try/except is more efficient than using an if/else statement to check for initial cases.
  • The evaluation and assignment of the str functions outside of the loop is more efficient than simply using them inside the loop.
  • Depending on the proportion of new A values vs. reappearance of A values throughout the file, you may consider using a = lower(A) before the try/except statement
  • I used a function, as accessing local variables is more efficient in Python than accessing global variables
  • Some of these performance tips are from here

Testing the code above on your input example yields:

xxxd - 4
xxxa - 1,3,2
xxxb - 2
xxxc - 3
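For comparison, here is a sketch of the same grouping with collections.defaultdict, which removes the try/except entirely (the data is inlined rather than read from a file, and the sample pairs are made up):

```python
from collections import defaultdict

# Inline stand-in for the parsed (A, B) pairs from the file
pairs = [("xxxA", "1"), ("xxxb", "2"), ("xxxA", "3"), ("xxxa", "1")]

d = defaultdict(set)           # missing keys start as an empty set
for a, b in pairs:
    d[a.lower()].add(b)

for key in sorted(d):
    print("%s - %s" % (key, ",".join(sorted(d[key]))))
```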
qid & accept id: (4981815, 4981918) query: How to remove lines in a Matplotlib plot soup:

soup wrap:

I'm showing that a combination of lines.pop(0), l.remove() and del l does the trick.

from matplotlib import pyplot
import numpy, weakref
a = numpy.arange(int(1e3))
fig = pyplot.Figure()
ax  = fig.add_subplot(1, 1, 1)
lines = ax.plot(a)

l = lines.pop(0)
wl = weakref.ref(l)  # create a weak reference to see if references still exist
#                      to this object
print wl  # not dead
l.remove()
print wl  # not dead
del l
print wl  # dead  (remove either of the steps above and this is still live)

I checked your large dataset and the release of the memory is confirmed on the system monitor as well.

Of course the simpler way (when not trouble-shooting) would be to pop it from the list and call remove on the line object without creating a hard reference to it:

lines.pop(0).remove()
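The reference-counting behaviour itself can be seen without matplotlib. Here's a minimal sketch with a stand-in Line class (hypothetical, not matplotlib's Line2D), showing that the object is collected once both the list's reference and the local name are gone:

```python
import weakref

class Line:
    # stand-in for a matplotlib Line2D (hypothetical)
    pass

lines = [Line()]
l = lines.pop(0)          # drop the list's reference
wl = weakref.ref(l)
assert wl() is not None   # still alive: the local name l holds a reference
del l                     # drop the last hard reference
assert wl() is None       # collected (CPython refcounting frees it immediately)
```

In real matplotlib the axes also holds a reference via ax.lines, which is why the l.remove() step is needed there in addition to popping the list.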
qid & accept id: (5051795, 5051850) query: Truncate the length of a Python dictionary soup:

Do you really need to modify the dictionary in-place? You can easily generate a new one (thanks to iterators, without even touching the items you don't need):

\n
OrderedDict(itertools.islice(d.iteritems(), 500))\n
\n

You could also truncate the original one, but that would be less performant for large dictionaries and is probably not needed. Semantics are different if someone else is using d, of course.

\n
# can't use .iteritems() as you can't/shouldn't modify something while iterating it\nto_remove = d.keys()[500:] # slice off first 500 keys\nfor key in to_remove:\n    del d[key]\n
\n soup wrap:

Do you really need to modify the dictionary in-place? You can easily generate a new one (thanks to iterators, without even touching the items you don't need):

OrderedDict(itertools.islice(d.iteritems(), 500))

You could also truncate the original one, but that would be less performant for large dictionaries and is probably not needed. Semantics are different if someone else is using d, of course.

# can't use .iteritems() as you can't/shouldn't modify something while iterating it
to_remove = d.keys()[500:] # keys beyond the first 500
for key in to_remove:
    del d[key]
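For Python 3 readers: iteritems() is gone (items() is already a lazy view) and keys() returns a view that can't be sliced directly, so the same two approaches look like this (sketch):

```python
from collections import OrderedDict
from itertools import islice

d = OrderedDict((i, i * i) for i in range(10))

# new, truncated dict (lazy: items beyond the first 5 are never materialized)
truncated = OrderedDict(islice(d.items(), 5))
assert list(truncated) == [0, 1, 2, 3, 4]

# in-place variant: materialize the keys first, since a view
# can't be iterated while the dict is being modified
for key in list(d.keys())[5:]:
    del d[key]
assert len(d) == 5
```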
qid & accept id: (5073624, 5073649) query: in python, how do I check to see if keys in a dictionary all have the same value x? soup:

I will assume you meant the same value:

\n
d = {'a':1, 'b':1, 'c':1}\nlen(set(d.values()))==1    # -> True\n
\n

If you want to check for a specific value, how about

\n
testval = 1\nall(val==testval for val in d.values())   # -> True\n
\n

This version will usually fail fast, stopping at the first non-matching value.

\n soup wrap:

I will assume you meant the same value:

d = {'a':1, 'b':1, 'c':1}
len(set(d.values()))==1    # -> True

If you want to check for a specific value, how about

testval = 1
all(val==testval for val in d.values())   # -> True

This version will usually fail fast, stopping at the first non-matching value.
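One caveat worth knowing: the two checks disagree on an empty dictionary (and the set-based one requires the values to be hashable). A quick illustration:

```python
d = {'a': 1, 'b': 1, 'c': 1}
assert len(set(d.values())) == 1           # all values equal
assert all(v == 1 for v in d.values())     # all values equal the target

d['d'] = 2
assert len(set(d.values())) != 1           # no longer uniform

# the two tests disagree on an empty dict:
empty = {}
assert len(set(empty.values())) != 1           # no values -> not "all the same"
assert all(v == 1 for v in empty.values())     # vacuously true
```

Pick whichever semantics you actually want for the empty case.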

qid & accept id: (5103329, 5103392) query: How to find out what methods, properties, etc a python module possesses soup:

As for Python modules, you can do

\n
>>> import module\n>>> help(module)\n
\n

and you'll get a list of supported methods (more exactly, you get the docstring, which might not contain every single method). If you want a complete listing instead, you can use

\n
>>> dir(module)\n
\n

although now you'd just get a long list of all properties, methods, classes etc. in that module.

\n

In your first example, you're calling an external program, though. Of course Python has no idea which features wmic.exe has. How should it?

\n soup wrap:

As for Python modules, you can do

>>> import module
>>> help(module)

and you'll get a list of supported methods (more exactly, you get the docstring, which might not contain every single method). If you want a complete listing instead, you can use

>>> dir(module)

although now you'd just get a long list of all properties, methods, classes etc. in that module.
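If you'd rather have a programmatic, filtered listing than dir()'s raw dump of names, the stdlib inspect module can help. A small sketch, using json as the example module:

```python
import inspect
import json

# names of the plain functions a module exposes
funcs = [name for name, obj in inspect.getmembers(json, inspect.isfunction)]
assert 'dumps' in funcs and 'loads' in funcs

# classes defined in (or imported into) the module
classes = [name for name, obj in inspect.getmembers(json, inspect.isclass)]
assert 'JSONDecoder' in classes
```

inspect.getdoc(obj) and inspect.signature(obj) are similarly useful for drilling into individual members.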

In your first example, you're calling an external program, though. Of course Python has no idea which features wmic.exe has. How should it?

qid & accept id: (5148790, 5148839) query: how to convert value of column defined as character into integer in python soup:

Don't forget to use try/except statements in conversion to avoid surprises like this:

\n
Python 2.6.6 (r266:84292, Sep 15 2010, 15:52:39) \n[GCC 4.4.5] on linux2\nType "help", "copyright", "credits" or "license" for more information.\n>>> a='a'\n>>> int(a)\nTraceback (most recent call last):\n  File "", line 1, in \nValueError: invalid literal for int() with base 10: 'a'\n
\n

Solution:

\n
try:\n    int(myvar)\nexcept ValueError:\n    ...Handle the exception...\n
\n soup wrap:

Don't forget to use try/except statements in conversion to avoid surprises like this:

Python 2.6.6 (r266:84292, Sep 15 2010, 15:52:39) 
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a='a'
>>> int(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'a'

Solution:

try:
    int(myvar)
except ValueError:
    ...Handle the exception...
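A common way to package that up is a small helper that returns a default instead of raising. A sketch (to_int is a made-up name):

```python
def to_int(value, default=None):
    """Convert value to int, returning default if conversion fails."""
    try:
        return int(value)
    except (ValueError, TypeError):
        return default

assert to_int('42') == 42
assert to_int('a') is None
assert to_int(None, default=0) == 0
```

Catching TypeError as well covers non-string inputs such as None.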
qid & accept id: (5162130, 5162574) query: Elegant way of reducing list by averaging? soup:

What you actually want to do is apply a 2-sample moving average through your list: mathematically, you convolve with a window of [.5,.5], then take just the even samples. To avoid halving the last element of odd-length arrays, you should duplicate it; this does not affect even-length arrays.

\n

Using numpy it gets pretty elegant:

\n
import numpy as np\n\nnp.convolve(a + [a[-1]], [.5,.5], mode='valid')[::2]\narray([  1.,  11.])\n\nnp.convolve(b + [b[-1]], [.5,.5], mode='valid')[::2]\narray([  1.,  11.,  20.])\n
\n

you can convert back to list using list(outputarray).

\n

Using numpy is very useful if performance matters, since optimized C math code is doing the work:

\n
In [10]: %time a=reduce(list(np.arange(1000000))) #chosen answer\nCPU times: user 6.38 s, sys: 0.08 s, total: 6.46 s\nWall time: 6.39 s\n\nIn [11]: %time c=np.convolve(list(np.arange(1000000)), [.5,.5], mode='valid')[::2]\nCPU times: user 0.59 s, sys: 0.01 s, total: 0.60 s\nWall time: 0.61 s\n
\n soup wrap:

What you actually want to do is apply a 2-sample moving average through your list: mathematically, you convolve with a window of [.5,.5], then take just the even samples. To avoid halving the last element of odd-length arrays, you should duplicate it; this does not affect even-length arrays.

Using numpy it gets pretty elegant:

import numpy as np

np.convolve(a + [a[-1]], [.5,.5], mode='valid')[::2]
array([  1.,  11.])

np.convolve(b + [b[-1]], [.5,.5], mode='valid')[::2]
array([  1.,  11.,  20.])

you can convert back to list using list(outputarray).

Using numpy is very useful if performance matters, since optimized C math code is doing the work:

In [10]: %time a=reduce(list(np.arange(1000000))) #chosen answer
CPU times: user 6.38 s, sys: 0.08 s, total: 6.46 s
Wall time: 6.39 s

In [11]: %time c=np.convolve(list(np.arange(1000000)), [.5,.5], mode='valid')[::2]
CPU times: user 0.59 s, sys: 0.01 s, total: 0.60 s
Wall time: 0.61 s
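If numpy isn't available, the same averaging (including the duplicate-the-last-element trick for odd lengths) is a few lines of plain Python. A sketch with a hypothetical pairwise_average helper:

```python
def pairwise_average(seq):
    """Average consecutive pairs; duplicate the last element for odd lengths."""
    if len(seq) % 2:
        seq = seq + [seq[-1]]
    return [(seq[i] + seq[i + 1]) / 2.0 for i in range(0, len(seq), 2)]

assert pairwise_average([0, 2, 10, 12]) == [1.0, 11.0]          # even length
assert pairwise_average([0, 2, 10, 12, 20]) == [1.0, 11.0, 20.0]  # odd length
```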
qid & accept id: (5185944, 5187895) query: Extract domain from body of email soup:

The cleanest way to do it is with cssselect from lxml.html and urlparse. Here is how:

\n
from lxml import html\nfrom urlparse import urlparse\ndoc = html.fromstring(html_data)\nlinks = doc.cssselect("a")\ndomains = set([])\nfor link in links:\n    try: href=link.attrib['href']\n    except KeyError: continue\n    parsed=urlparse(href)\n    domains.add(parsed.netloc)\nprint domains\n
\n

First you load the html data into a document object with fromstring. You query the document for links using standard css selectors with cssselect. You traverse the links, grab their urls with .attrib['href'] - and skip them if they don't have any (except - continue). Parse the url into a named tuple with urlparse and put the domain (netloc) into a set. Voila!

\n

Try to avoid regular expressions when good libraries are available: they are hard to maintain, and a no-go for HTML parsing.

\n

UPDATE:\nThe href filter suggestion in the comments is very helpful, the code will look like this:

\n
from lxml import html\nfrom urlparse import urlparse\ndoc = html.fromstring(html_data)\nlinks = doc.cssselect("a[href]")\ndomains = set([])\nfor link in links:\n    href=link.attrib['href']\n    parsed=urlparse(href)\n    domains.add(parsed.netloc)\nprint domains\n
\n

You don't need the try-catch block since the href filter makes sure you catch only the anchors that have href attribute in them.

\n soup wrap:

The cleanest way to do it is with cssselect from lxml.html and urlparse. Here is how:

from lxml import html
from urlparse import urlparse
doc = html.fromstring(html_data)
links = doc.cssselect("a")
domains = set([])
for link in links:
    try: href=link.attrib['href']
    except KeyError: continue
    parsed=urlparse(href)
    domains.add(parsed.netloc)
print domains

First you load the html data into a document object with fromstring. You query the document for links using standard css selectors with cssselect. You traverse the links, grab their urls with .attrib['href'] - and skip them if they don't have any (except - continue). Parse the url into a named tuple with urlparse and put the domain (netloc) into a set. Voila!

Try to avoid regular expressions when good libraries are available: they are hard to maintain, and a no-go for HTML parsing.

UPDATE: The href filter suggestion in the comments is very helpful; the code will then look like this:

from lxml import html
from urlparse import urlparse
doc = html.fromstring(html_data)
links = doc.cssselect("a[href]")
domains = set([])
for link in links:
    href=link.attrib['href']
    parsed=urlparse(href)
    domains.add(parsed.netloc)
print domains

You don't need the try-catch block since the href filter makes sure you catch only the anchors that have href attribute in them.
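If installing lxml isn't an option, the same idea works with only the standard library. A Python 3 sketch (urlparse moved to urllib.parse there) using html.parser; LinkDomains is a made-up name:

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkDomains(HTMLParser):
    """Collect the domains of all <a href=...> links in an HTML document."""
    def __init__(self):
        super().__init__()
        self.domains = set()

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:  # anchors without href are skipped automatically
                self.domains.add(urlparse(href).netloc)

p = LinkDomains()
p.feed('<a href="http://example.com/x">one</a><a>no href</a>'
       '<a href="https://spam.test/y">two</a>')
assert p.domains == {'example.com', 'spam.test'}
```

lxml remains the better choice for messy real-world HTML; html.parser is more forgiving than it used to be, but less robust.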

qid & accept id: (5198116, 5198430) query: getting pixels value in a checkerboard pattern in python soup:

Let's use smaller dimensions so the result is easier to see:

\n
import numpy as np\n# w=2948\n# h=1536\nw=6\nh=4\narr=np.arange(w*h).reshape(w,h)\nprint(arr)\nprint(arr.shape)\n# [[ 0  1  2  3]\n#  [ 4  5  6  7]\n#  [ 8  9 10 11]\n#  [12 13 14 15]\n#  [16 17 18 19]\n#  [20 21 22 23]]\n# (6, 4)\n
\n

We can construct a boolean array in a checkerboard pattern:

\n
coords=np.ogrid[0:w,0:h]\nidx=(coords[0]+coords[1])%2 == 1\nprint(idx)\nprint(idx.shape)\n# [[False  True False  True]\n#  [ True False  True False]\n#  [False  True False  True]\n#  [ True False  True False]\n#  [False  True False  True]\n#  [ True False  True False]]\n# (6, 4)\n
\n

Using this boolean array for indexing, we can extract the values we desire:

\n
checkerboard=arr[idx].reshape(w,h//2)\nprint(checkerboard)\nprint(checkerboard.shape)\n# [[ 1  3]\n#  [ 4  6]\n#  [ 9 11]\n#  [12 14]\n#  [17 19]\n#  [20 22]]\n# (6, 2)\n
\n

PS. Inspiration for this answer came from Ned Batchelder's answer here.

\n soup wrap:

Let's use smaller dimensions so the result is easier to see:

import numpy as np
# w=2948
# h=1536
w=6
h=4
arr=np.arange(w*h).reshape(w,h)
print(arr)
print(arr.shape)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]
#  [12 13 14 15]
#  [16 17 18 19]
#  [20 21 22 23]]
# (6, 4)

We can construct a boolean array in a checkerboard pattern:

coords=np.ogrid[0:w,0:h]
idx=(coords[0]+coords[1])%2 == 1
print(idx)
print(idx.shape)
# [[False  True False  True]
#  [ True False  True False]
#  [False  True False  True]
#  [ True False  True False]
#  [False  True False  True]
#  [ True False  True False]]
# (6, 4)

Using this boolean array for indexing, we can extract the values we desire:

checkerboard=arr[idx].reshape(w,h//2)
print(checkerboard)
print(checkerboard.shape)
# [[ 1  3]
#  [ 4  6]
#  [ 9 11]
#  [12 14]
#  [17 19]
#  [20 22]]
# (6, 2)
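The same checkerboard selection can be written in plain Python if numpy isn't available. A sketch on the same 6x4 example (each row keeps alternating columns, so a list of per-row lists is the natural result):

```python
w, h = 6, 4
arr = [[r * h + c for c in range(h)] for r in range(w)]

# keep the cells where (row + col) is odd, row by row
checker = [[arr[r][c] for c in range(h) if (r + c) % 2 == 1]
           for r in range(w)]

assert checker[0] == [1, 3]    # matches the numpy output above
assert checker[1] == [4, 6]
assert checker[2] == [9, 11]
```

For a 2948x1536 image, though, the numpy version will be dramatically faster.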

PS. Inspiration for this answer came from Ned Batchelder's answer here.

qid & accept id: (5222333, 5222710) query: authentication in python script to run as root soup:

The other thing you can do is have your script automatically invoke sudo if it wasn't executed as root:

\n
import os\nimport sys\n\neuid = os.geteuid()\nif euid != 0:\n    print "Script not started as root. Running sudo.."\n    args = ['sudo', sys.executable] + sys.argv + [os.environ]\n    # the next line replaces the currently-running process with the sudo\n    os.execlpe('sudo', *args)\n\nprint 'Running. Your euid is', euid\n
\n

Output:

\n
Script not started as root. Running sudo..\n[sudo] password for bob:\nRunning. Your euid is 0\n
\n

Use sudo -k for testing, to clear your sudo timestamp so the next time the script is run it will require the password again.

\n soup wrap:

The other thing you can do is have your script automatically invoke sudo if it wasn't executed as root:

import os
import sys

euid = os.geteuid()
if euid != 0:
    print "Script not started as root. Running sudo.."
    args = ['sudo', sys.executable] + sys.argv + [os.environ]
    # the next line replaces the currently-running process with the sudo
    os.execlpe('sudo', *args)

print 'Running. Your euid is', euid

Output:

Script not started as root. Running sudo..
[sudo] password for bob:
Running. Your euid is 0

Use sudo -k for testing, to clear your sudo timestamp so the next time the script is run it will require the password again.

qid & accept id: (5276837, 5278679) query: Special End-line characters/string from lines read from text file, using Python soup:

Here's a generator function that acts as an iterator over a file, cutting it into lines according to an exotic newline sequence that is the same throughout the file.

\n

It reads the file in chunks of lenchunk characters and yields the lines found in each chunk, chunk after chunk.

\n

Since the newline is 3 characters in my example (':;:'), a chunk may end in the middle of a newline: this generator function takes care of that possibility and manages to yield the correct lines.

\n

If the newline were only one character, the function could be simplified; I wrote the function only for the most delicate case.

\n

Using this function lets you read a file one line at a time, without reading the entire file into memory.

\n
from random import randrange, choice\n\n\n# this part is to create an exemple file with newline being :;:\nalphabet = 'abcdefghijklmnopqrstuvwxyz '\nch = ':;:'.join(''.join(choice(alphabet) for nc in xrange(randrange(0,40)))\n                for i in xrange(50))\nwith open('fofo.txt','wb') as g:\n    g.write(ch)\n\n\n# this generator function is an iterator for a file\n# if nl receives an argument whose bool is True,\n# the newlines :;: are returned in the lines\n\ndef liner(filename,eol,lenchunk,nl=0):\n    # nl = 0 or 1 acts as 0 or 1 in splitlines()\n    L = len(eol)\n    NL = len(eol) if nl else 0\n    with open(filename,'rb') as f:\n        chunk = f.read(lenchunk)\n        tail = ''\n        while chunk:\n            last = chunk.rfind(eol)\n            if last==-1:\n                kept = chunk\n                newtail = ''\n            else:\n                kept = chunk[0:last+L]   # here: L\n                newtail = chunk[last+L:] # here: L\n            chunk = tail + kept\n            tail = newtail\n            x = y = 0\n            while y+1:\n                y = chunk.find(eol,x)\n                if y+1: yield chunk[x:y+NL] # here: NL\n                else: break\n                x = y+L # here: L\n            chunk = f.read(lenchunk)\n        yield tail\n\n\n\nfor line in liner('fofo.txt',':;:'):\n    print line\n
\n

Here's the same, with printings here and there to allow to follow the algorithm.

\n
from random import randrange, choice\n\n\n# this part is to create an exemple file with newline being :;:\nalphabet = 'abcdefghijklmnopqrstuvwxyz '\nch = ':;:'.join(''.join(choice(alphabet) for nc in xrange(randrange(0,40)))\n                for i in xrange(50))\nwith open('fofo.txt','wb') as g:\n    g.write(ch)\n\n\n# this generator function is an iterator for a file\n# if nl receives an argument whose bool is True,\n# the newlines :;: are returned in the lines\n\ndef liner(filename,eol,lenchunk,nl=0):\n    L = len(eol)\n    NL = len(eol) if nl else 0\n    with open(filename,'rb') as f:\n        ch = f.read()\n        the_end = '\n\nxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'+\\n                  '\nend of the file=='+ch[-50:]+\\n                  '\nxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\n'\n        f.seek(0,0)\n        chunk = f.read(lenchunk)\n        tail = ''\n        while chunk:\n            if (chunk[-1]==':' and chunk[-3:]!=':;:') or chunk[-2:]==':;':\n                wr = [' ##########---------- cut newline cut ----------##########'+\\n                     '\nchunk== '+chunk+\\n                     '\n---------------------------------------------------']\n            else:\n                wr = ['chunk== '+chunk+\\n                     '\n---------------------------------------------------']\n            last = chunk.rfind(eol)\n            if last==-1:\n                kept = chunk\n                newtail = ''\n            else:\n                kept = chunk[0:last+L]   # here: L\n                newtail = chunk[last+L:] # here: L\n            wr.append('\nkept== '+kept+\\n                      '\n---------------------------------------------------'+\\n                      '\nnewtail== '+newtail)\n            chunk = tail + kept\n            tail = newtail\n            wr.append('\n---------------------------------------------------'+\\n                      '\ntail + kept== '+chunk+\\n                      
'\n---------------------------------------------------')\n            print ''.join(wr)\n            x = y = 0\n            while y+1:\n                y = chunk.find(eol,x)\n                if y+1: yield chunk[x:y+NL] # here: NL\n                else: break\n                x = y+L # here: L\n            print '\n\n==================================================='\n            chunk = f.read(lenchunk)\n        yield tail\n        print the_end\n\n\n\nfor line in liner('fofo.txt',':;:',1):\n    print 'line== '+line\n
\n

.

\n

EDIT

\n

I compared the execution times of my code and chmullig's code.

\n

With a 'fofo.txt' file about 10 MB, created with

\n
alphabet = 'abcdefghijklmnopqrstuvwxyz '\nch = ':;:'.join(''.join(choice(alphabet) for nc in xrange(randrange(0,60)))\n                for i in xrange(324000))\nwith open('fofo.txt','wb') as g:\n    g.write(ch)\n
\n

and measuring times like that:

\n
te = clock()\nfor line in liner('fofo.txt',':;:', 65536):\n    pass\nprint clock()-te\n\n\nfh = open('fofo.txt', 'rb')\nzenBreaker = SpecialDelimiters(fh, ':;:', 65536)\n\nte = clock()\nfor line in zenBreaker:\n    pass\nprint clock()-te\n
\n

I obtained the following minimum times over several runs:

\n
\n

............my code 0.7067 seconds

\n

chmullig's code 0.8373 seconds

\n
\n

.

\n

EDIT 2

\n

I changed my generator function: liner2() takes a file handle instead of the file's name, so opening the file can be excluded from the timing, as it is for chmullig's code.

\n
def liner2(fh,eol,lenchunk,nl=0):\n    L = len(eol)\n    NL = len(eol) if nl else 0\n    chunk = fh.read(lenchunk)\n    tail = ''\n    while chunk:\n        last = chunk.rfind(eol)\n        if last==-1:\n            kept = chunk\n            newtail = ''\n        else:\n            kept = chunk[0:last+L]   # here: L\n            newtail = chunk[last+L:] # here: L\n        chunk = tail + kept\n        tail = newtail\n        x = y = 0\n        while y+1:\n            y = chunk.find(eol,x)\n            if y+1: yield chunk[x:y+NL] # here: NL\n            else: break\n            x = y+L # here: L\n        chunk = fh.read(lenchunk)\n    yield tail\n\nfh = open('fofo.txt', 'rb')\nte = clock()\nfor line in liner2(fh,':;:', 65536):\n    pass\nprint clock()-te\n
\n

The results, after numerous runs to find the minimum times, are

\n
\n

.........with liner() 0.7067 seconds

\n

.......with liner2() 0.7064 seconds

\n

chmullig's code 0.8373 seconds

\n
\n

In fact, opening the file accounts for a negligible part of the total time.

\n soup wrap:

Here's a generator function that acts as an iterator over a file, cutting it into lines according to an exotic newline sequence that is the same throughout the file.

It reads the file in chunks of lenchunk characters and yields the lines found in each chunk, chunk after chunk.

Since the newline is 3 characters in my example (':;:'), a chunk may end in the middle of a newline: this generator function takes care of that possibility and manages to yield the correct lines.

If the newline were only one character, the function could be simplified; I wrote the function only for the most delicate case.

Using this function lets you read a file one line at a time, without reading the entire file into memory.

from random import randrange, choice


# this part is to create an exemple file with newline being :;:
alphabet = 'abcdefghijklmnopqrstuvwxyz '
ch = ':;:'.join(''.join(choice(alphabet) for nc in xrange(randrange(0,40)))
                for i in xrange(50))
with open('fofo.txt','wb') as g:
    g.write(ch)


# this generator function is an iterator for a file
# if nl receives an argument whose bool is True,
# the newlines :;: are returned in the lines

def liner(filename,eol,lenchunk,nl=0):
    # nl = 0 or 1 acts as 0 or 1 in splitlines()
    L = len(eol)
    NL = len(eol) if nl else 0
    with open(filename,'rb') as f:
        chunk = f.read(lenchunk)
        tail = ''
        while chunk:
            last = chunk.rfind(eol)
            if last==-1:
                kept = chunk
                newtail = ''
            else:
                kept = chunk[0:last+L]   # here: L
                newtail = chunk[last+L:] # here: L
            chunk = tail + kept
            tail = newtail
            x = y = 0
            while y+1:
                y = chunk.find(eol,x)
                if y+1: yield chunk[x:y+NL] # here: NL
                else: break
                x = y+L # here: L
            chunk = f.read(lenchunk)
        yield tail



for line in liner('fofo.txt', ':;:', 65536):  # liner needs a chunk size
    print line
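The kept/tail bookkeeping can also be expressed by simply carrying the unconsumed remainder in a buffer. A compact Python 3 sketch of the same chunked idea (split_lines is a made-up name); carrying the leftover also covers a chunk that contains no newline at all:

```python
import io

def split_lines(fh, eol, lenchunk):
    """Yield lines from fh split on the multi-byte separator eol."""
    buf = b''
    while True:
        chunk = fh.read(lenchunk)
        if not chunk:
            break
        buf += chunk
        x = 0
        while True:
            y = buf.find(eol, x)
            if y == -1:
                break
            yield buf[x:y]
            x = y + len(eol)
        buf = buf[x:]   # keep the incomplete remainder (may end mid-eol)
    yield buf           # whatever follows the last separator

# chunk size 5 forces the separator ':;:' to be cut across chunks
lines = list(split_lines(io.BytesIO(b'abc:;:de:;:f'), b':;:', 5))
assert lines == [b'abc', b'de', b'f']
```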

Here's the same, with printings here and there to allow to follow the algorithm.

from random import randrange, choice


# this part is to create an exemple file with newline being :;:
alphabet = 'abcdefghijklmnopqrstuvwxyz '
ch = ':;:'.join(''.join(choice(alphabet) for nc in xrange(randrange(0,40)))
                for i in xrange(50))
with open('fofo.txt','wb') as g:
    g.write(ch)


# this generator function is an iterator for a file
# if nl receives an argument whose bool is True,
# the newlines :;: are returned in the lines

def liner(filename,eol,lenchunk,nl=0):
    L = len(eol)
    NL = len(eol) if nl else 0
    with open(filename,'rb') as f:
        ch = f.read()
        the_end = '\n\nxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'+\
                  '\nend of the file=='+ch[-50:]+\
                  '\nxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\n'
        f.seek(0,0)
        chunk = f.read(lenchunk)
        tail = ''
        while chunk:
            if (chunk[-1]==':' and chunk[-3:]!=':;:') or chunk[-2:]==':;':
                wr = [' ##########---------- cut newline cut ----------##########'+\
                     '\nchunk== '+chunk+\
                     '\n---------------------------------------------------']
            else:
                wr = ['chunk== '+chunk+\
                     '\n---------------------------------------------------']
            last = chunk.rfind(eol)
            if last==-1:
                kept = chunk
                newtail = ''
            else:
                kept = chunk[0:last+L]   # here: L
                newtail = chunk[last+L:] # here: L
            wr.append('\nkept== '+kept+\
                      '\n---------------------------------------------------'+\
                      '\nnewtail== '+newtail)
            chunk = tail + kept
            tail = newtail
            wr.append('\n---------------------------------------------------'+\
                      '\ntail + kept== '+chunk+\
                      '\n---------------------------------------------------')
            print ''.join(wr)
            x = y = 0
            while y+1:
                y = chunk.find(eol,x)
                if y+1: yield chunk[x:y+NL] # here: NL
                else: break
                x = y+L # here: L
            print '\n\n==================================================='
            chunk = f.read(lenchunk)
        yield tail
        print the_end



for line in liner('fofo.txt',':;:',1):
    print 'line== '+line

.

EDIT

I compared the execution times of my code and chmullig's code.

With a 'fofo.txt' file about 10 MB, created with

alphabet = 'abcdefghijklmnopqrstuvwxyz '
ch = ':;:'.join(''.join(choice(alphabet) for nc in xrange(randrange(0,60)))
                for i in xrange(324000))
with open('fofo.txt','wb') as g:
    g.write(ch)

and measuring times like that:

te = clock()
for line in liner('fofo.txt',':;:', 65536):
    pass
print clock()-te


fh = open('fofo.txt', 'rb')
zenBreaker = SpecialDelimiters(fh, ':;:', 65536)

te = clock()
for line in zenBreaker:
    pass
print clock()-te

I obtained the following minimum times over several runs:

............my code 0.7067 seconds

chmullig's code 0.8373 seconds

.

EDIT 2

I changed my generator function: liner2() takes a file handle instead of the file's name, so opening the file can be excluded from the timing, as it is for chmullig's code.

def liner2(fh,eol,lenchunk,nl=0):
    L = len(eol)
    NL = len(eol) if nl else 0
    chunk = fh.read(lenchunk)
    tail = ''
    while chunk:
        last = chunk.rfind(eol)
        if last==-1:
            kept = chunk
            newtail = ''
        else:
            kept = chunk[0:last+L]   # here: L
            newtail = chunk[last+L:] # here: L
        chunk = tail + kept
        tail = newtail
        x = y = 0
        while y+1:
            y = chunk.find(eol,x)
            if y+1: yield chunk[x:y+NL] # here: NL
            else: break
            x = y+L # here: L
        chunk = fh.read(lenchunk)
    yield tail

fh = open('fofo.txt', 'rb')
te = clock()
for line in liner2(fh,':;:', 65536):
    pass
print clock()-te

The results, after numerous runs to find the minimum times, are

.........with liner() 0.7067 seconds

.......with liner2() 0.7064 seconds

chmullig's code 0.8373 seconds

In fact, opening the file accounts for a negligible part of the total time.

qid & accept id: (5286022, 5286049) query: Python: Suppress exponential format (i.e. 9e-10) in float to string conversion? soup:

Try something like

\n
"%.16f" % f\n
\n

Note that %f never switches to exponential format; if the number is smaller than the chosen precision the digits simply round away to zeros, so you may want to treat this case separately, for example

\n
"%.16f" % f if f >= 1e-16 else "0.0"\n
\n soup wrap:

Try something like

"%.16f" % f

Note that %f never switches to exponential format; if the number is smaller than the chosen precision the digits simply round away to zeros, so you may want to treat this case separately, for example

"%.16f" % f if f >= 1e-16 else "0.0"
qid & accept id: (5300387, 5302119) query: Set a kind name independently of the model name (App Engine datastore) soup:

Just override the kind() method of your class:

\n
class MyModel(db.Model):\n  @classmethod\n  def kind(cls):\n    return 'prefix_%s' % super(MyModel, cls).kind()\n
\n

You can define a custom baseclass that does this for you:

\n
class ModuleModel(db.Model):\n  @classmethod\n  def kind(cls):\n    return '%s_%s' % (cls.__module__, super(ModuleModel, cls).kind())\n
\n

Any class that extends ModuleModel will have the name of the module it's defined in prefixed to the kind name.

\n soup wrap:

Just override the kind() method of your class:

class MyModel(db.Model):
  @classmethod
  def kind(cls):
    return 'prefix_%s' % super(MyModel, cls).kind()

You can define a custom baseclass that does this for you:

class ModuleModel(db.Model):
  @classmethod
  def kind(cls):
    return '%s_%s' % (cls.__module__, super(ModuleModel, cls).kind())

Any class that extends ModuleModel will have the name of the module it's defined in prefixed to the kind name.
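The pattern itself doesn't depend on App Engine. Here's the same classmethod-override idea with a plain stand-in Model base class (hypothetical; the real db.Model's kind() also defaults to the class name), runnable anywhere:

```python
class Model:
    # stand-in for db.Model (hypothetical): kind() defaults to the class name
    @classmethod
    def kind(cls):
        return cls.__name__

class ModuleModel(Model):
    @classmethod
    def kind(cls):
        # prefix the defining module's name to the default kind
        return '%s_%s' % (cls.__module__, super(ModuleModel, cls).kind())

class MyThing(ModuleModel):
    pass

assert MyThing.kind().endswith('_MyThing')   # e.g. 'myapp.models_MyThing'
```

Because kind() is resolved on cls, every subclass automatically gets its own prefixed name.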

qid & accept id: (5321466, 5321480) query: How to convert string timezones in form (Country/city) into datetime.tzinfo soup:

Yes, you need the pytz library:

\n
import datetime, pytz\nzoneName = 'America/New_York'\nnow = datetime.datetime.now(pytz.timezone(zoneName))\n
\n

returns:

\n
datetime.datetime(2011, 3, 16, 1, 39, 33, 87375, tzinfo=)\n
\n soup wrap:

Yes, you need the pytz library:

import datetime, pytz
zoneName = 'America/New_York'
now = datetime.datetime.now(pytz.timezone(zoneName))

returns:

datetime.datetime(2011, 3, 16, 1, 39, 33, 87375, tzinfo=<DstTzInfo 'America/New_York' EDT-1 day, 20:00:00 DST>)
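On Python 3.9+ the standard library's zoneinfo module can do this without pytz; a sketch:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # stdlib since Python 3.9

# look up the tzinfo by its IANA 'Country/City' key
now = datetime.now(ZoneInfo('America/New_York'))
assert now.tzinfo is not None
assert str(now.tzinfo) == 'America/New_York'
```

zoneinfo uses the system tz database (or the tzdata package on platforms without one), so the same 'Country/City' keys work.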
qid & accept id: (5332701, 5332716) query: How to search & replace in Python? soup:

It depends on the rule you are using to decide where to insert the extra character.

\n

If you want it between the 5th and 6th characters you could try this:

\n
s = s[:5] + '-' + s[5:]\n
\n

If you want it after the first hyphen and then one more character:

\n
i = s.index('-') + 2\ns = s[:i] + '-' + s[i:]\n
\n

If you want it just before the first digit:

\n
import re\ni = re.search('\d', s).start()\ns = s[:i] + '-' + s[i:]\n
\n
\n
\n

Can I add a character after the first 2 number, ie from 'ABC-D1234' to 'ABC-D12-34'

\n
\n

Sure:

\n
i = re.search('(?<=\d\d)', s).start()\ns = s[:i] + '-' + s[i:]\n
\n

or:

\n
s = re.sub('(?<=\d\d)', '-', s, 1)\n
\n

or:

\n
s = re.sub('(\d\d)', r'\1-', s, 1)\n
\n soup wrap:

It depends on the rule you are using to decide where to insert the extra character.

If you want it between the 5th and 6th characters you could try this:

s = s[:5] + '-' + s[5:]

If you want it after the first hyphen and then one more character:

i = s.index('-') + 2
s = s[:i] + '-' + s[i:]

If you want it just before the first digit:

import re
i = re.search('\d', s).start()
s = s[:i] + '-' + s[i:]

Can I add a character after the first 2 number, ie from 'ABC-D1234' to 'ABC-D12-34'

Sure:

i = re.search('(?<=\d\d)', s).start()
s = s[:i] + '-' + s[i:]

or:

s = re.sub('(?<=\d\d)', '-', s, 1)

or:

s = re.sub('(\d\d)', r'\1-', s, 1)
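All of these variants can be checked quickly on the example string:

```python
import re

s = 'ABC-D1234'

# between the 5th and 6th characters
assert s[:5] + '-' + s[5:] == 'ABC-D-1234'

# two characters past the first hyphen
i = s.index('-') + 2
assert s[:i] + '-' + s[i:] == 'ABC-D-1234'

# after the first two digits (capture group or zero-width lookbehind)
assert re.sub(r'(\d\d)', r'\1-', s, count=1) == 'ABC-D12-34'
assert re.sub(r'(?<=\d\d)', '-', s, count=1) == 'ABC-D12-34'
```

Raw strings (r'...') and the count=1 keyword keep the patterns readable and limit the substitution to the first match.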
qid & accept id: (5447428, 5447531) query: Accessing a Dynamically Generated Nested Dictionary soup:

I can see a few problems with your current implementation. How do you mark if a node in the trie is a word? A better implementation would be to initialize tree to something like tree = [{}, None] where None indicates if the current node is the end of a word.

\n

Your addTerm method could then be something like:

\n
def addTerm(self, term):\n   node = self.tree\n   for c in term:\n      c = c.lower()\n      if re.match("[a-z]",c):\n         node = node[0].setdefault(c,[{},None])\n   node[1] = term\n
\n

You could set node[1] to True if you don't care about what word is at the node.

\n

Searching if a word is in the trie would be something like

\n
def findTerm(self, term):\n    node = self.tree\n    for c in term:\n        c = c.lower()\n        if re.match("[a-z]",c):\n            if c in node[0]:\n                node = node[0][c]\n            else:\n                return False\n    return node[1] != None\n
\n soup wrap:

I can see a few problems with your current implementation. How do you mark if a node in the trie is a word? A better implementation would be to initialize tree to something like tree = [{}, None] where None indicates if the current node is the end of a word.

Your addTerm method could then be something like:

def addTerm(self, term):
   node = self.tree
   for c in term:
      c = c.lower()
      if re.match("[a-z]",c):
         node = node[0].setdefault(c,[{},None])
   node[1] = term

You could set node[1] to True if you don't care about what word is at the node.

Searching if a word is in the trie would be something like

def findTerm(self, term):
    node = self.tree
    for c in term:
        c = c.lower()
        if re.match("[a-z]",c):
            if c in node[0]:
                node = node[0][c]
            else:
                return False
    return node[1] is not None
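Assembled into a minimal class (the Trie container name is assumed; the question's original class isn't shown), the two methods work together like this:

```python
import re

class Trie:
    # node layout: [children_dict, stored_term_or_None]
    def __init__(self):
        self.tree = [{}, None]

    def addTerm(self, term):
        node = self.tree
        for c in term:
            c = c.lower()
            if re.match("[a-z]", c):
                node = node[0].setdefault(c, [{}, None])
        node[1] = term   # mark this node as the end of a word

    def findTerm(self, term):
        node = self.tree
        for c in term:
            c = c.lower()
            if re.match("[a-z]", c):
                if c in node[0]:
                    node = node[0][c]
                else:
                    return False
        return node[1] is not None

t = Trie()
t.addTerm('cat')
assert t.findTerm('cat')
assert not t.findTerm('ca')    # a mere prefix is not a stored word
assert not t.findTerm('dog')
```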
qid & accept id: (5464504, 5464830) query: Accessing an object created in another module using python soup:

Yeah,

\n

That's right,

\n
from module import desired_object\n
\n

module example:

\n
# Desired Module:\n\ndesired_object = None\ndef my_func():\n    global desired_object\n    desired_object = SomeObject()\n
\n

But make sure my_func is called before importing desired_object from your module.

\n soup wrap:

Yeah,

That's right,

from module import desired_object

module example:

# Desired Module:

desired_object = None
def my_func():
    global desired_object
    desired_object = SomeObject()

But make sure my_func is called before you import desired_object from the module; otherwise you will get the original None binding.
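
To see why the order matters, here is a self-contained sketch. The module name mymodule is hypothetical and is built at runtime with types.ModuleType so no second file is needed; the point is that from module import desired_object copies the current binding, while module.desired_object always sees the latest value.

```python
import sys
import types

# Build a stand-in module at runtime (the name "mymodule" is hypothetical).
mod = types.ModuleType("mymodule")
exec(
    "desired_object = None\n"
    "def my_func():\n"
    "    global desired_object\n"
    "    desired_object = object()\n",
    mod.__dict__,
)
sys.modules["mymodule"] = mod

from mymodule import desired_object  # copies the binding as it is *now*: None

mod.my_func()  # the module's own attribute is now replaced

print(desired_object is None)           # True: the early from-import kept None
import mymodule
print(mymodule.desired_object is None)  # False: attribute access sees the update
```

This is why, if you cannot control import order, it is safer to import the module and access module.desired_object at use time.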

qid & accept id: (5470210, 5473327) query: django one session per user soup:

soup wrap:

I created an extra field for the user (UserAttributes extends User):

class UserAttributes(User):
    last_session_key = models.CharField(blank=True, null=True, max_length=40)

and a method:

def set_session_key(self, key):
    if self.last_session_key and not self.last_session_key == key:
        Session.objects.get(session_key=self.last_session_key).delete()
    self.last_session_key = key
    self.save()  

and I called it just after login:

auth.login(request, user)
user.userattributes.set_session_key(request.session.session_key)
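
The invalidation logic can be sketched without Django at all; a plain dict stands in for Django's session store, and the class below mirrors the method above.

```python
# The session store is a plain dict mapping session_key -> session data;
# in real Django this is the Session model and its table.
sessions = {}

class UserAttributes:
    def __init__(self):
        self.last_session_key = None

    def set_session_key(self, key):
        # Delete the previous session so only the newest login stays valid.
        if self.last_session_key and self.last_session_key != key:
            sessions.pop(self.last_session_key, None)
        self.last_session_key = key

user = UserAttributes()
sessions["aaa"] = {"user_id": 1}   # first login
user.set_session_key("aaa")
sessions["bbb"] = {"user_id": 1}   # second login from another browser
user.set_session_key("bbb")

print("aaa" in sessions)  # False: the first session was invalidated
print("bbb" in sessions)  # True
```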
qid & accept id: (5472771, 5528088) query: customize django runserver output soup:

soup wrap:

I think the best way is to use logging and add some code to it,

like

from django.db import connection
sql=connection.queries

and

doc = {
                'record_hash': hash,
                'level': record.level,
                'channel': record.channel or u'',
                'location': u'%s:%d' % (record.filename, record.lineno),
                "message": record.msg,
                'module': record.module or u'',
                'occurrence_count': 0,
                'solved': False,
                'app_id': app_id,
                'sql': sql,
            }

Read more about this at http://docs.djangoproject.com/en/dev/topics/logging/
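
A stdlib-only sketch of the record-to-dict pattern shown above: a custom logging.Handler that turns each LogRecord into a dict. The field names loosely mirror the doc snippet; the 'sql' entry is omitted because django.db.connection.queries needs a running Django project.

```python
import logging

# A Handler that collects each LogRecord as a dict, roughly in the shape
# of the `doc` snippet above (minus the Django-specific fields).
class DictHandler(logging.Handler):
    def __init__(self):
        super().__init__()
        self.docs = []

    def emit(self, record):
        self.docs.append({
            'level': record.levelname,
            'location': '%s:%d' % (record.filename, record.lineno),
            'message': record.getMessage(),
            'module': record.module or '',
        })

logger = logging.getLogger('runserver-demo')
handler = DictHandler()
logger.addHandler(handler)
logger.warning('hello %s', 'world')

print(handler.docs[0]['message'])  # hello world
```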

qid & accept id: (5478351, 5478448) query: Python time measure function soup:

soup wrap:

First and foremost, I highly suggest using a profiler or at least using timeit.

However, if you want to write your own timing method strictly to learn, here is somewhere to get started, using a decorator:

import time

def timing(f):
    def wrap(*args):
        time1 = time.time()
        ret = f(*args)
        time2 = time.time()
        print '%s function took %0.3f ms' % (f.func_name, (time2-time1)*1000.0)
        return ret
    return wrap

And the usage is very simple, just use the @timing decorator:

@timing
def do_work():
  #code

Note I'm calling f.func_name to get the function name as a string (in Python 2); in Python 3, use f.__name__ instead.
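
For Python 3, a variant of the same decorator that also preserves the wrapped function's metadata with functools.wraps and uses the monotonic time.perf_counter clock:

```python
import time
from functools import wraps

# Python 3 version of the decorator above: functools.wraps keeps the
# wrapped function's name/docstring, perf_counter is monotonic and
# high-resolution, and **kwargs are forwarded too.
def timing(f):
    @wraps(f)
    def wrap(*args, **kwargs):
        start = time.perf_counter()
        ret = f(*args, **kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        print('%s function took %0.3f ms' % (f.__name__, elapsed_ms))
        return ret
    return wrap

@timing
def do_work(n):
    return sum(range(n))

result = do_work(1000)
```

Thanks to wraps, do_work.__name__ is still 'do_work' rather than 'wrap'.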

qid & accept id: (5530857, 5533742) query: Parse XML file into Python object soup:

soup wrap:

My beloved SD Chargers hat is off to you if you think a regex is easier than this:

#!/usr/bin/env python
import xml.etree.cElementTree as et

sxml="""<files>
  <file>
   <Name>some filename.mp3</Name>
   <Encoder>Gogo (after 3.0)</Encoder>
   <Bitrate>131</Bitrate>
  </file>
  <file>
   <Name>another filename.mp3</Name>
   <Encoder>iTunes</Encoder>
   <Bitrate>128</Bitrate>
  </file>
</files>
"""
tree=et.fromstring(sxml)

for el in tree.findall('file'):
    print '-------------------'
    for ch in el.getchildren():
        print '{:>15}: {:<30}'.format(ch.tag, ch.text) 

print "\nan alternate way:"  
el=tree.find('file[2]/Name')  # xpath
print '{:>15}: {:<30}'.format(el.tag, el.text)  

Output:

-------------------
           Name: some filename.mp3             
        Encoder: Gogo (after 3.0)              
        Bitrate: 131                           
-------------------
           Name: another filename.mp3          
        Encoder: iTunes                        
        Bitrate: 128                           

an alternate way:
           Name: another filename.mp3  

If your attraction to a regex is being terse, here is an equally incomprehensible bit of list comprehension to create a data structure:

[(ch.tag,ch.text) for e in tree.findall('file') for ch in e.getchildren()]

Which creates a list of tuples of the XML children of each file element, in document order:

[('Name', 'some filename.mp3'), 
 ('Encoder', 'Gogo (after 3.0)'), 
 ('Bitrate', '131'), 
 ('Name', 'another filename.mp3'), 
 ('Encoder', 'iTunes'), 
 ('Bitrate', '128')]

With a few more lines and a little more thought, obviously, you can create any data structure that you want from XML with ElementTree. It is part of the Python distribution.

Edit

Code golf is on!

[{item.tag: item.text for item in ch} for ch in tree.findall('file')] 
[ {'Bitrate': '131', 
   'Name': 'some filename.mp3', 
   'Encoder': 'Gogo (after 3.0)'}, 
  {'Bitrate': '128', 
   'Name': 'another filename.mp3', 
   'Encoder': 'iTunes'}]

If your XML only has the file section, you can choose your golf. If your XML has other tags or other sections, you need to account for the section the children are in, and you will need to use findall.

There is a tutorial on ElementTree at Effbot.org
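
The same parse works in Python 3 with a few renames: cElementTree is gone, getchildren() was removed in 3.9 (iterate the element directly), and print is a function. The tag names below follow the output shown above; the root element name files is an assumption.

```python
import xml.etree.ElementTree as ET

# Python 3 version of the parse above; tag names taken from the printed
# output, root element name <files> assumed.
sxml = """<files>
  <file>
   <Name>some filename.mp3</Name>
   <Encoder>Gogo (after 3.0)</Encoder>
   <Bitrate>131</Bitrate>
  </file>
</files>"""

root = ET.fromstring(sxml)

# One dict per <file>, built by iterating each element's children directly.
records = [{child.tag: child.text for child in f} for f in root.findall('file')]
print(records[0]['Name'])  # some filename.mp3
```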

qid & accept id: (5532498, 5918298) query: Delete files with python through OS shell soup:

soup wrap:

A slightly more verbose version of another method:

import os
dirpath = "E:\\test"
files = os.listdir(dirpath)
for fname in files:
    if fname.endswith(".txt"):
        os.remove(os.path.join(dirpath, fname))

Or

import os
[os.remove(os.path.join("E:\\test",f)) for f in os.listdir("E:\\test") if f.endswith(".txt")]
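
The glob module can do the pattern matching and the directory join in one step; a sketch, using a temporary directory in place of E:\test so it can actually run anywhere:

```python
import glob
import os
import tempfile

# A temp directory stands in for "E:\\test".
target = tempfile.mkdtemp()
open(os.path.join(target, "a.txt"), "w").close()
open(os.path.join(target, "keep.log"), "w").close()

# glob returns full paths already, so os.remove can take them directly.
for path in glob.glob(os.path.join(target, "*.txt")):
    os.remove(path)

remaining = sorted(os.listdir(target))
print(remaining)  # ['keep.log']
```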
qid & accept id: (5599022, 5599114) query: Python: Pass a generic dictionary as a command line arguments soup:

soup wrap:

That should be fairly easy to parse yourself. Use of the helper libraries would be complicated by not knowing the keys in advance. The filename is in sys.argv[1]. You can build the dictionary with a list of strings split with the '=' character as a delimiter.

import sys
filename = sys.argv[1]
args = dict([arg.split('=', 1) for arg in sys.argv[2:]])
print filename
print args

Output:

$ Script.py file1 bob=1 sue=2 ben=3
file1
{'bob': '1', 'ben': '3', 'sue': '2'}

That's the gist of it, but you may need more robust parsing of the key-value pairs than just splitting the string. Also, make sure you have at least two arguments in sys.argv before trying to extract the filename.
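
Wrapped in a function, the same parsing can be exercised without a real command line (the argument values below are just the example from the output above):

```python
# Same idea as above, factored into a function so it can be tested with
# a fake argv; argv[0] is the script name, argv[1] the filename.
def parse_args(argv):
    filename = argv[1]
    pairs = (arg.split('=', 1) for arg in argv[2:])
    return filename, dict(pairs)

filename, args = parse_args(['Script.py', 'file1', 'bob=1', 'sue=2', 'ben=3'])
print(filename)     # file1
print(args['bob'])  # 1
```

The second argument to split limits it to one split, so values containing '=' (e.g. key=a=b) are kept intact.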

qid & accept id: (5629242, 5629275) query: Getting Every File in a Directory, Python soup:

soup wrap:

You can use os.listdir(".") to list the contents of the current directory ("."):

for name in os.listdir("."):
    if name.endswith(".txt"):
        print(name)

If you want the whole list as a Python list, use a list comprehension:

a = [name for name in os.listdir(".") if name.endswith(".txt")]
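
One thing to keep in mind: os.listdir returns bare names, not paths. If you want usable paths for files in some other directory, join the names with the directory; a sketch using a throwaway temp directory:

```python
import os
import tempfile

# A temp directory stands in for "."; listdir gives bare names, so join
# them back onto the directory to get paths you can open/remove/etc.
d = tempfile.mkdtemp()
for name in ("one.txt", "two.txt", "skip.csv"):
    open(os.path.join(d, name), "w").close()

txt_files = sorted(
    os.path.join(d, name) for name in os.listdir(d) if name.endswith(".txt")
)
print([os.path.basename(p) for p in txt_files])  # ['one.txt', 'two.txt']
```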
qid & accept id: (5678136, 5678516) query: trying to create a dictionary but do not know how to deal with \n soup:

soup wrap:

If you're not working with a ridiculously large file, you can avoid having to use .strip() at all: read in the entire file as a string with .read(), then call .splitlines() on that string.

Here is an example. I commented out your code where I changed things. I changed the example not to use slicing in exchange for explicit variable assignment.

subject_dic = {}
inputFile = open(filename)

# Turn "line1\nline2\n" into ['line1', 'line2']
inputData = inputFile.read().splitlines()

#for line in inputFile:
for line in inputData:
    #split_line = string.split(line, ',')
    #subject_dic[split_line[0]] = tuple(split_line[1:3])
    mykey, myval1, myval2 = line.split(',') # Strings always have .split()
    subject_dic[mykey] = (myval1, myval2) # Explicit tuple assignment

print subject_dic

Outputs:

{'6.00': ('10', '1'),
 '6.01': ('5', '4'),
 '6.02': ('5', '6'),
 '6.03': ('2', '9'),
 '6.04': ('1', '2'),
 '6.05': ('1', '18'),
 '6.06': ('5', '19'),
 '6.07': ('2', '10'),
 '6.08': ('1', '10'),
 '6.09': ('3', '7'),
 '6.10': ('8', '18'),
 '6.11': ('6', '8'),
 '6.12': ('6', '3'),
 '6.13': ('9', '16'),
 '6.14': ('10', '8'),
 '6.15': ('10', '6'),
 '6.16': ('6', '9'),
 '6.17': ('9', '3'),
 '6.18': ('10', '4'),
 '6.19': ('8', '19')}
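
The core of the approach can be sketched without a file at all, since .splitlines() already drops the trailing newlines that .strip() would otherwise have to remove:

```python
# Three lines of the same CSV-ish format as the question's data file.
data = "6.00,10,1\n6.01,5,4\n6.02,5,6\n"

subject_dic = {}
for line in data.splitlines():          # no '\n' left on any line
    mykey, myval1, myval2 = line.split(',')
    subject_dic[mykey] = (myval1, myval2)

print(subject_dic['6.00'])  # ('10', '1')
```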
qid & accept id: (5678950, 5679742) query: Matplotlib artists to stay the same size when zoomed in? soup:

soup wrap:

Simply apply the transform=ax.transAxes keyword to the Polygon or Rectangle instance. You could also use transFigure if it makes more sense to anchor the patch to the figure instead of the axis. Here is the tutorial on transforms.

And here is some sample code:

from matplotlib import pyplot as plt
from matplotlib.patches import Polygon
import numpy as np
x = np.linspace(0,5,100)
y = np.sin(x)

plt.plot(x,y)
ax = plt.gca()

polygon = Polygon([[.1,.1],[.3,.2],[.2,.3]], True, transform=ax.transAxes)
ax.add_patch(polygon)

plt.show()

If you do not want to place your polygon using the axes coordinate system, but rather want it positioned using the data coordinate system, then you can use the transforms to statically convert the data before positioning. This is best exemplified here:

from matplotlib import pyplot as plt
from matplotlib.patches import Polygon
import numpy as np

x = np.linspace(0,5,100)
y = np.sin(x)

plt.plot(x,y)
ax = plt.gca()

dta_pts = [[.5,-.75],[1.5,-.6],[1,-.4]]

# coordinates converters:
#ax_to_display = ax.transAxes.transform
display_to_ax = ax.transAxes.inverted().transform
data_to_display = ax.transData.transform
#display_to_data = ax.transData.inverted().transform

ax_pts = display_to_ax(data_to_display(dta_pts))

# this triangle will move with the plot
ax.add_patch(Polygon(dta_pts, True)) 
# this triangle will stay put relative to the axes bounds
ax.add_patch(Polygon(ax_pts, True, transform=ax.transAxes))

plt.show()
qid & accept id: (5701962, 5704630) query: Python .csv writer soup:

soup wrap:

Guessing at what you really mean, I would rewrite your code as follows:

from urlparse import urlparse
import csv
import re

ifile =open(ipath,'r')
ofile = open(opath, 'wb')
writer = csv.writer(ofile, dialect='excel')

url =[urlparse(u).netloc for u in ifile]
sitesource =  set([re.sub("www.", "", e) for e in url])

for u in sitesource:
    print ("Creation de:", u)
    writer.writerow([u]) 

ofile.close()
ifile.close()

I deleted liste as it's not used. I got rid of for row in file (ifile): as you already iterated over its contents when you created url.

I changed

url =[urlparse(u).netloc for u in file (ipath, "r+b")]

to

url =[urlparse(u).netloc for u in ifile]

because you already had the file open. I assumed you did not want binary mode if you are reading strings.

I changed writerow(u) to write a sequence: writerow([u]). This puts a single u per line, which means your csv file will not actually have any commas in it. If you wanted all of your results in a single row, replace the final loop with this statement: writer.writerow(sitesource).
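
The writerow([u]) vs writerow(u) distinction is worth seeing in isolation: a string is itself a sequence, so passing it bare writes one column per character. A sketch using an in-memory buffer instead of a file:

```python
import csv
import io

# io.StringIO stands in for the output file so the demo is self-contained.
buf = io.StringIO()
writer = csv.writer(buf, dialect='excel')

writer.writerow(["example.com"])  # a one-item list: one cell
writer.writerow("abc")            # a bare string: three cells, a,b,c

rows = buf.getvalue().splitlines()
print(rows)  # ['example.com', 'a,b,c']
```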

qid & accept id: (5722767, 5740724) query: django-mptt get_descendants for a list of nodes soup:

soup wrap:

Great thanks to Craig de Stigter, who answered my question on the django-mptt-dev group. In case anybody needs it, I am reposting his solution from http://groups.google.com/group/django-mptt-dev/browse_thread/thread/637c8b2fe816304d

from django.db.models import Q
import operator

def get_queryset_descendants(nodes, include_self=False):
    if not nodes:
        return Node.tree.none()
    filters = []
    for n in nodes:
        lft, rght = n.lft, n.rght
        if include_self:
            lft -= 1
            rght += 1
        filters.append(Q(tree_id=n.tree_id, lft__gt=lft, rght__lt=rght))
    q = reduce(operator.or_, filters)
    return Node.tree.filter(q)

Example Node tree:

T1 
---T1.1 
---T1.2 
T2 
T3 
---T3.3 
------T3.3.3 

Example usage:

>> some_nodes = [<Node: T1>, <Node: T2>, <Node: T3>]  # QuerySet
>> print get_queryset_descendants(some_nodes)
[<Node: T1.1>, <Node: T1.2>, <Node: T3.3>, <Node: T3.3.3>]
>> print get_queryset_descendants(some_nodes, include_self=True)
[<Node: T1>, <Node: T1.1>, <Node: T1.2>, <Node: T2>, <Node: T3>, <Node: T3.3>, <Node: T3.3.3>]
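
The heart of the solution is reduce(operator.or_, filters). That trick can be demonstrated without Django by OR-combining one check per node, with plain (tree_id, lft, rght) tuples standing in for MPTT model instances:

```python
import operator
from functools import reduce  # reduce is a builtin in Py2, functools in Py3

# (tree_id, lft, rght) tuples stand in for MPTT nodes; a candidate is a
# descendant of a node when it sits strictly inside the node's lft/rght
# interval within the same tree.
nodes = [(1, 1, 6), (2, 1, 2)]
candidates = [(1, 2, 3), (1, 4, 5), (2, 1, 2), (3, 1, 2)]

def matches(c):
    checks = [
        c[0] == tree_id and c[1] > lft and c[2] < rght
        for tree_id, lft, rght in nodes
    ]
    # OR together one boolean per node, just like the Q objects above.
    return reduce(operator.or_, checks)

result = [c for c in candidates if matches(c)]
print(result)  # [(1, 2, 3), (1, 4, 5)]
```

With Q objects, reduce builds one combined SQL WHERE clause instead of a Python boolean, but the shape of the code is identical.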
qid & accept id: (5726827, 5727089) query: how to print a dict which has japanese word using python soup:

soup wrap:

You can derive your own class from dict and override the __str__ method:

# -*- coding: utf-8 -*-

class MyDict(dict):
    def __str__(self):
        return "{"+", ".join(["%s: %s" % (key, self[key]) for key in self])+"}" 

a = {0:"Velmi žluťoučký kůň"}
b = MyDict({0:"Velmi žluťoučký kůň"})
c = "Velmi žluťoučký kůň"
print(a)
print(b)
print(c)

Prints:

{0: 'Velmi \xc5\xbelu\xc5\xa5ou\xc4\x8dk\xc3\xbd k\xc5\xaf\xc5\x88'}
{0: Velmi žluťoučký kůň}
Velmi žluťoučký kůň

The derived class will behave exactly the same as dict, but it will print using the method you specify.
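
For reference, the same subclass under Python 3, where the problem has largely gone away: repr of str no longer escapes non-ASCII, so a plain dict already prints readably, and the subclass is only needed if you want the unquoted formatting.

```python
# Python 3 version of MyDict; no coding header needed, and a generator
# expression replaces the list comprehension inside join.
class MyDict(dict):
    def __str__(self):
        return "{" + ", ".join("%s: %s" % (k, self[k]) for k in self) + "}"

b = MyDict({0: "Velmi žluťoučký kůň"})
s = str(b)
print(s)  # {0: Velmi žluťoučký kůň}
```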

qid & accept id: (5743548, 5743685) query: How to know the filetype through python soup:

soup wrap:

'\x89' is a representation of non-printable byte with value 0x89 (that's 137).

As for finding file types in Python, there is already the mimetypes module for that.

import mimetypes
mime_type, encoding = mimetypes.guess_type(filename_or_url)

In action:

>>> mimetypes.guess_type('http://upload.wikimedia.org/wikipedia/commons/9/9a/PNG_transparency_demonstration_2.png')
('image/png', None)
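
A runnable sketch. Note that the second element of the returned pair is the encoding (e.g. 'gzip' for a .gz file), not a subtype, which is why it is None in the .png example above:

```python
import mimetypes

# guess_type works purely from the name/extension; no file I/O happens.
mtype, encoding = mimetypes.guess_type('archive.tar.gz')
print(mtype, encoding)  # application/x-tar gzip

ptype, pencoding = mimetypes.guess_type('photo.png')
print(ptype, pencoding)  # image/png None
```

Because it only inspects the name, guess_type cannot detect a mislabeled file; for content-based detection you would need something like the python-magic package.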
qid & accept id: (5761617, 5761901) query: Pyramid authorization for stored items soup:

soup wrap:

You can do this using the ACLAuthorizationPolicy combined with URL Dispatch by using a custom resource tree designed for this purpose.

For example, you have permissions for Foo objects, and permissions for Bar objects. These ACLs can be found by traversing the resource tree using the urls:

/foos/{obj}
/bars/{obj}

Your resource tree then becomes a hierarchy of permissions, where at any point in the tree you can place an __acl__ on the resource object:

root                       (Root)
|- foos                    (FooContainer)
|  `- {obj}                (Foo)
`- bars                    (BarContainer)
   `- {obj}                (Bar)

You can represent this hierarchy in a resource tree:

class Root(dict):
    # this is the root factory, you can set an __acl__ here for all resources
    __acl__ = [
        (Allow, 'admin', ALL_PERMISSIONS),
    ]
    def __init__(self, request):
        self.request = request
        self['foos'] = FooContainer(self, 'foos')
        self['bars'] = BarContainer(self, 'bars')

class FooContainer(object):
    # set ACL here for *all* objects of type Foo
    __acl__ = [
    ]

    def __init__(self, parent, name):
        self.__parent__ = parent
        self.__name__ = name

    def __getitem__(self, key):
        # get a database connection
        s = DBSession()
        obj = s.query(Foo).filter_by(id=key).scalar()
        if obj is None:
            raise KeyError
        obj.__parent__ = self
        obj.__name__ = key
        return obj

class Foo(object):
    # this __acl__ is computed dynamically based on the specific object
    @property
    def __acl__(self):
        acls = [(Allow, 'u:%d' % o.id, 'view') for o in self.owners]
        return acls

    owners = relation('FooOwner')

class Bar(object):
    # allow any authenticated user to view Bar objects
    __acl__ = [
        (Allow, Authenticated, 'view')
    ]

With a setup like this, you can then map route patterns to your resource tree:

config = Configurator()
config.add_route('item_options', '/item/{item}/some_options',
                 # tell pyramid where in the resource tree to go for this url
                 traverse='/foos/{item}')

You will also need to map your route to a specific view:

config.add_view(route_name='item_options', view='.views.options_view',
                permission='view', renderer='item_options.mako')

Great, now we can define our view and use the loaded context object, knowing that if the view is executed, the user has the appropriate permissions!

def options_view(request):
    foo = request.context
    return {
        'foo': foo,
    }

Using this setup, you are using the default ACLAuthorizationPolicy, and you are providing row-level permissions for your objects with URL Dispatch. Note also, that because the objects set the __parent__ property on the children, the policy will bubble up the lineage, inheriting permissions from the parents. This can be avoided by simply putting a DENY_ALL ACE in your ACL, or by writing a custom policy that does not use the context's lineage.
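
The lineage-bubbling behaviour described above can be sketched without Pyramid: walk __parent__ links from the context upward and honour the first matching ACE. This is a simplified stand-in, not Pyramid's actual policy (which also handles principal lists, DENY_ALL, and ALL_PERMISSIONS):

```python
# Toy ACL walk: strings 'Allow'/'Deny' stand in for Pyramid's constants,
# and 'ALL' stands in for ALL_PERMISSIONS.
Allow, Deny = 'Allow', 'Deny'

def permits(context, principal, permission):
    node = context
    while node is not None:
        for action, who, what in getattr(node, '__acl__', []):
            if who == principal and (what == permission or what == 'ALL'):
                return action == Allow
        node = getattr(node, '__parent__', None)  # bubble up the lineage
    return False

class Root:
    __parent__ = None
    __acl__ = [(Allow, 'admin', 'ALL')]

class Foo:
    def __init__(self, parent, acl):
        self.__parent__ = parent
        self.__acl__ = acl

root = Root()
foo = Foo(root, [(Allow, 'u:1', 'view')])

print(permits(foo, 'u:1', 'view'))    # True: matched on foo itself
print(permits(foo, 'admin', 'edit'))  # True: inherited from root
print(permits(foo, 'u:2', 'view'))    # False: no ACE anywhere in the lineage
```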

* Update * I've turned this post into an actual demo on Github. Hopefully it helps someone. https://github.com/mmerickel/pyramid_auth_demo

* Update * I've written a full tutorial around pyramid's authentication and authorization system here: http://michael.merickel.org/projects/pyramid_auth_demo/

qid & accept id: (5771039, 5771164) query: Python: output for recursively printing out files and folders soup:

soup wrap:

If you write a function to return the directory structure as a nested list like this:

['DIR1/',['fileA','fileB','DIR3/',['fileE','fileF']],'DIR2/',['fileC','fileD']]

then you could use pprint.pformat to create a passable string representation:

import pprint
import textwrap
import re

data=['DIR1/',['fileA','fileB','DIR3/',['fileE','fileF']],'DIR2/',['fileC','fileD']]
print(textwrap.dedent(re.sub(r"[\]\[',]", r' ',
                             pprint.pformat(data,indent=4,width=1))))

yields

DIR1/  
    fileA  
    fileB  
    DIR3/  
        fileE  
        fileF    
DIR2/  
    fileC  
    fileD   

Note: The above code assumes your file and directory names do not contain any of the characters ,[]'...
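If your names may contain those characters, a small recursive formatter over the same nested-list shape avoids the regex stripping altogether (one sketch of many possible):

```python
data = ['DIR1/', ['fileA', 'fileB', 'DIR3/', ['fileE', 'fileF']],
        'DIR2/', ['fileC', 'fileD']]

def format_tree(items, depth=0):
    # A sublist holds the children of the preceding name,
    # so recurse one indentation level deeper for it.
    lines = []
    for item in items:
        if isinstance(item, list):
            lines.extend(format_tree(item, depth + 1))
        else:
            lines.append('    ' * depth + item)
    return lines

print('\n'.join(format_tree(data)))
```

This prints the same indented tree, with no restriction on the characters in the names.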

qid & accept id: (5808970, 5809035) query: Custom dictionary lookup in Python soup:

soup wrap:

You can derive from dict to change the behaviour of the get() method:

class ClosestDict(dict):
    def get(self, key):
        key = min(self.iterkeys(), key=lambda x: abs(x - key))
        return dict.get(self, key)

d = ClosestDict({10: 3, 100: 2, 1000: 1})
print (d.get(20), d.get(60), d.get(200))

prints

(3, 2, 2)

Note that the complexity of get() no longer is O(1), but O(n).
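If lookups need to be faster than a full scan, keeping the keys sorted and bisecting finds the nearest key in O(log n) per lookup. A Python 3 sketch (iterkeys is gone there anyway); note it re-sorts on every call, so cache the sorted keys if the dict does not change:

```python
import bisect

class ClosestDict(dict):
    def get(self, key):
        keys = sorted(self)  # cache this if the keys are static
        i = bisect.bisect_left(keys, key)
        # the nearest key is one of the neighbours of the insertion point
        candidates = keys[max(i - 1, 0):i + 1]
        nearest = min(candidates, key=lambda k: abs(k - key))
        return dict.get(self, nearest)

d = ClosestDict({10: 3, 100: 2, 1000: 1})
print(d.get(20), d.get(60), d.get(200))  # 3 2 2
```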

qid & accept id: (5825921, 5825954) query: How do I count the number of identical characters in a string by position using python? soup:

soup wrap:

I don't think any "clever" trick beats the obvious approach, if it's well executed:

sum(c1 == c2 for c1, c2 in itertools.izip(s1, s2))

Or, if the use of booleans for arithmetic irks you,

sum(1 for c1, c2 in itertools.izip(s1, s2) if c1 == c2)
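In Python 3, itertools.izip is gone; the built-in zip is already lazy, so the same idea reads (sample strings are made up here):

```python
s1, s2 = "hello", "hallo"

# zip pairs characters by position; each True counts as 1 when summed
matches = sum(c1 == c2 for c1, c2 in zip(s1, s2))
print(matches)  # 4
```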
qid & accept id: (5873969, 5874424) query: How can I scrape data from a text table using Python? soup:

soup wrap:

Here is some code to get you started:

text = """JOHN ...""" # text without the header

# These can be inferred if necessary
cols = [0, 24, 29, 39, 43, 52, 71, 84, 95, 109, 117]

db = []
row = []
for line in text.strip().split("\n"):
    data = [line[cols[i]:cols[i+1]] for i in xrange((len(cols)-1))]
    if data[0][0] != " ":
        if row:
            db.append(row)
        row = map(lambda x: [x], data)
    else:
        for i, c in enumerate(data):
            row[i].append(c)
print db

This will produce an array with an element per person. Each element will be an array of all the columns, and that will hold an array of all the rows. This way you can easily access the different years, or do things like concatenate the person's title:

for person in db:
    print "Name:", person[0][0]
    print " ".join(s.strip() for s in person[0][1:])
    print

Will yield:

Name: JOHN W. WOODS           
Chairman, President, & Chief Executive Officer of AmSouth & AmSouth Bank N.A.

Name: C. STANLEY ...
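The heart of the snippet is the fixed-offset slicing; under Python 3 you would also change xrange to range and wrap the map() call in list(). A minimal illustration of the slicing alone, on hypothetical offsets and data:

```python
cols = [0, 5, 12]  # hypothetical column offsets for a two-column layout
line = "ABC  1990-95"

# Slice the line at each pair of adjacent offsets
fields = [line[cols[i]:cols[i + 1]] for i in range(len(cols) - 1)]
print(fields)  # ['ABC  ', '1990-95']
```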
qid & accept id: (5901653, 5901750) query: Name of Current App in Google App Engine (Python) soup:

soup wrap:

import os
appname = os.environ['APPLICATION_ID']

EDIT: I just noticed this because I got a new upvote on it today (shame on you, upvoter!), but this is no longer correct.

from google.appengine.api.app_identity import get_application_id
appname = get_application_id()

should be used. The value in os.environ will include a "s~" prefix for applications using the HR datastore and, by default, "dev~" on the development server. (os.environ should also be avoided entirely on App Engine anyway, since when concurrency support is added with the Python 2.7 runtime, use of os.environ won't be threadsafe and will allow data to leak from one request to another, although obviously the application ID itself would be the same for multiple requests to the same application at the same time...)

qid & accept id: (5909816, 5955133) query: How to represent dbus type b(oss) in python? soup:

soup wrap:

According to the D-Bus specification, (b(oss)) is a struct of two elements: the first is a boolean, the second is a struct of three elements, an object path and two strings. In Python this maps to something like:

dbus.Struct((dbus.Boolean(a_boolean),
             dbus.Struct((dbus.ObjectPath(s1),
                          dbus.String(s2),
                          dbus.String(s3)))),
            signature="(b(oss))")

but it can be used as if it was simply a python tuple like:

( a_boolean, (s1, s2, s3) )

Are you writing a client or a server? In the latter case you should also check this question which provides details on exporting properties using python dbus module.

qid & accept id: (5914627, 5917395) query: Prepend line to beginning of a file soup:

soup wrap:

In modes 'a' or 'a+', any writing is done at the end of the file: even if the file pointer is not at the end of the file when write() is called, it is moved there before anything is written. You can do what you want in two ways.

1st way, can be used if there are no issues to load the file into memory:

def line_prepender(filename, line):
    with open(filename, 'r+') as f:
        content = f.read()
        f.seek(0, 0)
        f.write(line.rstrip('\r\n') + '\n' + content)

2nd way:

def line_pre_adder(filename, line_to_prepend):
    f = fileinput.input(filename, inplace=1)
    for xline in f:
        if f.isfirstline():
            print line_to_prepend.rstrip('\r\n') + '\n' + xline,
        else:
            print xline,

I don't know how this method works under the hood or whether it can be used on very big files. The argument inplace=1 passed to fileinput.input() is what allows rewriting lines in place; the following lines must be moved forwards or backwards for the in-place operation to happen, but I don't know the mechanism.
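To see the first approach in action on a real file, here is a self-contained check (the function is repeated so the snippet runs on its own) using a temporary file:

```python
import os
import tempfile

def line_prepender(filename, line):
    with open(filename, 'r+') as f:
        content = f.read()
        f.seek(0, 0)
        f.write(line.rstrip('\r\n') + '\n' + content)

# Exercise it on a throwaway file
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, 'w') as f:
    f.write('second\n')
line_prepender(path, 'first')
with open(path) as f:
    result = f.read()
os.remove(path)
print(repr(result))  # 'first\nsecond\n'
```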

qid & accept id: (5930036, 5930176) query: Separating file extensions using python os.path module soup:

soup wrap:

Split with os.extsep.

>>> import os
>>> 'filename.ext1.ext2'.split(os.extsep)
['filename', 'ext1', 'ext2']

If you want everything after the first dot:

>>> 'filename.ext1.ext2'.split(os.extsep, 1)
['filename', 'ext1.ext2']

If you are using paths with directories that may contain dots:

>>> def my_splitext(path):
...     """splitext for paths with directories that may contain dots."""
...     li = []
...     path_without_extensions = os.path.join(os.path.dirname(path), os.path.basename(path).split(os.extsep)[0])
...     extensions = os.path.basename(path).split(os.extsep)[1:]
...     li.append(path_without_extensions)
...     # li.append(extensions) if you want extensions in another list inside the list that is returned.
...     li.extend(extensions)
...     return li
... 
>>> my_splitext('/path.with/dots./filename.ext1.ext2')
['/path.with/dots./filename', 'ext1', 'ext2']
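For comparison, the standard os.path.splitext only peels off the last extension, which is why the manual os.extsep split is needed when you want all of them:

```python
import os.path

path = '/path.with/dots./filename.ext1.ext2'
# splitext splits at the final dot of the basename only
root, ext = os.path.splitext(path)
print(root)  # /path.with/dots./filename.ext1
print(ext)   # .ext2
```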
qid & accept id: (5947137, 5947170) query: How can I use a list comprehension to extend a list in python? soup:

soup wrap:

Do you mean something like this?

accumulationList = []
for x in originalList:
    accumulationList.extend(doSomething(x))
return accumulationList

or shorter code (but not optimal):

return sum((doSomething(x) for x in originalList), [])

or the same:

return sum(map(doSomething, originalList), [])

Thanks to @eyquem for the hint (if using Python 2.x):

import itertools as it

return sum(it.imap(doSomething, originalList), [])
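Since the question asks for a list comprehension: a nested comprehension (or itertools.chain.from_iterable) flattens in one pass and avoids the quadratic behaviour of sum() on lists. A sketch with a stand-in doSomething:

```python
from itertools import chain

def doSomething(x):
    # stand-in for the real per-item function
    return [x, x * 10]

originalList = [1, 2, 3]

flat = [y for x in originalList for y in doSomething(x)]
also_flat = list(chain.from_iterable(doSomething(x) for x in originalList))
print(flat)       # [1, 10, 2, 20, 3, 30]
print(also_flat)  # [1, 10, 2, 20, 3, 30]
```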
qid & accept id: (5995478, 5995504) query: Is it possible to assign two different returned values from a python function to two separate variables? soup:

soup wrap:

Python supports tuple unpacking.

def foo():
  return 'bar', 42

a, b = foo()

It even works with other sequences.

a, b = [c, d]

Python 3.x extends the syntax.

a, b, *c = (1, 2, 3, 4, 5)
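The starred target collects the leftovers into a list, and it may appear at any position:

```python
a, b, *c = (1, 2, 3, 4, 5)
print(a, b, c)  # 1 2 [3, 4, 5]

# the star also works in the middle
first, *middle, last = [10, 20, 30, 40]
print(first, middle, last)  # 10 [20, 30] 40
```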
qid & accept id: (5999241, 5999292) query: Using mimetools.Message in urllib2.urlopen soup:

soup wrap:

Try using getheaders() to get a list of the cookies:

>>> msg = resp.info()
>>> msg.getheaders('Set-Cookie')
['PREF=ID=5975a5ee255f0949:FF=0:TM=1305336283:LM=1305336283:S=1vkES6eF4Yxd-_oM; expires=Mon, 13-May-2013 01:24:43 GMT; path=/; domain=.google.com.au', 'NID=46=lQVFZg6yKUsoWT529Hqp5gA8B_CKYd2epPIbANmw_J0UzeMt2BhuMF-gtmGsRhenUTeajKz2zILXd9xWpHWT8ZGvDcmNdkzaGX-L_-sKyY1w4e2l3DKd80JzSkt2Vp-H; expires=Sun, 13-Nov-2011 01:24:43 GMT; path=/; domain=.google.com.au; HttpOnly']

In this case, you get a list of two strings.

Then you can iterate over that list and grab whichever cookie you like. str.startswith() is your friend:

>>> cookies = msg.getheaders('Set-Cookie')
>>> for cookie in cookies:
...     if cookie.startswith('PREF='):
...             print 'Got PREF: ', cookie
...     else:
...             print 'Got another: ', cookie
... 
Got PREF:  PREF=ID=5975a5ee255f0949:FF=0:TM=1305336283:LM=1305336283:S=1vkES6eF4Yxd-_oM; expires=Mon, 13-May-2013 01:24:43 GMT; path=/; domain=.google.com.au
Got another:  NID=46=lQVFZg6yKUsoWT529Hqp5gA8B_CKYd2epPIbANmw_J0UzeMt2BhuMF-gtmGsRhenUTeajKz2zILXd9xWpHWT8ZGvDcmNdkzaGX-L_-sKyY1w4e2l3DKd80JzSkt2Vp-H; expires=Sun, 13-Nov-2011 01:24:43 GMT; path=/; domain=.google.com.au; HttpOnly

How a newbie can find the documentation in Python

% python
Python 2.7.1 (r271:86832, Jan 29 2011, 13:30:16) 
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2
>>> req = urllib2.Request('http://www.google.com')
>>> resp = urllib2.urlopen(req)
>>> help(resp.info())
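In Python 3, the response from urllib.request exposes its headers as an email.message.Message, so get_all() takes the place of getheaders(). A sketch using a hand-built message in place of resp.info() (header values are made up):

```python
from email.message import Message

# Simulate what resp.info() would return; assigning to the same
# header name appends rather than replaces
msg = Message()
msg['Set-Cookie'] = 'PREF=abc; path=/'
msg['Set-Cookie'] = 'NID=xyz; HttpOnly'

cookies = msg.get_all('Set-Cookie')
print(cookies)  # ['PREF=abc; path=/', 'NID=xyz; HttpOnly']

pref = [c for c in cookies if c.startswith('PREF=')]
print(pref)  # ['PREF=abc; path=/']
```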
qid & accept id: (6029912, 6030193) query: Boxplotting Masked Arrays soup:

soup wrap:

I think you are right -- plt.boxplot ignores the mask if sent a masked array. So it looks like you'll have to give boxplot some extra help by sending it only the values which are not masked. Since each row of the array may have a different number of unmasked values, you won't be able to use a numpy array. You'll have to form a Python sequence of vectors:

z = [[y for y in row if y] for row in x.T]

For example:

import matplotlib.pyplot as plt
import numpy as np

fig=plt.figure()

N=20
M=10

x = np.random.random((M,N))
mask=np.random.random_integers(0,1,N*M).reshape((M,N))
x = np.ma.array(x,mask=mask)
ax1=fig.add_subplot(2,1,1)
ax1.boxplot(x)

z = [[y for y in row if y] for row in x.T]
ax2=fig.add_subplot(2,1,2)
ax2.boxplot(z)
plt.show()

(image: the two resulting boxplots, described below)

Above, the first subplot shows a boxplot of all the data in x (ignoring the mask), and the second subplot shows a boxplot of only those values which are not masked.
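One caveat with the `if y` filter: it also drops values that are exactly 0.0, since they are falsy. Filtering on the mask itself (or calling the masked array's compressed() method per column) avoids that. A numpy-free sketch of the idea, with parallel value/mask lists standing in for the masked array:

```python
# True in the mask means "masked out", matching numpy.ma conventions
data = [[0.0, 0.5, 0.9],
        [0.3, 0.7, 0.2]]
mask = [[False, True, False],
        [True, False, False]]

# Keep only unmasked values; 0.0 survives because we test the mask, not the value
z = [[d for d, m in zip(drow, mrow) if not m]
     for drow, mrow in zip(data, mask)]
print(z)  # [[0.0, 0.9], [0.7, 0.2]]
```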

qid & accept id: (6046049, 6046151) query: python dictionary values sorting soup:

soup wrap:

A simple solution for small dicts is

dict1 = {"a":0.6, "b":0.3, "c":0.9, "d":1.2, "e":0.2}
dict2 = {"a":1.4, "b":7.7, "c":9.0, "d":2.5, "e":2.0}
k1 = sorted(dict1, key=dict1.get)
k2 = sorted(dict2, key=dict2.get)
diffs = dict((k, k2.index(k) - k1.index(k)) for k in dict1)

A more efficient, less readable version for larger dicts:

ranks1 = dict(map(reversed, enumerate(sorted(dict1, key=dict1.get))))
ranks2 = dict(map(reversed, enumerate(sorted(dict2, key=dict2.get))))
diffs = dict((k, ranks2[k] - ranks1[k]) for k in dict1)
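For instance, running the simple version on the sample dicts gives a rank change of -2 for 'a' (it moves from third-lowest in dict1 to lowest in dict2):

```python
dict1 = {"a": 0.6, "b": 0.3, "c": 0.9, "d": 1.2, "e": 0.2}
dict2 = {"a": 1.4, "b": 7.7, "c": 9.0, "d": 2.5, "e": 2.0}

k1 = sorted(dict1, key=dict1.get)  # ['e', 'b', 'a', 'c', 'd']
k2 = sorted(dict2, key=dict2.get)  # ['a', 'e', 'd', 'b', 'c']
diffs = dict((k, k2.index(k) - k1.index(k)) for k in dict1)
print(diffs)  # {'a': -2, 'b': 2, 'c': 1, 'd': -2, 'e': 1}
```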
qid & accept id: (6050187, 6050722) query: Write to file descriptor 3 of a Python subprocess.Popen object soup:

soup wrap:

The subprocess proc inherits file descriptors opened in the parent process. So you can use os.open to open passphrase.txt and obtain its associated file descriptor. You can then construct a command which uses that file descriptor:

import subprocess
import shlex
import os

fd=os.open('passphrase.txt',os.O_RDONLY)
cmd='gpg --passphrase-fd {fd} -c'.format(fd=fd)
with open('filename.txt','r') as stdin_fh:
    with open('filename.gpg','w') as stdout_fh:        
        proc=subprocess.Popen(shlex.split(cmd),
                              stdin=stdin_fh,
                              stdout=stdout_fh)        
        proc.communicate()
os.close(fd)

To read from a pipe instead of a file, you could use os.pipe:

import subprocess
import shlex
import os

PASSPHRASE='...'

in_fd,out_fd=os.pipe()
os.write(out_fd,PASSPHRASE)
os.close(out_fd)
cmd='gpg --passphrase-fd {fd} -c'.format(fd=in_fd)
with open('filename.txt','r') as stdin_fh:
    with open('filename.gpg','w') as stdout_fh:        
        proc=subprocess.Popen(shlex.split(cmd),
                              stdin=stdin_fh,
                              stdout=stdout_fh )        
        proc.communicate()
os.close(in_fd)
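One caveat if you move this to Python 3: Popen closes descriptors above 2 by default (close_fds=True), so the extra fd must be listed in pass_fds to reach the child. A gpg-free, POSIX-only sketch where the child simply reads the inherited pipe end:

```python
import os
import subprocess
import sys

in_fd, out_fd = os.pipe()
os.write(out_fd, b'secret')
os.close(out_fd)

# pass_fds keeps in_fd open (and inheritable) in the child
child_code = 'import os, sys; print(os.read(int(sys.argv[1]), 100).decode())'
proc = subprocess.Popen([sys.executable, '-c', child_code, str(in_fd)],
                        stdout=subprocess.PIPE, pass_fds=(in_fd,))
out, _ = proc.communicate()
os.close(in_fd)
print(out.decode().strip())  # secret
```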
qid & accept id: (6071784, 6072233) query: Regex: Match brackets both greedy and non greedy soup:

soup wrap:

Pyparsing makes it easy to write simple one-off parsers for stuff like this:

>>> text = """show the (name) of the (person)
...
... calc the sqrt of (+ (* (2 4) 3))"""
>>> import pyparsing
>>> for match in pyparsing.nestedExpr('(',')').searchString(text):
...   print match[0]
...
['name']
['person']
['+', ['*', ['2', '4'], '3']]

Note that the nesting parens have been discarded, and the nested text returned as a nested structure.

If you want the original text for each parenthetical bit, then use the originalTextFor modifier:

>>> for match in pyparsing.originalTextFor(pyparsing.nestedExpr('(',')')).searchString(text):
...   print match[0]
...
(name)
(person)
(+ (* (2 4) 3))
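If pulling in pyparsing is not an option, a small stack-depth scanner recovers the same outermost spans with their nesting intact (analogous to the originalTextFor version):

```python
def outer_parens(text):
    # Track nesting depth; emit the slice when the outermost '(' closes.
    spans, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == '(':
            if depth == 0:
                start = i
            depth += 1
        elif ch == ')' and depth:
            depth -= 1
            if depth == 0:
                spans.append(text[start:i + 1])
    return spans

text = 'show the (name) of the (person)\n\ncalc the sqrt of (+ (* (2 4) 3))'
for span in outer_parens(text):
    print(span)
# (name)
# (person)
# (+ (* (2 4) 3))
```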
qid & accept id: (6102103, 6602255) query: Using MongoEngine Document class methods for custom validation and pre-save hooks soup:

soup wrap:

You can override save(), with the usual caveat that you must call the parent class's method.

If you find that you want to add validation hooks to all your models, you might consider creating a custom child class of Document something like:

class MyDocument(mongoengine.Document):

    def save(self, *args, **kwargs):
        for hook in self._pre_save_hooks:
            # the callable can raise an exception if
            # it determines that it is inappropriate
            # to save this instance; or it can modify
            # the instance before it is saved
            hook(self)

        super(MyDocument, self).save(*args, **kwargs)

You can then define hooks for a given model class in a fairly natural way:

class SomeModel(MyDocument):
    # fields...

    _pre_save_hooks = [
        some_callable,
        another_callable
    ]
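The same pattern works without MongoEngine; here is a self-contained sketch where a stand-in base class replaces mongoengine.Document (all names hypothetical, just to show the hook flow):

```python
class Document(object):
    # Stand-in for mongoengine.Document
    saved = False
    def save(self):
        self.saved = True

class MyDocument(Document):
    _pre_save_hooks = []

    def save(self):
        for hook in self._pre_save_hooks:
            hook(self)  # may validate or mutate the instance
        super(MyDocument, self).save()

def require_name(doc):
    # Example hook: refuse to save instances without a name
    if not getattr(doc, 'name', None):
        raise ValueError('name is required')

class SomeModel(MyDocument):
    _pre_save_hooks = [require_name]

m = SomeModel()
m.name = 'ok'
m.save()
print(m.saved)  # True
```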
qid & accept id: (6154424, 6358083) query: Mixed content (float || unicode) for database column soup:

For a quick and dirty solution I would suggest at least using two different columns to store the different answer types. You can also add a CHECK constraint to the database to ensure that exactly one of them is used for any row and the other is NULL. Then write the quick-n-dirty code to calculate the total Test score.

The alternative

The idea is to build a proper object model, map it to the RDBMS, and the question does not need to be asked. I also expect that when using Single Table Inheritance, the resulting DB schema will be almost identical to the current implementation (you can see the schema when you run the script with the option echo=True):

CREATE TABLE questions (
    id INTEGER NOT NULL, 
    text VARCHAR NOT NULL, 
    type VARCHAR(10) NOT NULL, 
    PRIMARY KEY (id)
)

CREATE TABLE answer_options (
    id INTEGER NOT NULL, 
    question_id INTEGER NOT NULL, 
    value INTEGER NOT NULL, 
    type VARCHAR(10) NOT NULL, 
    text VARCHAR, 
    input INTEGER, 
    PRIMARY KEY (id), 
    FOREIGN KEY(question_id) REFERENCES questions (id)
)

CREATE TABLE answers (
    id INTEGER NOT NULL, 
    type VARCHAR(10) NOT NULL, 
    question_id INTEGER, 
    test_id INTEGER, 
    answer_option_id INTEGER, 
    answer_input INTEGER, 
    PRIMARY KEY (id), 
    FOREIGN KEY(question_id) REFERENCES questions (id), 
    FOREIGN KEY(answer_option_id) REFERENCES answer_options (id), 
    --FOREIGN KEY(test_id) REFERENCES tests (id)
)

The code below is a complete working script that shows the object model, its mapping to the database, and the usage scenarios. As designed, the model is easily extendable with other types of questions/answers without any impact on the existing classes. Basically you get less hacky and more flexible code simply because you have an object model which properly reflects your case. The code is below:

from sqlalchemy import create_engine, Column, Integer, SmallInteger, String, ForeignKey, Table, Index
from sqlalchemy.orm import relationship, scoped_session, sessionmaker
from sqlalchemy.ext.declarative import declarative_base

# Configure test data SA
engine = create_engine('sqlite:///:memory:', echo=True)
session = scoped_session(sessionmaker(bind=engine))
Base = declarative_base()
Base.query = session.query_property()

class _BaseMixin(object):
    """ Just a helper mixin class to set properties on object creation.
    Also provides a convenient default __repr__() function, but be aware that
    also relationships are printed, which might result in loading relations.
    """
    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)

    def __repr__(self):
        return "<%s(%s)>" % (self.__class__.__name__,
            ', '.join('%s=%r' % (k, self.__dict__[k])
                for k in sorted(self.__dict__) if '_sa_' != k[:4] and '_backref_' != k[:9])
            )

### AnswerOption hierarchy
class AnswerOption(Base, _BaseMixin):
    """ Possible answer options (choice or any other configuration). """
    __tablename__ = u'answer_options'
    id = Column(Integer, primary_key=True)
    question_id = Column(Integer, ForeignKey('questions.id'), nullable=False)
    value = Column(Integer, nullable=False)
    type = Column(String(10), nullable=False)
    __mapper_args__ = {'polymorphic_on': type}

class AnswerOptionChoice(AnswerOption):
    """ A possible answer choice for the question. """
    text = Column(String, nullable=True) # when mapped to single-table, must be NULL in the DB
    __mapper_args__ = {'polymorphic_identity': 'choice'}

class AnswerOptionInput(AnswerOption):
    """ A configuration entry for the input-type of questions. """
    input = Column(Integer, nullable=True) # when mapped to single-table, must be NULL in the DB
    __mapper_args__ = {'polymorphic_identity': 'input'}

### Question hierarchy
class Question(Base, _BaseMixin):
    """ Base class for all types of questions. """
    __tablename__ = u'questions'
    id = Column(Integer, primary_key=True)
    text = Column(String, nullable=False)
    type = Column(String(10), nullable=False)
    answer_options = relationship(AnswerOption, backref='question')
    __mapper_args__ = {'polymorphic_on': type}

    def get_answer_value(self, answer):
        """ function to get a value of the answer to the question. """
        raise Exception('must be implemented in a subclass')

class QuestionChoice(Question):
    """ Single-choice question. """
    __mapper_args__ = {'polymorphic_identity': 'choice'}

    def get_answer_value(self, answer):
        assert isinstance(answer, AnswerChoice)
        assert answer.answer_option in self.answer_options, "Incorrect choice"
        return answer.answer_option.value

class QuestionInput(Question):
    """ Input type question. """
    __mapper_args__ = {'polymorphic_identity': 'input'}

    def get_answer_value(self, answer):
        assert isinstance(answer, AnswerInput)
        value_list = sorted([(_i.input, _i.value) for _i in self.answer_options])
        if not value_list:
            raise Exception("no input is specified for the question {0}".format(self))
        if answer.answer_input <= value_list[0][0]:
            return value_list[0][1]
        elif answer.answer_input >= value_list[-1][0]:
            return value_list[-1][1]
        else: # interpolate in the range:
            for _pos in range(len(value_list)-1):
                if answer.answer_input == value_list[_pos+1][0]:
                    return value_list[_pos+1][1]
                elif answer.answer_input < value_list[_pos+1][0]:
                    # interpolate between (_pos, _pos+1)
                    assert (value_list[_pos][0] != value_list[_pos+1][0])
                    return value_list[_pos][1] + (value_list[_pos+1][1] - value_list[_pos][1]) * (answer.answer_input - value_list[_pos][0]) / (value_list[_pos+1][0] - value_list[_pos][0])
        assert False, "should never reach here"

### Answer hierarchy
class Answer(Base, _BaseMixin):
    """ Represents an answer to the question. """
    __tablename__ = u'answers'
    id = Column(Integer, primary_key=True)
    type = Column(String(10), nullable=False)
    question_id = Column(Integer, ForeignKey('questions.id'), nullable=True) # when mapped to single-table, must be NULL in the DB
    question = relationship(Question)
    test_id = Column(Integer, ForeignKey('tests.id'), nullable=True) # @todo: decide if allow answers without a Test
    __mapper_args__ = {'polymorphic_on': type}

    def get_value(self):
        return self.question.get_answer_value(self)

class AnswerChoice(Answer):
    """ Represents an answer to the *Choice* question. """
    __mapper_args__ = {'polymorphic_identity': 'choice'}
    answer_option_id = Column(Integer, ForeignKey('answer_options.id'), nullable=True)
    answer_option = relationship(AnswerOption, single_parent=True)

class AnswerInput(Answer):
    """ Represents an answer to the *Input* question. """
    __mapper_args__ = {'polymorphic_identity': 'input'}
    answer_input = Column(Integer, nullable=True) # when mapped to single-table, must be NULL in the DB

### other classes (Questionnaire, Test) and helper tables
association_table = Table('questionnaire_question', Base.metadata,
    Column('id', Integer, primary_key=True),
    Column('questionnaire_id', Integer, ForeignKey('questionnaires.id')),
    Column('question_id', Integer, ForeignKey('questions.id'))
)
_idx = Index('questionnaire_question_u_nci',
            association_table.c.questionnaire_id,
            association_table.c.question_id,
            unique=True)

class Questionnaire(Base, _BaseMixin):
    """ Questionnaire is a compilation of questions. """
    __tablename__ = u'questionnaires'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    # @note: could use relationship with order or even add question number
    questions = relationship(Question, secondary=association_table)

class Test(Base, _BaseMixin):
    """ Test is a 'test' - set of answers for a given questionnaire. """
    __tablename__ = u'tests'
    id = Column(Integer, primary_key=True)
    # @todo: add user name or reference
    questionnaire_id = Column(Integer, ForeignKey('questionnaires.id'), nullable=False)
    questionnaire = relationship(Questionnaire, single_parent=True)
    answers = relationship(Answer, backref='test')
    def total_points(self):
        return sum(ans.get_value() for ans in self.answers)

# -- end of model definition --

Base.metadata.create_all(engine)

# -- insert test data --
print '-' * 20 + ' Insert TEST DATA ...'
q1 = QuestionChoice(text="What is your fav pet?")
q1c1 = AnswerOptionChoice(text="cat", value=1, question=q1)
q1c2 = AnswerOptionChoice(text="dog", value=2, question=q1)
q1c3 = AnswerOptionChoice(text="caiman", value=3)
q1.answer_options.append(q1c3)
a1 = AnswerChoice(question=q1, answer_option=q1c2)
assert a1.get_value() == 2
session.add(a1)
session.flush()

q2 = QuestionInput(text="How many liters of beer do you drink a day?")
q2i1 = AnswerOptionInput(input=0, value=0, question=q2)
q2i2 = AnswerOptionInput(input=1, value=1, question=q2)
q2i3 = AnswerOptionInput(input=3, value=5)
q2.answer_options.append(q2i3)

# test interpolation routine
_test_ip = ((-100, 0),
            (0, 0),
            (0.5, 0.5),
            (1, 1),
            (2, 3),
            (3, 5),
            (100, 5)
)
a2 = AnswerInput(question=q2, answer_input=None)
for _inp, _exp in _test_ip:
    a2.answer_input = _inp
    _res = a2.get_value()
    assert _res == _exp, "{0}: {1} != {2}".format(_inp, _res, _exp)
a2.answer_input = 2
session.add(a2)
session.flush()

# create a Questionnaire and a Test
qn = Questionnaire(name='test questionnaire')
qn.questions.append(q1)
qn.questions.append(q2)
session.add(qn)
te = Test(questionnaire=qn)
te.answers.append(a1)
te.answers.append(a2)
assert te.total_points() == 5
session.add(te)
session.flush()

# -- other tests --
print '-' * 20 + ' TEST QUERIES 
...'\nsession.expunge_all() # clear the session cache\na1 = session.query(Answer).get(1)\nassert a1.get_value() == 2 # @note: will load all dependant objects (question and answer_options) automatically to compute the value\na2 = session.query(Answer).get(2)\nassert a2.get_value() == 3 # @note: will load all dependant objects (question and answer_options) automatically to compute the value\nte = session.query(Test).get(1)\nassert te.total_points() == 5\n
\n

I hope that this version of the code answers all the questions asked in the comments.

\n soup wrap:

For a quick-and-dirty solution, I would suggest at least using two different columns to store the different answer types. You can also add a CHECK constraint to the database to ensure that exactly one of them is non-NULL for any row. Then write the quick-n-dirty code to calculate the total Test score.
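A minimal SQLite sketch of such a CHECK constraint (table and column names are illustrative, matching the answer/answer-input split discussed here):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE answers (
        id INTEGER PRIMARY KEY,
        answer_option_id INTEGER,  -- used by choice-type answers
        answer_input INTEGER,      -- used by input-type answers
        -- exactly one of the two columns must be non-NULL:
        CHECK ((answer_option_id IS NULL) != (answer_input IS NULL))
    )
""")
conn.execute("INSERT INTO answers (answer_option_id) VALUES (7)")  # ok
conn.execute("INSERT INTO answers (answer_input) VALUES (2)")      # ok
try:
    # both columns set -> rejected by the CHECK constraint
    conn.execute("INSERT INTO answers (answer_option_id, answer_input) VALUES (7, 2)")
except sqlite3.IntegrityError:
    print("rejected as expected")
```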

The alternative

The idea is to build a proper object model, map it to the RDBMS, and then the question does not even need to be asked. I also expect that with Single Table Inheritance the resulting DB schema will be almost identical to the current implementation (you can see the generated schema when you run the script with echo=True):

CREATE TABLE questions (
    id INTEGER NOT NULL, 
    text VARCHAR NOT NULL, 
    type VARCHAR(10) NOT NULL, 
    PRIMARY KEY (id)
)

CREATE TABLE answer_options (
    id INTEGER NOT NULL, 
    question_id INTEGER NOT NULL, 
    value INTEGER NOT NULL, 
    type VARCHAR(10) NOT NULL, 
    text VARCHAR, 
    input INTEGER, 
    PRIMARY KEY (id), 
    FOREIGN KEY(question_id) REFERENCES questions (id)
)

CREATE TABLE answers (
    id INTEGER NOT NULL, 
    type VARCHAR(10) NOT NULL, 
    question_id INTEGER, 
    test_id INTEGER, 
    answer_option_id INTEGER, 
    answer_input INTEGER, 
    PRIMARY KEY (id), 
    FOREIGN KEY(question_id) REFERENCES questions (id), 
    FOREIGN KEY(answer_option_id) REFERENCES answer_options (id), 
    --FOREIGN KEY(test_id) REFERENCES tests (id)
)

The code below is a complete working script that shows the object model, its mapping to the database, and the usage scenarios. As designed, the model is easily extendable with other types of questions/answers without any impact on the existing classes. Basically, you get less hacky and more flexible code simply because the object model properly reflects your case. The code is below:

from sqlalchemy import create_engine, Column, Integer, SmallInteger, String, ForeignKey, Table, Index
from sqlalchemy.orm import relationship, scoped_session, sessionmaker
from sqlalchemy.ext.declarative import declarative_base

# Configure test data SA
engine = create_engine('sqlite:///:memory:', echo=True)
session = scoped_session(sessionmaker(bind=engine))
Base = declarative_base()
Base.query = session.query_property()

class _BaseMixin(object):
    """ Just a helper mixin class to set properties on object creation.  
    Also provides a convenient default __repr__() function, but be aware that 
    also relationships are printed, which might result in loading relations.
    """
    def __init__(self, **kwargs):
        for k,v in kwargs.items():
            setattr(self, k, v)

    def __repr__(self):
        return "<%s(%s)>" % (self.__class__.__name__, 
            ', '.join('%s=%r' % (k, self.__dict__[k]) 
                for k in sorted(self.__dict__) if '_sa_' != k[:4] and '_backref_' != k[:9])
            )

### AnswerOption hierarchy
class AnswerOption(Base, _BaseMixin):
    """ Possible answer options (choice or any other configuration).  """
    __tablename__ = u'answer_options'
    id = Column(Integer, primary_key=True)
    question_id = Column(Integer, ForeignKey('questions.id'), nullable=False)
    value = Column(Integer, nullable=False)
    type = Column(String(10), nullable=False)
    __mapper_args__ = {'polymorphic_on': type}

class AnswerOptionChoice(AnswerOption):
    """ A possible answer choice for the question.  """
    text = Column(String, nullable=True) # when mapped to single-table, must be NULL in the DB
    __mapper_args__ = {'polymorphic_identity': 'choice'}

class AnswerOptionInput(AnswerOption):
    """ A configuration entry for the input-type of questions.  """
    input = Column(Integer, nullable=True) # when mapped to single-table, must be NULL in the DB
    __mapper_args__ = {'polymorphic_identity': 'input'}

### Question hierarchy
class Question(Base, _BaseMixin):
    """ Base class for all types of questions.  """
    __tablename__ = u'questions'
    id = Column(Integer, primary_key=True)
    text = Column(String, nullable=False)
    type = Column(String(10), nullable=False)
    answer_options = relationship(AnswerOption, backref='question')
    __mapper_args__ = {'polymorphic_on': type}

    def get_answer_value(self, answer):
        """ function to get a value of the answer to the question.  """
        raise NotImplementedError('must be implemented in a subclass')

class QuestionChoice(Question):
    """ Single-choice question.  """
    __mapper_args__ = {'polymorphic_identity': 'choice'}

    def get_answer_value(self, answer):
        assert isinstance(answer, AnswerChoice)
        assert answer.answer_option in self.answer_options, "Incorrect choice"
        return answer.answer_option.value

class QuestionInput(Question):
    """ Input type question.  """
    __mapper_args__ = {'polymorphic_identity': 'input'}

    def get_answer_value(self, answer):
        assert isinstance(answer, AnswerInput)
        value_list = sorted([(_i.input, _i.value) for _i in self.answer_options])
        if not value_list:
            raise Exception("no input is specified for the question {0}".format(self))
        if answer.answer_input <= value_list[0][0]:
            return value_list[0][1]
        elif answer.answer_input >= value_list[-1][0]:
            return value_list[-1][1]
        else: # interpolate in the range:
            for _pos in range(len(value_list)-1):
                if answer.answer_input == value_list[_pos+1][0]:
                    return value_list[_pos+1][1]
                elif answer.answer_input < value_list[_pos+1][0]:
                    # interpolate between (_pos, _pos+1)
                    assert (value_list[_pos][0] != value_list[_pos+1][0])
                    return value_list[_pos][1] + (value_list[_pos+1][1] - value_list[_pos][1]) * (answer.answer_input - value_list[_pos][0]) / (value_list[_pos+1][0] - value_list[_pos][0])
        assert False, "should never reach here"

### Answer hierarchy
class Answer(Base, _BaseMixin):
    """ Represents an answer to the question.  """
    __tablename__ = u'answers'
    id = Column(Integer, primary_key=True)
    type = Column(String(10), nullable=False)
    question_id = Column(Integer, ForeignKey('questions.id'), nullable=True) # when mapped to single-table, must be NULL in the DB
    question = relationship(Question)
    test_id = Column(Integer, ForeignKey('tests.id'), nullable=True) # @todo: decide if allow answers without a Test
    __mapper_args__ = {'polymorphic_on': type}

    def get_value(self):
        return self.question.get_answer_value(self)

class AnswerChoice(Answer):
    """ Represents an answer to the *Choice* question.  """
    __mapper_args__ = {'polymorphic_identity': 'choice'}
    answer_option_id = Column(Integer, ForeignKey('answer_options.id'), nullable=True) 
    answer_option = relationship(AnswerOption, single_parent=True)

class AnswerInput(Answer):
    """ Represents an answer to the *Choice* question.  """
    __mapper_args__ = {'polymorphic_identity': 'input'}
    answer_input = Column(Integer, nullable=True) # when mapped to single-table, must be NULL in the DB

### other classes (Questionnaire, Test) and helper tables
association_table = Table('questionnaire_question', Base.metadata,
    Column('id', Integer, primary_key=True),
    Column('questionnaire_id', Integer, ForeignKey('questionnaires.id')),
    Column('question_id', Integer, ForeignKey('questions.id'))
)
_idx = Index('questionnaire_question_u_nci', 
            association_table.c.questionnaire_id, 
            association_table.c.question_id, 
            unique=True)

class Questionnaire(Base, _BaseMixin):
    """ Questionnaire is a compilation of questions.  """
    __tablename__ = u'questionnaires'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    # @note: could use relationship with order or even add question number
    questions = relationship(Question, secondary=association_table)

class Test(Base, _BaseMixin):
    """ Test is a 'test' - set of answers for a given questionnaire. """
    __tablename__ = u'tests'
    id = Column(Integer, primary_key=True)
    # @todo: add user name or reference
    questionnaire_id = Column(Integer, ForeignKey('questionnaires.id'), nullable=False)
    questionnaire = relationship(Questionnaire, single_parent=True)
    answers = relationship(Answer, backref='test')
    def total_points(self):
        return sum(ans.get_value() for ans in self.answers)

# -- end of model definition --

Base.metadata.create_all(engine)

# -- insert test data --
print '-' * 20 + ' Insert TEST DATA ...'
q1 =  QuestionChoice(text="What is your fav pet?")
q1c1 = AnswerOptionChoice(text="cat", value=1, question=q1)
q1c2 = AnswerOptionChoice(text="dog", value=2, question=q1)
q1c3 = AnswerOptionChoice(text="caiman", value=3)
q1.answer_options.append(q1c3)
a1 = AnswerChoice(question=q1, answer_option=q1c2)
assert a1.get_value() == 2
session.add(a1)
session.flush()

q2 =  QuestionInput(text="How many liters of beer do you drink a day?")
q2i1 = AnswerOptionInput(input=0, value=0, question=q2)
q2i2 = AnswerOptionInput(input=1, value=1, question=q2)
q2i3 = AnswerOptionInput(input=3, value=5)
q2.answer_options.append(q2i3)

# test interpolation routine
_test_ip = ((-100, 0),
            (0, 0),
            (0.5, 0.5),
            (1, 1),
            (2, 3),
            (3, 5),
            (100, 5)
)
a2 = AnswerInput(question=q2, answer_input=None)
for _inp, _exp in _test_ip:
    a2.answer_input = _inp
    _res = a2.get_value()
    assert _res == _exp, "{0}: {1} != {2}".format(_inp, _res, _exp)
a2.answer_input = 2
session.add(a2)
session.flush()

# create a Questionnaire and a Test
qn = Questionnaire(name='test questionnaire')
qn.questions.append(q1)
qn.questions.append(q2)
session.add(qn)
te = Test(questionnaire=qn)
te.answers.append(a1)
te.answers.append(a2)
assert te.total_points() == 5
session.add(te)
session.flush()

# -- other tests --
print '-' * 20 + ' TEST QUERIES ...'
session.expunge_all() # clear the session cache
a1 = session.query(Answer).get(1)
assert a1.get_value() == 2 # @note: loads all dependent objects (question and answer_options) automatically to compute the value
a2 = session.query(Answer).get(2)
assert a2.get_value() == 3 # @note: loads all dependent objects (question and answer_options) automatically to compute the value
te = session.query(Test).get(1)
assert te.total_points() == 5

I hope that this version of the code answers all the questions asked in the comments.
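The interpolation branch in QuestionInput.get_answer_value is plain piecewise-linear interpolation, clamped at both ends of the range; a standalone sketch of that routine:

```python
def interpolate(points, x):
    """Piecewise-linear interpolation over (input, value) pairs,
    clamped to the first/last value outside the range."""
    points = sorted(points)
    if x <= points[0][0]:
        return points[0][1]
    if x >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x <= x1:
            # linear interpolation between the two surrounding points
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

pts = [(0, 0), (1, 1), (3, 5)]  # same config as the beer question above
print([interpolate(pts, x) for x in (-100, 0.5, 2, 100)])  # [0, 0.5, 3.0, 5]
```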

qid & accept id: (6165277, 6165303) query: compare list elements soup:

soup wrap:

If you are manipulating numerical data, consider using numpy

import numpy as np

lst = [3.18,10.57,14.95]
arr = np.array(lst)

diff = np.diff(arr)

>>> diff
array([ 7.39,  4.38])

You can convert it back to list if you have to:

diff_list = list(diff)

Otherwise you can iterate over it just like you iterate over a list:

for item in diff: 
    print(item)

7.39
4.38

EDIT: the five solutions I timed were pretty close to each other, so choose the one that's easier to read

t = timeit.Timer("[b - a for a, b in zip(l, l[1:])]", "l = range(int(1e6))")
print(t.timeit(1))
>>> 0.523894071579

t = timeit.Timer("list(np.diff(np.array(l)))", "import numpy as np; l = range(int(1e6))")
print(t.timeit(1))
>>> 0.484916915894

t = timeit.Timer("diffs = [l[x + 1] - l[x] for x in range(len(l) - 1)]", "l = range(int(1e6))")
print(t.timeit(1))
>>> 0.363043069839

t = timeit.Timer("[(x, y, y - x) for (x, y) in itertools.izip(l, it)]", "l = range(int(1e6)); it = iter(l); it.next()")
print(t.timeit(1))
>>> 0.54354596138

# pairwise solution
t = timeit.Timer("a, b = itertools.tee(l); next(b, None); [(x, y) for x, y in itertools.izip(a, b)]", "l = range(int(1e6));")
print(t.timeit(1))
>>> 0.477301120758
qid & accept id: (6205592, 6206154) query: How to write small DSL parser with operator module in python soup:

soup wrap:

Like this.

class Rule( object ):
    def __init__( self, text ):
        self.text= text
    def test( self, A, B, C, D, E, F, G ):
        return eval( self.text )

r1= Rule( "A==B" )
r2= Rule( "A==B and B==C" )
r3= Rule( "A in {listname!s}".format( listname=someList ) )

etc.

>>> r1.test( 89,  92,  18,  7,   90,  35, 60 )
False

Edit.

  • str(A) matches regex "[2-5][0-2]"
  • myfoo(A) > 100
  • A is in myfoo(B)

These are all trivial Python code. I'm not sure why the comment is even included as being interesting or difficult.

r4= Rule( "re.match( r'[2-5][0-2]', str(A) )" )
r5= Rule( "myfoo(A) > 100" )
r6= Rule( "A in myfoo(B)" )

There's a trick to this. The trick is to write the Python code; and then enclose the code in quotes. Any Python code is legal.
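Note that eval() as used above picks up A..G from the local namespace of test(). If you want tighter control over what the rule text can see (or need helpers like re available), you can pass explicit namespaces to eval; a sketch of that variant:

```python
import re

class Rule(object):
    def __init__(self, text):
        self.text = text
    def test(self, **variables):
        # expose only `re` and the named rule variables to the expression
        return eval(self.text, {'re': re}, variables)

r1 = Rule("A == B")
r4 = Rule("re.match(r'[2-5][0-2]', str(A)) is not None")
print(r1.test(A=89, B=92))  # False
print(r4.test(A=41))        # True
```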

If the Python aspect of these rules is confusing, a Python tutorial may be helpful.

qid & accept id: (6220490, 6221293) query: Reading files in parallel in python soup:

soup wrap:

Why not take a simple approach:

  • Open each file sequentially and read its lines to fill an in-memory data structure
  • Perform statistics on the in-memory data structure

Here is a self-contained example with 3 "files", each containing 3 lines. It uses StringIO for convenience instead of actual files:

#!/usr/bin/env python
# coding: utf-8

from StringIO import StringIO

# for this example, each "file" has 3 lines instead of 100000
f1 = '1\t10\n2\t11\n3\t12'
f2 = '1\t13\n2\t14\n3\t15'
f3 = '1\t16\n2\t17\n3\t18'

files = [f1, f2, f3]

# data is a list of dictionaries mapping population to average age
# i.e. data[0][10000] contains the average age in location 0 (files[0]) with
# population of 10000.
data = []

for i, contents in enumerate(files):
    f = StringIO(contents)
    # f = open(filename, 'r')  # use this instead when reading real files
    data.append(dict())

    for line in f:
        population, average_age = (int(s) for s in line.split('\t'))
        data[i][population] = average_age

print data

# gather custom statistics on the data

# i.e. here's how to calculate the average age across all locations where
# population is 2:
num_locations = len(data)
pop2_avg = sum((data[loc][2] for loc in xrange(num_locations)))/num_locations
print 'Average age with population 2 is', pop2_avg, 'years old'

The output is:

[{1: 10, 2: 11, 3: 12}, {1: 13, 2: 14, 3: 15}, {1: 16, 2: 17, 3: 18}]
Average age with population 2 is 14 years old
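For readers on Python 3, the same example translates directly (io.StringIO replaces StringIO, and print is a function); a sketch, not part of the original answer:

```python
import io

# each "file" has 3 tab-separated lines: population, average age
f1 = '1\t10\n2\t11\n3\t12'
f2 = '1\t13\n2\t14\n3\t15'
f3 = '1\t16\n2\t17\n3\t18'

data = []
for text in (f1, f2, f3):
    table = {}
    for line in io.StringIO(text):
        population, average_age = (int(s) for s in line.split('\t'))
        table[population] = average_age
    data.append(table)

# average age across all locations where population is 2
num_locations = len(data)
pop2_avg = sum(data[loc][2] for loc in range(num_locations)) / num_locations
print('Average age with population 2 is', pop2_avg, 'years old')
```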
qid & accept id: (6235146, 6235318) query: Converting separate functions into class-based soup:

soup wrap:

Django actually already includes a login_required decorator that makes handling user authentication trivial. Just include the following at the top of your view.py page:

from django.contrib.auth.decorators import login_required

and then add

@login_required 

before any views that require a login. It even handles redirecting the user to the appropriate page once they log in.

More info here: https://docs.djangoproject.com/en/dev/topics/auth/#the-login-required-decorator

This should greatly simplify your views, and may result in not having to write a separate class, since all that's left is a simple re-direct.

As for the variables, each request already contains a request.user object with information on the user. You can do a search in the docs for Request and response objects to learn more.

You can use that user object to get the profile variable by extending the User model. Set AUTH_PROFILE_MODULE = 'myapp.UserProfile' in your settings, which will allow you to access a user's profile as follows:

user.get_profile().location

More about that here: http://www.b-list.org/weblog/2006/jun/06/django-tips-extending-user-model/

qid & accept id: (6237378, 6237842) query: insert into sqlite table with unique column soup:

soup wrap:

You could use INSERT OR REPLACE to update rows with a unique constraint, or INSERT OR IGNORE to ignore inserts which conflict with a unique constraint:

import sqlite3

def insert_or_replace():
    # https://sqlite.org/lang_insert.html
    connection=sqlite3.connect(':memory:')
    cursor=connection.cursor()
    cursor.execute('CREATE TABLE foo (bar INTEGER UNIQUE, baz INTEGER)')
    cursor.execute('INSERT INTO foo (bar,baz) VALUES (?, ?)',(1,2))
    cursor.execute('INSERT OR REPLACE INTO foo (bar,baz) VALUES (?, ?)',(1,3))
    cursor.execute('SELECT * from foo')
    data=cursor.fetchall()
    print(data)
    # [(1, 3)]


def on_conflict():
    # https://sqlite.org/lang_insert.html
    connection=sqlite3.connect(':memory:')
    cursor=connection.cursor()
    cursor.execute('CREATE TABLE foo (bar INTEGER UNIQUE, baz INTEGER)')
    cursor.execute('INSERT INTO foo (bar,baz) VALUES (?, ?)',(1,2))
    cursor.execute('INSERT OR IGNORE INTO foo (bar,baz) VALUES (?, ?)',(1,3))
    cursor.execute('SELECT * from foo')
    data=cursor.fetchall()
    print(data)
    # [(1, 2)]    

insert_or_replace()
on_conflict()
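On SQLite 3.24+ (bundled with recent Python builds) there is also a true UPSERT clause, which updates only the conflicting columns instead of replacing the whole row; a sketch in the same pattern, assuming your sqlite3 build is new enough:

```python
import sqlite3

def upsert():
    # requires SQLite 3.24+ for ON CONFLICT ... DO UPDATE
    connection = sqlite3.connect(':memory:')
    cursor = connection.cursor()
    cursor.execute('CREATE TABLE foo (bar INTEGER UNIQUE, baz INTEGER)')
    cursor.execute('INSERT INTO foo (bar,baz) VALUES (?, ?)', (1, 2))
    cursor.execute('INSERT INTO foo (bar,baz) VALUES (?, ?) '
                   'ON CONFLICT(bar) DO UPDATE SET baz=excluded.baz', (1, 3))
    cursor.execute('SELECT * FROM foo')
    return cursor.fetchall()

print(upsert())  # [(1, 3)]
```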

These SQLite statements are probably faster than writing Python code to do the same thing; to verify, you could use Python's timeit module to compare the speed of the various implementations. For example, you could run

python -mtimeit -s'import test' 'test.insert_or_replace()'

versus

python -mtimeit -s'import test' 'test.filter_nonunique_rows_in_Python()'

versus

python -mtimeit -s'import test' 'test.insert_with_try_catch_blocks()'
qid & accept id: (6253617, 6253880) query: How can I store data to a data dictionary in Python when headings are in mixed up order soup:

soup wrap:

This actually seems pretty easy. Process the file into a data structure, then export it into a csv.

school = None
headers = None
data = {}
for line in text.splitlines():
    if line.startswith("school id"):
        school = line.split('=')[1].strip()
        headers = None
        continue
    if school is not None and headers is None:
        headers = line.split('|')
        continue

    if school is not None and headers is not None and line:
        if school not in data:
            data[school] = []
        datum = dict(zip(headers, line.split('|')))
        data[school].append(datum)    

In [29]: data
Out[29]: 
{'273533123': [{'age': '27',
                'degree': 'MBA',
                'name': 'John B. Black',
                'race': 'hispanic',
                'year': '2003'},
               {'age': '28',
                'degree': 'PhD',
                'name': 'Steven Smith',
                'race': 'black',
                'year': '2005'},
               {'age': '25',
                'degree': 'MBA',
                'name': 'Jacob Waters',
                'race': 'hispanic',
                'year': '2003'}],
 '28392': [{'age': '27',
            'degree': 'PhD',
            'name': 'Susan A. Smith',
            'race': 'white',
            'year': '2007'},
           {'age': '26',
            'degree': 'PhD',
            'name': 'Fred Collins',
            'race': 'hispanic',
            'year': '2006'},
           {'age': '28',
            'degree': 'MBA',
            'name': 'Amber Real',
            'race': 'white',
            'year': '2007'},
           {'age': '27',
            'degree': 'PhD',
            'name': 'Mike Lee',
            'race': 'white',
            'year': '2003'}],
 '3452332': [{'age': '27',
              'degree': 'Bachelors',
              'name': 'Peter Hintze',
              'race': 'white',
              'year': '2002'},
             {'age': '25',
              'degree': 'MBA',
              'name': 'Ann Graden',
              'race': 'black',
              'year': '2004'},
             {'age': '28',
              'degree': 'PhD',
              'name': 'Bryan Stewart',
              'race': 'white',
              'year': '2004'}]}    
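The export step mentioned above isn't shown; a minimal sketch using the stdlib csv module could look like the following (the column order and the schools.csv filename are assumptions, and the inline data is a small stand-in for the parsed structure):

```python
import csv

# Small stand-in for the parsed `data` structure built above.
data = {'28392': [{'age': '27', 'degree': 'PhD', 'name': 'Susan A. Smith',
                   'race': 'white', 'year': '2007'}]}

# One row per person, with the school id carried along as its own column.
with open('schools.csv', 'w', newline='') as f:
    writer = csv.DictWriter(
        f, fieldnames=['school', 'name', 'age', 'degree', 'race', 'year'])
    writer.writeheader()
    for school, people in data.items():
        for person in people:
            writer.writerow(dict(person, school=school))
```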
qid & accept id: (6290105, 6290211) query: Traversing a "list" tree and get the type(item) list with same structure in python? soup:

soup wrap:

For (1), you can create a generator that will traverse the tree for you.

def traverse(o, tree_types=(list, tuple)):
    if isinstance(o, tree_types):
        for value in o:
            for subvalue in traverse(value, tree_types):
                yield subvalue
    else:
        yield o

data = [(1,1,(1,1,(1,"1"))),(1,1,1),(1,),1,(1,(1,("1",)))]
print list(traverse(data))
# prints [1, 1, 1, 1, 1, '1', 1, 1, 1, 1, 1, 1, 1, '1']

for value in traverse(data):
    print repr(value)
# prints
# 1
# 1
# 1
# 1
# 1
# '1'
# 1
# 1
# 1
# 1
# 1
# 1
# 1
# '1'

Here is one possible approach to (2).

def tree_map(f, o, tree_types=(list, tuple)):
    if isinstance(o, tree_types):
        return type(o)(tree_map(f, value, tree_types) for value in o)
    else:
        return f(o)

data = [(1,1,(1,1,(1,"1"))),(1,1,1),(1,),1,(1,(1,("1",)))]
print tree_map(lambda o: type(o).__name__, data)
# prints [('int', 'int', ('int', 'int', ('int', 'str'))), ('int', 'int', 'int'), ('int',), 'int', ('int', ('int', ('str',)))]
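The snippets above use Python 2 print statements; on Python 3 the recursive inner loop of traverse can also be written with yield from. A sketch of the same generator:

```python
def traverse(o, tree_types=(list, tuple)):
    """Python 3 version of the generator above, using `yield from`
    to delegate to the recursive call."""
    if isinstance(o, tree_types):
        for value in o:
            yield from traverse(value, tree_types)
    else:
        yield o

data = [(1, 1, (1, 1, (1, "1"))), (1, 1, 1), (1,), 1, (1, (1, ("1",)))]
print(list(traverse(data)))
# [1, 1, 1, 1, 1, '1', 1, 1, 1, 1, 1, 1, 1, '1']
```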
qid & accept id: (6315244, 6315525) query: How to give object away to python garbage collection? soup:

soup wrap:

I find that most programs create and dispose of objects quite naturally, so I never normally worry about it.

Some examples:

person = Person('john')
person = Person('james')
# Whoops! 'john' has died!

people = []
people.append(Person('john'))
# ...
# All 'Persons' live in people
people = []
# Now all 'Persons' are dead (including the list that referenced them)

class House():
    def setOwner(self, person):
        self.owner = person

people = [Person('john')]   # re-populate for this example
house = House()
house.setOwner(people[0])
# Now a House refers to a Person
people = []
# Now all 'Persons' are dead, except the one that house.owner refers to.

What I assume you are after is this:

people = {}
people['john'] = Person('john')

def removePerson(personName):
    del people[personName]

removePerson('john')

In this case people is the master collection and you can control when a Person gets added and removed from it (it's a dictionary).

You may have to think through the concept of a person being created and then dying very thoroughly: once created, how does the person first interact with the simulation? Upon death, how should you untangle the references? (It's OK for a person to refer to other stuff; it's things like House in my example that would keep a person alive. You could have other objects hold on to just the name of the person.)
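That last point can be sketched like this (Person and House here are toy stand-ins, not part of the original code):

```python
class Person:
    def __init__(self, name):
        self.name = name

# Master dict that owns all Person objects.
people = {'john': Person('john')}

class House:
    """Toy example: hold the owner's *name*, not the Person object."""
    def setOwner(self, person):
        self.owner_name = person.name   # no reference kept to the object

    def getOwner(self):
        # Returns None once the person has been removed from the master dict.
        return people.get(self.owner_name)

house = House()
house.setOwner(people['john'])
del people['john']          # the Person can now be garbage collected
print(house.getOwner())     # -> None
```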

qid & accept id: (6316726, 6317571) query: SQLAlchemy/Elixir - querying to check entity's membership in a many-to-many relationship list soup:

soup wrap:

You can find the intermediate table where Elixir has hidden it away, but note that it uses fully qualified column names (such as __package_path_with_underscores__course_id). To avoid this, define your ManyToMany using e.g.

class Course(Entity):
    ...
    assistants = ManyToMany('Professor', inverse='courses_assisted',
                            local_colname='course_id', remote_colname='prof_id',
                            ondelete='cascade')

and then you can access the intermediate table using

rel = Course._descriptor.find_relationship('assistants')
assert rel
table = rel.table

and can access the columns using table.c.prof_id, etc.

Update: Of course you can do this at a higher level, but not in a single query, because SQLAlchemy doesn't yet support in_ for relationships. For example, with two queries:

>>> mit_courses = set(Course.query.join(
... University).filter(University.name == 'MIT'))
>>> [p.name for p in Professor.query if set(
... p.courses_assisted).intersection(mit_courses)]

Or, alternatively:

>>> import itertools
>>> plist = [c.assistants for c in Course.query.join(
... University).filter(University.name == 'MIT')]
>>> [p.name for p in set(itertools.chain(*plist))]

The first step creates a list of lists of assistants. The second step flattens the list of lists and removes duplicates by making a set.

qid & accept id: (6367051, 6367075) query: Is there an easy way to tell which line number a file pointer is on? soup:

soup wrap:

A typical solution to this problem is to define a new class that wraps an existing file object and automatically counts line numbers. Something like this (just off the top of my head, I haven't tested this):

class FileLineWrapper(object):
    def __init__(self, f):
        self.f = f
        self.line = 0
    def close(self):
        return self.f.close()
    def readline(self):
        self.line += 1
        return self.f.readline()
    # to allow using in 'with' statements 
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()

Use it like this:

f = FileLineWrapper(open("myfile.txt", "r"))
f.readline()
print(f.line)

It looks like the standard module fileinput does much the same thing (and some other things as well); you could use that instead if you like.
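For completeness, a sketch of the fileinput approach: fileinput.filelineno() gives the 1-based line number within the current file. (The demo file and its contents here are made up.)

```python
import fileinput

# Create a small demo file (stand-in for the reader's own file).
with open("myfile.txt", "w") as f:
    f.write("first line\nsecond line\n")

# Pair each line with its 1-based line number in the file.
numbered = []
for line in fileinput.input(files=["myfile.txt"]):
    numbered.append((fileinput.filelineno(), line.rstrip()))

print(numbered)   # [(1, 'first line'), (2, 'second line')]
```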

qid & accept id: (6389577, 6389626) query: Merge two arrays into a matrix in python and sort soup:
soup wrap:
C = zip(A, B)
D = sorted(C, key=lambda x: x[1])
A2, B2 = zip(*D)

Or all on one line:

A2, B2 = zip(*sorted(zip(A,B), key=lambda x: x[1]))
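A quick worked example with made-up data, sorting A by the values in B. Note that on Python 3 the results come back as tuples; wrap them in list() if you need lists.

```python
A = ['a', 'b', 'c']
B = [3, 1, 2]

# Pair up, sort the pairs by the B value, then unzip back into two sequences.
A2, B2 = zip(*sorted(zip(A, B), key=lambda x: x[1]))
print(A2)  # ('b', 'c', 'a')
print(B2)  # (1, 2, 3)
```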
qid & accept id: (6406368, 6406750) query: Matplotlib - Move X-Axis label downwards, but not X-Axis Ticks soup:

soup wrap:

Use the labelpad parameter:

pl.xlabel("...", labelpad=20)

or set it after:

ax.xaxis.labelpad = 20
qid & accept id: (6444825, 7643395) query: render cms page within another page soup:

soup wrap:

Just for a moment ignoring the idea of creating a custom plugin in order to do what you describe (i.e., render a page's placeholders programmatically), the following might be a viable alternative, depending on what exactly you are trying to achieve...

You should be able, just in the template for your "outer" cms page (i.e., the page within which you want to display the contents of another cms page), to get access to the current page like this:

{{ request.current_page }}

This is by virtue of the cms page middleware. So taking that a step further, you should be able to access the page's placeholders like this:

{% for placeholder in request.current_page.placeholders %}
    {{ placeholder.render }}
{% endfor %}

That's one way you could go about rendering a page's placeholders "inside" another page.

qid & accept id: (6493681, 6493765) query: Use of SQL - IN in python soup:

soup wrap:

Try something like this:

'(%s)' % ','.join(map(str,x))

This will give you a string that you could use to send to MySQL as a valid IN clause:

(1,2,3,4,5,6)
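A safer variant, if you are going through a DB-API driver such as MySQLdb, is to interpolate only %s placeholders and pass the values separately so the driver does the quoting. (The table and column names here are made up.)

```python
x = [1, 2, 3, 4, 5, 6]

# Build one %s placeholder per value; the values themselves are never
# string-formatted into the SQL.
placeholders = ','.join(['%s'] * len(x))
sql = 'SELECT * FROM people WHERE id IN (%s)' % placeholders
print(sql)  # SELECT * FROM people WHERE id IN (%s,%s,%s,%s,%s,%s)
# cursor.execute(sql, x)   # values passed as parameters to the driver
```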
qid & accept id: (6540412, 6540833) query: Updating a TKinter GUI from a multiprocessing calculation soup:

soup wrap:

This may or may not be helpful to you, but it is possible to make tkinter thread-safe by ensuring that its code and methods are executed on the particular thread the root was instantiated on. One project that experimented with the concept can be found over on the Python Cookbook as recipe 577633 (Directory Pruner 2). The code below comes from lines 76 - 253 and is fairly easy to extend with widgets.


Primary Thread-safety Support

# Import several GUI libraries.
import tkinter.ttk
import tkinter.filedialog
import tkinter.messagebox

# Import other needed modules.
import queue
import _thread
import operator

################################################################################

class AffinityLoop:

    "Restricts code execution to thread that instance was created on."

    __slots__ = '__action', '__thread'

    def __init__(self):
        "Initialize AffinityLoop with job queue and thread identity."
        self.__action = queue.Queue()
        self.__thread = _thread.get_ident()

    def run(self, func, *args, **keywords):
        "Run function on creating thread and return result."
        if _thread.get_ident() == self.__thread:
            self.__run_jobs()
            return func(*args, **keywords)
        else:
            job = self.__Job(func, args, keywords)
            self.__action.put_nowait(job)
            return job.result

    def __run_jobs(self):
        "Run all pending jobs currently in the job queue."
        while not self.__action.empty():
            job = self.__action.get_nowait()
            job.execute()

    ########################################################################

    class __Job:

        "Store information to run a job at a later time."

        __slots__ = ('__func', '__args', '__keywords',
                     '__error', '__mutex', '__value')

        def __init__(self, func, args, keywords):
            "Initialize the job's info and ready for execution."
            self.__func = func
            self.__args = args
            self.__keywords = keywords
            self.__error = False
            self.__mutex = _thread.allocate_lock()
            self.__mutex.acquire()

        def execute(self):
            "Run the job, store any error, and return to sender."
            try:
                self.__value = self.__func(*self.__args, **self.__keywords)
            except Exception as error:
                self.__error = True
                self.__value = error
            self.__mutex.release()

        @property
        def result(self):
            "Return execution result or raise an error."
            self.__mutex.acquire()
            if self.__error:
                raise self.__value
            return self.__value

################################################################################

class _ThreadSafe:

    "Create a thread-safe GUI class for safe cross-threaded calls."

    ROOT = tkinter.Tk

    def __init__(self, master=None, *args, **keywords):
        "Initialize a thread-safe wrapper around a GUI base class."
        if master is None:
            if self.BASE is not self.ROOT:
                raise ValueError('Widget must have a master!')
            self.__job = AffinityLoop() # Use Affinity() if it does not break.
            self.__schedule(self.__initialize, *args, **keywords)
        else:
            self.master = master
            self.__job = master.__job
            self.__schedule(self.__initialize, master, *args, **keywords)

    def __initialize(self, *args, **keywords):
        "Delegate instance creation to later time if necessary."
        self.__obj = self.BASE(*args, **keywords)

    ########################################################################

    # Provide a framework for delaying method execution when needed.

    def __schedule(self, *args, **keywords):
        "Schedule execution of a method till later if necessary."
        return self.__job.run(self.__run, *args, **keywords)

    @classmethod
    def __run(cls, func, *args, **keywords):
        "Execute the function after converting the arguments."
        args = tuple(cls.unwrap(i) for i in args)
        keywords = dict((k, cls.unwrap(v)) for k, v in keywords.items())
        return func(*args, **keywords)

    @staticmethod
    def unwrap(obj):
        "Unpack inner objects wrapped by _ThreadSafe instances."
        return obj.__obj if isinstance(obj, _ThreadSafe) else obj

    ########################################################################

    # Allow access to and manipulation of wrapped instance's settings.

    def __getitem__(self, key):
        "Get a configuration option from the underlying object."
        return self.__schedule(operator.getitem, self, key)

    def __setitem__(self, key, value):
        "Set a configuration option on the underlying object."
        return self.__schedule(operator.setitem, self, key, value)

    ########################################################################

    # Create attribute proxies for methods and allow their execution.

    def __getattr__(self, name):
        "Create a requested attribute and return cached result."
        attr = self.__Attr(self.__callback, (name,))
        setattr(self, name, attr)
        return attr

    def __callback(self, path, *args, **keywords):
        "Schedule execution of named method from attribute proxy."
        return self.__schedule(self.__method, path, *args, **keywords)

    def __method(self, path, *args, **keywords):
        "Extract a method and run it with the provided arguments."
        method = self.__obj
        for name in path:
            method = getattr(method, name)
        return method(*args, **keywords)

    ########################################################################

    class __Attr:

        "Save an attribute's name and wait for execution."

        __slots__ = '__callback', '__path'

        def __init__(self, callback, path):
            "Initialize proxy with callback and method path."
            self.__callback = callback
            self.__path = path

        def __call__(self, *args, **keywords):
            "Run a known method with the given arguments."
            return self.__callback(self.__path, *args, **keywords)

        def __getattr__(self, name):
            "Generate a proxy object for a sub-attribute."
            if name in {'__func__', '__name__'}:
                # Hack for the "tkinter.__init__.Misc._register" method.
                raise AttributeError('This is not a real method!')
            return self.__class__(self.__callback, self.__path + (name,))

################################################################################

# Provide thread-safe classes to be used from tkinter.

class Tk(_ThreadSafe): BASE = tkinter.Tk
class Frame(_ThreadSafe): BASE = tkinter.ttk.Frame
class Button(_ThreadSafe): BASE = tkinter.ttk.Button
class Entry(_ThreadSafe): BASE = tkinter.ttk.Entry
class Progressbar(_ThreadSafe): BASE = tkinter.ttk.Progressbar
class Treeview(_ThreadSafe): BASE = tkinter.ttk.Treeview
class Scrollbar(_ThreadSafe): BASE = tkinter.ttk.Scrollbar
class Sizegrip(_ThreadSafe): BASE = tkinter.ttk.Sizegrip
class Menu(_ThreadSafe): BASE = tkinter.Menu
class Directory(_ThreadSafe): BASE = tkinter.filedialog.Directory
class Message(_ThreadSafe): BASE = tkinter.messagebox.Message

If you read the rest of the application, you will find that it is built with the widgets defined as _ThreadSafe variants that you are used to seeing in other tkinter applications. As method calls come in from various threads, they are automatically held until it becomes possible to execute those calls on the creating thread. Note how the mainloop is replaced by way of lines 291 - 298 and 326 - 336.


Notice NoDefaultRoot & main_loop Calls

@classmethod
def main(cls):
    "Create an application containing a single TrimDirView widget."
    tkinter.NoDefaultRoot()
    root = cls.create_application_root()
    cls.attach_window_icon(root, ICON)
    view = cls.setup_class_instance(root)
    cls.main_loop(root)

main_loop Allows Threads To Execute

@staticmethod
def main_loop(root):
    "Process all GUI events according to tkinter's settings."
    # time.clock() in the original recipe was removed in Python 3.8;
    # time.perf_counter() is the modern replacement.
    target = time.perf_counter()
    while True:
        try:
            root.update()
        except tkinter.TclError:
            break
        target += tkinter._tkinter.getbusywaitinterval() / 1000
        time.sleep(max(target - time.perf_counter(), 0))

qid & accept id: (6566642, 6566682) query: (python) prepend script dir to a path soup:

soup wrap:

I would personally just os.chdir into the script's directory whenever I execute it. It is just:

import os
# abspath guards against __file__ being a bare filename, in which case
# the directory part would be '' and os.chdir('') would fail.
os.chdir(os.path.dirname(os.path.abspath(__file__)))

However, if you did want to refactor this into a library, you are in essence asking for a function that is aware of its caller's state. You would thus have to make it psd(__file__, blah). If you just wanted to write psd(blah), you'd have to do CPython-specific tricks with stack frames:

import inspect

def getCallerModule():
    # gets globals of module called from, and prints out __file__ global
    print(inspect.currentframe().f_back.f_globals['__file__'])
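Putting the two together, a psd(blah) function might look like the following sketch (the fallback behaviour for interactive sessions is an assumption, and this relies on CPython frame internals):

```python
import inspect
import os

def psd(relpath):
    """Prepend the calling script's directory to relpath (CPython only)."""
    caller = inspect.currentframe().f_back
    caller_file = caller.f_globals.get('__file__')
    if caller_file is None:                 # e.g. an interactive session
        base = os.getcwd()
    else:
        base = os.path.dirname(os.path.abspath(caller_file))
    return os.path.join(base, relpath)

print(psd('config.ini'))
```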
qid & accept id: (6611563, 10561643) query: SQLAlchemy ON DUPLICATE KEY UPDATE soup:

soup wrap:

ON DUPLICATE KEY UPDATE in the SQL statement

If you want the generated SQL to actually include ON DUPLICATE KEY UPDATE, the simplest way involves using a @compiles decorator.

The code (linked from a good thread on the subject on reddit) for an example can be found on github:

from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert

@compiles(Insert)
def append_string(insert, compiler, **kw):
    s = compiler.visit_insert(insert, **kw)
    if 'append_string' in insert.kwargs:
        return s + " " + insert.kwargs['append_string']
    return s


my_connection.execute(my_table.insert(append_string = 'ON DUPLICATE KEY UPDATE foo=foo'), my_values)

Note that with this approach you have to build the append_string yourself. You could probably change the append_string function so that it automatically turns the insert into one carrying the ON DUPLICATE KEY UPDATE clause, but I'm not going to do that here out of laziness.

ON DUPLICATE KEY UPDATE functionality within the ORM

SQLAlchemy does not provide an interface to ON DUPLICATE KEY UPDATE or MERGE or any other similar functionality in its ORM layer. Nevertheless, it has the session.merge() function that can replicate the functionality only if the key in question is a primary key.

session.merge(ModelObject) first checks if a row with the same primary key value exists by sending a SELECT query (or by looking it up locally). If it does, it sets a flag somewhere indicating that ModelObject is in the database already, and that SQLAlchemy should use an UPDATE query. Note that merge is quite a bit more complicated than this, but it replicates the functionality well with primary keys.

But what if you want ON DUPLICATE KEY UPDATE functionality with a non-primary key (for example, another unique key)? Unfortunately, SQLAlchemy doesn't have any such function. Instead, you have to create something that resembles Django's get_or_create(). Another StackOverflow answer covers it, and I'll just paste a modified, working version of it here for convenience.

from sqlalchemy.sql.expression import ClauseElement  # needed for the isinstance check

def get_or_create(session, model, defaults=None, **kwargs):
    instance = session.query(model).filter_by(**kwargs).first()
    if instance:
        return instance
    else:
        # kwargs.iteritems() is Python 2; use kwargs.items() on Python 3.
        params = dict((k, v) for k, v in kwargs.iteritems() if not isinstance(v, ClauseElement))
        if defaults:
            params.update(defaults)
        instance = model(**params)
        return instance
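As an aside, the upsert semantics being emulated here are easy to see in miniature with the stdlib sqlite3 module; SQLite (3.24+) spells it ON CONFLICT ... DO UPDATE rather than ON DUPLICATE KEY UPDATE. This sketch is illustrative only and is not SQLAlchemy code:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE kv (k TEXT PRIMARY KEY, v INTEGER)')
conn.execute('INSERT INTO kv VALUES (?, ?)', ('a', 1))

# A second insert with the same primary key updates the row instead of
# raising an IntegrityError -- the behaviour MySQL gives you with
# ON DUPLICATE KEY UPDATE.
conn.execute(
    'INSERT INTO kv VALUES (?, ?) '
    'ON CONFLICT(k) DO UPDATE SET v = excluded.v',
    ('a', 2))
print(conn.execute('SELECT v FROM kv WHERE k = ?', ('a',)).fetchone()[0])
```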
qid & accept id: (6624152, 6624782) query: matching a multiline make-line variable assignment with a python regexp soup:

soup wrap:

re.M means re.MULTILINE, but it doesn't affect the behaviour of the dot; it affects the behaviour of ^ and $.

You need to specify re.DOTALL to make the dot match '\n' as well:

def test():
    s = r"""    

FOO=a \    

  b

  """
    import re
    print repr(s)
    print '---------------------'
    regex = re.compile(r'^FOO=(.+)(?

result

'    \n\nFOO=a \\    \n\n  b\n\n  '
---------------------
a \    
-----
'a \\    '
---------------------
a \    

  b


-----
'a \\    \n\n  b\n\n  '
qid & accept id: (6669828, 6671316) query: optparse(): Input validation soup:

soup wrap:

It's been a while since I did anything with optparse, but I took a brief look through the docs and an old program.

"-f/-s,-e/-d are mandatory options but -f&-s cannot be used together and the same as with -e&-d options - cannot be used together. How can I put the check in place?"

For mutual exclusivity, you have to do the check yourself, for example:

import sys

parser.add_option("-e", help="e desc", dest="e_opt", action="store_true")
parser.add_option("-d", help="d desc", dest="d_opt", action="store_true")
(opts, args) = parser.parse_args()
if (opts.e_opt and opts.d_opt):
    print "Error!  Found both d and e options.  You can't do that!"
    sys.exit(1)

Since the example options here are boolean store_true flags, checking the parsed opts values directly works. (Beware that parser.has_option("-e") only reports whether the option is defined on the parser, not whether the user actually supplied it, so it cannot be used for this check.)

See the section How optparse handles errors for more.
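If switching to argparse (optparse's successor) is acceptable, mutual exclusion is built in; a minimal sketch, not from the original answer:

```python
import argparse

parser = argparse.ArgumentParser()
# Options added to the same group may not appear together on the
# command line; argparse enforces this for you.
group = parser.add_mutually_exclusive_group()
group.add_argument('-e', action='store_true', help='e desc')
group.add_argument('-d', action='store_true', help='d desc')

opts = parser.parse_args(['-e'])
print(opts.e)  # True
# parser.parse_args(['-e', '-d']) would exit with an
# "argument -d: not allowed with argument -e" error.
```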

"How can I use -w option (when used) with or w/o a value?"

I've never figured out a way to have an optparse option for which a value is, well, optional. AFAIK, you have to set the option up either to take a value or not to take one. The closest I've come is to specify a default value for an option which must have a value; then that option doesn't have to be given on the command line at all. Sample code:

parser.add_option("-w", help="warning", dest="warn", default=0)
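For comparison, the newer argparse module does support an option whose value is optional, via nargs='?' (again argparse, not optparse; shown only as a sketch):

```python
import argparse

parser = argparse.ArgumentParser()
# With nargs='?': omitting -w gives `default`, a bare -w gives `const`,
# and `-w 5` gives the string '5'.
parser.add_argument('-w', nargs='?', const=1, default=0)

print(parser.parse_args([]).w)           # 0
print(parser.parse_args(['-w']).w)       # 1
print(parser.parse_args(['-w', '5']).w)  # 5
```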

An aside with a (hopefully helpful) suggestion:

If you saw the docs, you did see the part about how "mandatory options" is an oxymoron, right?    ;-p    Humor aside, you may want to consider re-designing the interface, so that:

  • Required information isn't entered using an "option".
  • Only one argument (or group of arguments) enters data which could be mutually exclusive. In other words, instead of "-e" or "-d", have "-e on" or "-e off". If you want something like "-v" for verbose and "-q" for quiet/verbose off, you can store the values into one variable:
parser.add_option("-v", help="verbose on", dest="verbose", action="store_true")
parser.add_option("-q", help="verbose off", dest="verbose", action="store_false")

This particular example is borrowed (with slight expansion) from the section Handling boolean (flag) options. For something like this you might also want to check out the Grouping Options section; I've not used this feature, so won't say more about it.

qid & accept id: (6687619, 6687691) query: Print out a large list from file into multiple sublists with overlapping sequences in python soup:
soup wrap:
seq="abcdefessdfekgheithrfkopeifhghtryrhfbcvdfersdwtiyuyrterdhcbgjherytyekdnfiwytowihfiwoeirehjiwoqpft"
>>> n = 4
>>> overlap = 5
>>> division = len(seq)/n  # Python 2 integer division; use // on Python 3
>>> [seq[i*division:(i+1)*division+overlap] for i in range(n)]
['abcdefessdfekgheithrfkopeifhg', 'eifhghtryrhfbcvdfersdwtiyuyrt', 'yuyrterdhcbgjherytyekdnfiwyto', 'iwytowihfiwoeirehjiwoqpft']

It is probably slightly more efficient to do it like this:

>>> [seq[i:i+division+overlap] for i in range(0,n*division,division)]
['abcdefessdfekgheithrfkopeifhg', 'eifhghtryrhfbcvdfersdwtiyuyrt', 'yuyrterdhcbgjherytyekdnfiwyto', 'iwytowihfiwoeirehjiwoqpft']
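Since len(seq)/n relies on Python 2's integer division, a version-independent function form of the second variant might look like this (a sketch; the function name is mine, not from the answer):

```python
def chunks_with_overlap(seq, n, overlap):
    # Split seq into n chunks of equal base width, each extended by
    # `overlap` extra elements taken from the following chunk.
    division = len(seq) // n  # floor division works on Python 2 and 3
    return [seq[i:i + division + overlap]
            for i in range(0, n * division, division)]

print(chunks_with_overlap('abcdefghij', 2, 2))  # ['abcdefg', 'fghij']
```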
qid & accept id: (6727491, 6727624) query: Track changes of atributes in instance. Python soup:

soup wrap:

This is a start:

class MagicWrapper(object):
    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, attr):
        return getattr(self._wrapped, attr)

    def __setattr__(self, attr, val):
        if attr == '_wrapped':
            super(MagicWrapper, self).__setattr__('_wrapped', val)
        else:
            setattr(self._wrapped, 'old_' + attr, getattr(self._wrapped, attr))
            setattr(self._wrapped, attr, val)


class MyObject(object):
    def __init__(self):
        self.attr_one = None
        self.attr_two = 1

obj = MyObject()
obj = MagicWrapper(obj)
obj.attr_one = 'new value'
obj.attr_two = 2

print obj.old_attr_one
print obj.attr_one
print obj.old_attr_two
print obj.attr_two

This isn't bullet-proof when you're trying to wrap weird objects (very little in Python is), but it should work for "normal" classes. You could write a lot more code to get a little bit closer to fully cloning the behaviour of the wrapped object, but it's probably impossible to do perfectly. The main thing to be aware of here is that many special methods will not be redirected to the wrapped object.

If you want to do this without wrapping obj in some way, it's going to get messy. Here's an option:

def add_old_setattr_to_class(cls):
    def __setattr__(self, attr, val):
        super_setattr = super(self.__class__, self).__setattr__
        if attr.startswith('old_'):
            super_setattr(attr, val)
        else:
            super_setattr('old_' + attr, getattr(self, attr))
            super_setattr(attr, val)
    cls.__setattr__ = __setattr__


class MyObject(object):
    def __init__(self):
        self.attr_one = None
        self.attr_two = 1

obj = MyObject()
add_old_setattr_to_class(obj.__class__)
obj.attr_one = 'new value'
obj.attr_two = 2

print obj.old_attr_one
print obj.attr_one
print obj.old_attr_two
print obj.attr_two

Note that this is extremely invasive if you're using it on externally provided objects. It globally modifies the class of the object you're applying the magic to, not just that one instance. This is because like several other special methods, __setattr__ is not looked up in the instance's attribute dictionary; the lookup skips straight to the class, so there's no way to just override __setattr__ on the instance. I would characterise this sort of code as a bizarre hack if I encountered it in the wild (it's "nifty cleverness" if I write it myself, of course ;) ).
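A quick runnable demonstration of that lookup rule:

```python
class C(object):
    pass

c = C()
# Binding __setattr__ on the *instance* does not intercept assignment:
# special methods are looked up on the type, so c.x = 1 still runs
# object.__setattr__ and the attribute is set normally.
c.__setattr__ = lambda attr, val: None
c.x = 1
print(c.x)  # 1
```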

This version may or may not play nicely with objects that already play tricks with __setattr__ and __getattr__/__getattribute__. If you end up modifying the same class several times, I think this still works, but you end up with an ever-increasing number of wrapped __setattr__ definitions. You should probably try to avoid that; maybe by setting a "secret flag" on the class and checking for it in add_old_setattr_to_class before modifying cls. You should probably also use a more-unlikely prefix than just old_, since you're essentially trying to create a whole separate namespace.

qid & accept id: (6762695, 6762730) query: Joining Subsequent List Elements - Python soup:

soup wrap:

You can try the following if you don't mind mutating the initial list:

>>> a = ['AA', 'BB', 'C', 'D']
>>> a[0] += a.pop(1)

If you want to get a new one and leave initList as is, you can use something like this (note that this is just a sample):

a = ['AA', 'BB', 'C', 'D']
outList = a[:] # make a copy of list values
outList[0] += outList.pop(1)

Or in some cases you can try to use something like this too:

from itertools import groupby

a = ['AA', 'BB', 'C', 'D']
res = [''.join((str(z) for z in y)) for x, y in groupby(a, key = lambda x: len(x) == 2)]
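For reference on what the groupby version produces: runs of consecutive elements whose key matches (here, having length 2) are joined together, so with the sample list it yields:

```python
from itertools import groupby

a = ['AA', 'BB', 'C', 'D']
# Consecutive items with the same key value form one group and are joined.
res = [''.join(str(z) for z in y) for x, y in groupby(a, key=lambda s: len(s) == 2)]
print(res)  # ['AABB', 'CD']
```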
qid & accept id: (6798490, 6809725) query: Storing a directed, weighted, complete graph in the GAE datastore soup:

soup wrap:

I solved my own problem with a minor modification to the first design I suggested in my question.

I learned about the key_name argument that lets me set my own key names. So every time I create a new edge, I pass in the following argument to the constructor:

key_name = vertex1.name + ' > ' + vertex2.name

Then, instead of running this query multiple times:

edge = Edge.all().filter('better =', vertex1).filter('worse =', vertex2).get()

I can retrieve the edges easily since I know how to construct their keys. Using the Key.from_path() method, I construct a list of keys that refer to edges. Each key is obtained by doing this:

db.Key.from_path('Edge', vertex1.name + ' > ' + vertex2.name)

I then pass that list of keys to get all the objects in one query.

qid & accept id: (6819640, 6819810) query: Django, topic model with subtopics soup:

soup wrap:

You probably want to look at the documentation for related_name. Basically Django does this for you. For example:

class Topic(models.Model):
    master_topic = models.ForeignKey('self',
                     null=True,
                     blank=True,
                     related_name="sub_topics")

Then access this code:

apple = Topic.objects.get(tag='Apple')  # get() returns a single Topic; filter() would return a QuerySet
sub_topics = apple.sub_topics.all()  ## Gets all sub_topics.
qid & accept id: (6852394, 6853766) query: Ignoring unrecognized options when parsing argv? soup:

soup wrap:

I just discovered that getopt will stop parsing if it encounters --:

Python 2.6.6 (r266:84292, Jun 16 2011, 16:59:16) 
Type "help", "copyright", "credits" or "license" for more information.
>>> from getopt import getopt
>>>
>>> argv = ['-v', '--plugin=foo', '--', '--extra=bar', '-c']
>>> opts, extra = getopt(argv, 'v', 'plugin=')
>>>
>>> opts
[('-v', ''), ('--plugin', 'foo')]
>>>
>>> extra
['--extra=bar', '-c']

Note that the above argv is the equivalent of calling:

> main.py -v --plugin=foo -- --extra=bar -c

I particularly like this solution since it gives the user a little extra flexibility in how to order the parameters.
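If you are on argparse rather than getopt, parse_known_args offers a related way to ignore unrecognized options without requiring a -- separator; a sketch with hypothetical option names:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-v', action='store_true')
parser.add_argument('--plugin')

# Recognised options are parsed; everything else comes back in `extra`
# instead of triggering an error.
opts, extra = parser.parse_known_args(['-v', '--plugin=foo', '--extra=bar', '-c'])
print(opts.plugin)  # foo
print(extra)        # ['--extra=bar', '-c']
```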

qid & accept id: (6941965, 6942141) query: Convert a date string into YYYYMMDD soup:

soup wrap:

Try dateutil:

from dateutil import parser

dates = ['30th November 2009', '31st March 2010', '30th September 2010']

for date in dates:
    print parser.parse(date).strftime('%Y%m%d')

output:

20091130
20100331
20100930

or if you want to do it using standard datetime module:

from datetime import datetime

dates = ['30th November 2009', '31st March 2010', '30th September 2010']

for date in dates:
    part = date.split()
    print datetime.strptime('%s %s %s' % (part[0][:-2], part[1], part[2]), '%d %B %Y').strftime('%Y%m%d')
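Wrapped into a small function (a sketch; the function name is mine, and it assumes an English locale for the %B month name):

```python
from datetime import datetime

def to_yyyymmdd(s):
    # '30th November 2009' -> strip the ordinal suffix, then strptime.
    day, month, year = s.split()
    parsed = datetime.strptime('%s %s %s' % (day[:-2], month, year), '%d %B %Y')
    return parsed.strftime('%Y%m%d')

print(to_yyyymmdd('30th November 2009'))  # 20091130
```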
qid & accept id: (6943912, 6944352) query: Using a global flag for python RegExp compile soup:

soup wrap:

Yes, you can change it to be globally re.DOTALL. But you shouldn't: global settings are a bad idea at the best of times, and this could break any other Python code run by the same interpreter.

So, don't do this. If you insist, though:

The way you can do it exploits the fact that the Python interpreter caches modules, so anybody else who imports re gets the very same module object you have access to. You can therefore rebind re.compile to a proxy function that adds re.DOTALL:

import re
re.my_compile = re.compile
# flags must default to 0, or plain re.compile(pattern) calls would break.
re.compile = lambda pattern, flags=0: re.my_compile(pattern, flags | re.DOTALL)

and this change will happen to everybody else.

You can even package this up in a context manager, as follows:

from contextlib import contextmanager

@contextmanager
def flag_regexen(flag):
    import re
    re.my_compile = re.compile
    re.compile = lambda pattern, flags=0: re.my_compile(pattern, flags | flag)
    try:
        yield
    finally:
        # restore the original even if the with-block raises
        re.compile = re.my_compile

and then

with flag_regexen(re.DOTALL):
    ...  # every re.compile in this block gets re.DOTALL
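For an end-to-end check, here is a self-contained sketch of the same idea, restating the context manager with a try/finally guard so re.compile is restored even if the block raises:

```python
import re
from contextlib import contextmanager

@contextmanager
def flag_regexen(flag):
    # Temporarily OR `flag` into every re.compile call; restore on exit.
    original = re.compile
    re.compile = lambda pattern, flags=0: original(pattern, flags | flag)
    try:
        yield
    finally:
        re.compile = original

with flag_regexen(re.DOTALL):
    assert re.compile('a.b').match('a\nb')      # dot matches newline here
assert re.compile('a.b').match('a\nb') is None  # and not out here
```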
qid & accept id: (6976372, 6976507) query: Mulitprocess Pools with different functions soup:

soup wrap:

To pass different functions, you can simply call map_async multiple times.

Here is an example to illustrate that,

from multiprocessing import Pool

def square(x):
    return x * x

def cube(y):
    return y * y * y

pool = Pool(processes=20)

result_squares = pool.map_async(square, range(10))
result_cubes = pool.map_async(cube, range(10))

The result will be:

>>> print result_squares.get(timeout=1)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

>>> print result_cubes.get(timeout=1)
[0, 1, 8, 27, 64, 125, 216, 343, 512, 729]
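The same pattern can be exercised with the thread-based pool in multiprocessing.dummy, which shares the Pool API and is convenient for a quick check (a sketch, not from the original answer):

```python
from multiprocessing.dummy import Pool  # thread pool, same API as Pool

def square(x):
    return x * x

def cube(y):
    return y * y * y

pool = Pool(4)
# Two different functions queued on the same pool via map_async.
result_squares = pool.map_async(square, range(10))
result_cubes = pool.map_async(cube, range(10))
print(result_squares.get(timeout=5))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print(result_cubes.get(timeout=5))    # [0, 1, 8, 27, 64, 125, 216, 343, 512, 729]
pool.close()
pool.join()
```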
qid & accept id: (6998245, 7022322) query: Iterate over a ‘window’ of adjacent elements in Python soup:

soup wrap:

Resulting function (from the edit of the question):

A franken-iterator built with ideas from the answers of @agf, @FogleBird and @senderle; the resulting, somewhat neat-looking piece of code is:

from itertools import chain, repeat, islice

def window(seq, size=2, fill=0, fill_left=True, fill_right=False):
    """ Returns a sliding window (of width n) over data from the iterable:
      s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...
    """
    ssize = size - 1
    it = chain(
      repeat(fill, ssize * fill_left),
      iter(seq),
      repeat(fill, ssize * fill_right))
    result = tuple(islice(it, size))
    if len(result) == size:  # `<=` if okay to return seq if len(seq) < size
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

and, for some performance information regarding deque/tuple:

In [32]: kwa = dict(gen=xrange(1000), size=4, fill=-1, fill_left=True, fill_right=True)
In [33]: %timeit -n 10000 [a+b+c+d for a,b,c,d in tmpf5.ia(**kwa)]
10000 loops, best of 3: 358 us per loop
In [34]: %timeit -n 10000 [a+b+c+d for a,b,c,d in tmpf5.window(**kwa)]
10000 loops, best of 3: 368 us per loop
In [36]: %timeit -n 10000 [sum(x) for x in tmpf5.ia(**kwa)]
10000 loops, best of 3: 340 us per loop
In [37]: %timeit -n 10000 [sum(x) for x in tmpf5.window(**kwa)]
10000 loops, best of 3: 432 us per loop
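The ia variant in those timings isn't shown; for reference, a deque-based implementation of the same padding scheme might look like this (a sketch of my own, not the measured code):

```python
from collections import deque
from itertools import chain, islice, repeat

def window_deque(seq, size=2, fill=0, fill_left=True, fill_right=False):
    # Same padding scheme as the tuple version above, but the current
    # window lives in a deque(maxlen=size) instead of being rebuilt
    # as a fresh tuple slice on every step.
    ssize = size - 1
    it = chain(repeat(fill, ssize * fill_left),
               iter(seq),
               repeat(fill, ssize * fill_right))
    d = deque(islice(it, size), maxlen=size)
    if len(d) == size:
        yield tuple(d)
    for elem in it:
        d.append(elem)  # maxlen discards the oldest element
        yield tuple(d)

print(list(window_deque([1, 2, 3], size=2)))  # [(0, 1), (1, 2), (2, 3)]
```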

but anyway, if it's numbers then numpy is likely preferable.

qid & accept id: (7050562, 7050577) query: Trying to duplicate a list and modify one version of it in Python 2 soup:

soup wrap:

To make a new copy of your list, try:

newList = list(oldList)

Or, more concisely, via slicing:

newlist = oldList[:]

Just assigning oldList to newList will result in two names pointing to the same object, as illustrated in the diagram at http://henry.precheur.org/python/copy_list (shown inline as an image in the original answer).

Generic object copying functions are provided by the copy module.
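A minimal runnable illustration of the aliasing pitfall versus the two copying idioms:

```python
oldList = [1, 2, 3]

alias = oldList        # same object under a second name
copy1 = list(oldList)  # independent copy
copy2 = oldList[:]     # independent copy via slicing

oldList.append(4)
print(alias)  # [1, 2, 3, 4] -- the alias sees the change
print(copy1)  # [1, 2, 3]
print(copy2)  # [1, 2, 3]
```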

qid & accept id: (7086295, 7086760) query: Proper way to organize testcases that involve a data file for each testcase? soup:

soup wrap:

I've done similar things with the unittest framework by writing a function which creates and returns a test class. This function can then take in whatever parameters you want and customise the test class accordingly. You can also customise the __doc__ attribute of the test function(s) to get customised messages when running the tests.

I quickly knocked up the following example code to illustrate this. Instead of doing any actual testing, it uses the random module to fail some tests for demonstration purposes. When created, the classes are inserted into the global namespace so that a call to unittest.main() will pick them up. Depending on how you run your tests, you may wish to do something different with the generated classes.

import os
import unittest

# Generate a test class for an individual file.
def make_test(filename):
    class TestClass(unittest.TestCase):
        def test_file(self):
            # Do the actual testing here.
            # parsed = do_my_parsing(filename)
            # golden = load_golden(filename)
            # self.assertEquals(parsed, golden, 'Parsing failed.')

            # Randomly fail some tests.
            import random
            if not random.randint(0, 10):
                self.assertEquals(0, 1, 'Parsing failed.')

        # Set the docstring so we get nice test messages.
        test_file.__doc__ = 'Test parsing of %s' % filename

    return TestClass

# Create a single file test.
Test1 = make_test('file1.html')

# Create several tests from a list.
for i in range(2, 5):
    globals()['Test%d' % i] = make_test('file%d.html' % i)

# Create them from a directory listing.
for dirname, subdirs, filenames in os.walk('tests'):
    for f in filenames:
        globals()['Test%s' % f] = make_test('%s/%s' % (dirname, f))

# If this file is being run, run all the tests.
if __name__ == '__main__':
    unittest.main()

A sample run:

$ python tests.py -v
Test parsing of file1.html ... ok
Test parsing of file2.html ... ok
Test parsing of file3.html ... ok
Test parsing of file4.html ... ok
Test parsing of tests/file5.html ... ok
Test parsing of tests/file6.html ... FAIL
Test parsing of tests/file7.html ... ok
Test parsing of tests/file8.html ... ok

======================================================================
FAIL: Test parsing of tests/file6.html
----------------------------------------------------------------------
Traceback (most recent call last):
  File "generic.py", line 16, in test_file
    self.assertEquals(0, 1, 'Parsing failed.')
AssertionError: Parsing failed.

----------------------------------------------------------------------
Ran 8 tests in 0.004s

FAILED (failures=1)
qid & accept id: (7096090, 7096183) query: Add django model manager code-completion to Komodo soup:

soup wrap:

Probably the easiest way to get this to work seems to be to add the following to the top of models.py:

from django.db.models import manager

and then under each model add

objects = manager.Manager()

so that, for example, the following:

class Site(models.Model):
    name = models.CharField(max_length=200)
    prefix = models.CharField(max_length=1)
    secret = models.CharField(max_length=255)

    def __unicode__(self):
        return self.name

becomes

class Site(models.Model):
    name = models.CharField(max_length=200)
    prefix = models.CharField(max_length=1)
    secret = models.CharField(max_length=255)

    objects = manager.Manager()

    def __unicode__(self):
        return self.name

This is how you would explicitly set your own model manager, and by explicitly setting it (to the default), Komodo picks up the code completion perfectly.

Hopefully this will help someone :-)

qid & accept id: (7132861, 7133204) query: building full path filename in python, soup:

soup wrap:

This works fine:

os.path.join(dir_name, base_filename + "." + filename_suffix)

Keep in mind that os.path.join() exists to smooth over the different path separator characters used by different operating systems, so your code doesn't have to special-case each one. File name "extensions" only have significant meaning on one major operating system (they're simply part of the file name on non-Windows systems), and their separator is always a dot. There's no need for a function to join them, but if using one makes you feel better, you can do this:

os.path.join(dir_name, '.'.join((base_filename, filename_suffix)))

Or, if you want to keep your code really clean, simply include the dot in the suffix:

suffix = '.pdf'
os.path.join(dir_name, base_filename + suffix)
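For what it's worth, on Python 3.4+ the pathlib module (not part of the original answer) joins paths the same way, with the dot handled for you by with_suffix:

```python
from pathlib import Path

dir_name = "reports"          # sample values, purely for illustration
base_filename = "summary"
suffix = ".pdf"

full_path = Path(dir_name) / (base_filename + suffix)

# with_suffix attaches (or replaces) the extension
alt = (Path(dir_name) / base_filename).with_suffix(suffix)
```

Both forms produce the same path object.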
qid & accept id: (7154171, 7154284) query: How can I set a true/false variable with if? soup:

soup wrap:

You can use a function attribute that you switch between true and false on each call:

def toggleConsole():
    toggleConsole.show = not getattr(toggleConsole, "show", True)
    console = win32console.GetConsoleWindow()
    win32gui.ShowWindow(console, int(toggleConsole.show))

Here is a quick example of how this works:

>>> def test():
...     test.show = not getattr(test, "show", True)
...     print int(test.show)
... 
>>> test()
0
>>> test()
1
>>> test()
0
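Stripped of the win32 calls, the same function-attribute trick looks like this; because the getattr default is True, the first call flips it to False:

```python
def toggle():
    # flip the stored flag on every call; defaults to True before the first call
    toggle.show = not getattr(toggle, "show", True)
    return toggle.show
```

Successive calls alternate False, True, False, and so on.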
qid & accept id: (7171140, 7171543) query: Using Python Iterparse For Large XML Files soup:

soup wrap:

Try Liza Daly's fast_iter. After processing an element, elem, it calls elem.clear() to remove descendants and also removes preceding siblings.

def fast_iter(context, func, *args, **kwargs):
    """
    http://lxml.de/parsing.html#modifying-the-tree
    Based on Liza Daly's fast_iter
    http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
    See also http://effbot.org/zone/element-iterparse.htm
    """
    for event, elem in context:
        func(elem, *args, **kwargs)
        # It's safe to call clear() here because no descendants will be
        # accessed
        elem.clear()
        # Also eliminate now-empty references from the root node to elem
        for ancestor in elem.xpath('ancestor-or-self::*'):
            while ancestor.getprevious() is not None:
                del ancestor.getparent()[0]
    del context


def process_element(elem):
    print elem.xpath( 'description/text( )' )

context = etree.iterparse( MYFILE, tag='item' )
fast_iter(context,process_element)

Daly's article is an excellent read, especially if you are processing large XML files.


Edit: The fast_iter posted above is a modified version of Daly's fast_iter. After processing an element, it is more aggressive at removing other elements that are no longer needed.

The script below shows the difference in behavior. Note in particular that orig_fast_iter does not delete the A1 element, while the mod_fast_iter does delete it, thus saving more memory.

import lxml.etree as ET
import textwrap
import io

def setup_ABC():
    content = textwrap.dedent('''\
      <root>
        <A1>
          <B1></B1>
          <C>1</C>
          <D1></D1>
        </A1>
        <A2>
          <B2></B2>
          <C>2</C>
          <D2></D2>
        </A2>
      </root>
        ''')
    return content


def study_fast_iter():
    def orig_fast_iter(context, func, *args, **kwargs):
        for event, elem in context:
            print('Processing {e}'.format(e=ET.tostring(elem)))
            func(elem, *args, **kwargs)
            print('Clearing {e}'.format(e=ET.tostring(elem)))
            elem.clear()
            while elem.getprevious() is not None:
                print('Deleting {p}'.format(
                    p=(elem.getparent()[0]).tag))
                del elem.getparent()[0]
        del context

    def mod_fast_iter(context, func, *args, **kwargs):
        """
        http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
        Author: Liza Daly
        See also http://effbot.org/zone/element-iterparse.htm
        """
        for event, elem in context:
            print('Processing {e}'.format(e=ET.tostring(elem)))
            func(elem, *args, **kwargs)
            # It's safe to call clear() here because no descendants will be
            # accessed
            print('Clearing {e}'.format(e=ET.tostring(elem)))
            elem.clear()
            # Also eliminate now-empty references from the root node to elem
            for ancestor in elem.xpath('ancestor-or-self::*'):
                print('Checking ancestor: {a}'.format(a=ancestor.tag))
                while ancestor.getprevious() is not None:
                    print(
                        'Deleting {p}'.format(p=(ancestor.getparent()[0]).tag))
                    del ancestor.getparent()[0]
        del context

    content = setup_ABC()
    context = ET.iterparse(io.BytesIO(content), events=('end', ), tag='C')
    orig_fast_iter(context, lambda elem: None)
    # Processing <C>1</C>
    # Clearing <C>1</C>
    # Deleting B1
    # Processing <C>2</C>
    # Clearing <C>2</C>
    # Deleting B2

    print('-' * 80)
    """
    The improved fast_iter deletes A1. The original fast_iter does not.
    """
    content = setup_ABC()
    context = ET.iterparse(io.BytesIO(content), events=('end', ), tag='C')
    mod_fast_iter(context, lambda elem: None)
    # Processing <C>1</C>
    # Clearing <C>1</C>
    # Checking ancestor: root
    # Checking ancestor: A1
    # Checking ancestor: C
    # Deleting B1
    # Processing <C>2</C>
    # Clearing <C>2</C>
    # Checking ancestor: root
    # Checking ancestor: A2
    # Deleting A1
    # Checking ancestor: C
    # Deleting B2

study_fast_iter()
qid & accept id: (7172290, 7172562) query: Override python logging for test efficiency soup:

soup wrap:

Option 1:

Logging can be disabled by calling

logging.disable(logging.CRITICAL)

and turned back on with

logging.disable(logging.NOTSET)

However, even after disabling logging, a logging statement such as logger.info would still cause Python to do a few attribute lookups and function calls before reaching the isEnabledFor method. Still, this might be good enough.
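A minimal sketch of the disable/re-enable round trip (the handler and logger names here are invented for the demonstration):

```python
import logging

records = []

class ListHandler(logging.Handler):
    # collect emitted messages so the effect of logging.disable is observable
    def emit(self, record):
        records.append(record.getMessage())

logger = logging.getLogger("disable-demo")
logger.setLevel(logging.INFO)
logger.propagate = False           # keep the demo off the root logger
logger.addHandler(ListHandler())

logging.disable(logging.CRITICAL)  # suppress every level up to CRITICAL
logger.info("suppressed")
logger.critical("also suppressed")

logging.disable(logging.NOTSET)    # turn logging back on
logger.info("visible")
```

Only the final message reaches the handler.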

Option 2:

Use mocking:

class MockLogger(object):
    def debug(msg, *args, **kwargs): pass
    def info(msg, *args, **kwargs): pass
    def warn(msg, *args, **kwargs): pass
    def error(msg, *args, **kwargs): pass
    def critical(msg, *args, **kwargs): pass

class Test(unittest.TestCase):
    def test_func(self):
        _logger1=testmodule.logger1
        _logger2=testmodule.logger2
        testmodule.logger1=MockLogger()
        testmodule.logger2=MockLogger()
        # perform test
        testmodule.logger1=_logger1
        testmodule.logger2=_logger2

This will reduce the time consumed by logging statements to the time it takes to do one attribute lookup and one (noop) function call. If that's not satisfactory, I think the only option left is removing the logging statements themselves.

qid & accept id: (7218865, 7219341) query: How do you map a fully qualified class name to its class object in Python? soup:

soup wrap:

You can use importlib in 2.7:

from importlib import import_module

name = 'xml.etree.ElementTree.ElementTree'
parts = name.rsplit('.', 1)
ElementTree = getattr(import_module(parts[0]), parts[1])
tree = ElementTree()

In older versions you can use the __import__ function. It defaults to returning the top level of a package import (e.g. xml). However, if you pass it a non-empty fromlist, it returns the named module instead:

name = 'xml.etree.ElementTree.ElementTree'
parts = name.rsplit('.', 1)    
ElementTree = getattr(__import__(parts[0], fromlist=['']), parts[1])
tree = ElementTree()
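The same rsplit-plus-import_module pattern, wrapped in a helper (the name resolve is just illustrative) and checked against a stdlib class:

```python
from importlib import import_module

def resolve(name):
    # split 'package.module.ClassName' into the module path and the attribute
    module_path, attr = name.rsplit('.', 1)
    return getattr(import_module(module_path), attr)

# works for any importable dotted name, e.g. a stdlib class
OrderedDict = resolve('collections.OrderedDict')
```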
qid & accept id: (7274521, 7274693) query: how do i turn for loop iterator into a neat pythonic one line for loop soup:
soup wrap:
list_choices = {}
for i in obj:
    list_choices.setdefault(i.area.region.id, []).append([i.id, i.name])

or, using list_choices = collections.defaultdict(list), the last line becomes:

list_choices[i.area.region.id].append([i.id, i.name])
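Spelled out with defaultdict, using hypothetical namedtuple records standing in for the ORM objects in the original snippet:

```python
from collections import defaultdict, namedtuple

# stand-ins for the Django model instances
Region = namedtuple('Region', 'id')
Area = namedtuple('Area', 'region')
Obj = namedtuple('Obj', 'id name area')

obj = [
    Obj(1, 'alpha', Area(Region(10))),
    Obj(2, 'beta',  Area(Region(10))),
    Obj(3, 'gamma', Area(Region(20))),
]

# group [id, name] pairs by region id; missing keys start as empty lists
list_choices = defaultdict(list)
for i in obj:
    list_choices[i.area.region.id].append([i.id, i.name])
```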
qid & accept id: (7302316, 7302381) query: Converting string to datetime object in Python (GAE)? soup:

soup wrap:

If you can use dateutil with GAE, then

In [70]: import dateutil.parser as parser
In [71]: parser.parse('Sunday 31st of July 2005 ( 02:05:50 PM )',fuzzy=True)
Out[71]: datetime.datetime(2005, 7, 31, 14, 5, 50)

Otherwise, you'll have to rely on re to manipulate the date string into a format strptime can parse.

In [89]: datetime.datetime.strptime(re.sub(r'\w+ (\d+)\w+ of(.+)\s+\( (.+) \)',r'\1 \2 \3','Sunday 31st of July 2005 ( 02:05:50 PM )'),'%d %B %Y %I:%M:%S %p')
Out[89]: datetime.datetime(2005, 7, 31, 14, 5, 50)
qid & accept id: (7376019, 7376026) query: list extend() to index, inserting list elements not only to the end soup:

soup wrap:

Sure, you can use slice indexing:

a_list[1:1] = b_list

Just to demonstrate the general algorithm, if you were to implement the my_extend function in a hypothetical custom list class, it would look like this:

def my_extend(self, other_list, index):
    self[index:index] = other_list

But don't actually make that a function, just use the slice notation when you need to.
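A concrete run of the slice assignment:

```python
a_list = [1, 4, 5]
b_list = [2, 3]

# splice b_list's elements in at index 1; no nesting, list grows in place
a_list[1:1] = b_list
```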

qid & accept id: (7407934, 7408451) query: python beginner - how to read contents of several files into unique lists? soup:

soup wrap:

You could do it like this if you don't need to remember where the contents came from:

PathwayList = []
for InFileName in FileList:
    sys.stderr.write("Processing file %s\n" % InFileName)
    InFile = open(InFileName, 'r')
    PathwayList.append(InFile.readlines())
    InFile.close()  

for contents in PathwayList:
    # do something with contents which is a list of strings
    print contents  

or, if you want to keep track of the files names, you could use a dictionary :

PathwayList = {}
for InFileName in FileList:
    sys.stderr.write("Processing file %s\n" % InFileName)
    InFile = open(InFileName, 'r')
    PathwayList[InFileName] = InFile.readlines()
    InFile.close()

for filename, contents in PathwayList.items():
    # do something with contents which is a list of strings
    print filename, contents  
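A version of the dictionary variant updated to use with blocks, so the files close themselves; the sample files here are fabricated purely to make the sketch self-contained:

```python
import os
import sys
import tempfile

# fabricate a couple of input files so the example can run on its own
tmpdir = tempfile.mkdtemp()
FileList = []
for i in (1, 2):
    path = os.path.join(tmpdir, 'pathway%d.txt' % i)
    with open(path, 'w') as f:
        f.write('line1\nline2\n')
    FileList.append(path)

PathwayList = {}
for InFileName in FileList:
    sys.stderr.write("Processing file %s\n" % InFileName)
    with open(InFileName) as InFile:  # closed automatically on exit
        # key on the file *name*, not the file object
        PathwayList[InFileName] = InFile.readlines()
```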
qid & accept id: (7463941, 7464026) query: Reshape for array multiplication/division in python soup:

soup wrap:

Two somewhat easy ways are:

(x * y.T).T

or

x.reshape((-1,1)) * y

Numpy's broadcasting is a very powerful feature, and will do exactly what you want automatically, but it expects the last axis (or axes) of the arrays to have the same shape, not the first axes. Thus, you need to transpose y for it to work.

The second option is the same as what you're doing, but -1 is treated as a placeholder for the array's size, which reduces some typing.

qid & accept id: (7471055, 7471348) query: Python: converting a nested list into a simple list with coord positions soup:

soup wrap:

It might be this:

l = [['g,g', 'g,g'], ['d,d', 'd,d,d', 'd,d'], ['s,s', 's,s']]
output = [ (x, y, z, v) for z, l1 in enumerate(l[::-1]) for y, l2 in enumerate(l1) for x, v in enumerate(l2.split(',')) ]

... but as it has been written, it is not clear what the rule is exactly. In nested loops:

output = []
for z,l1 in enumerate(l[::-1]):
    for y, l2 in enumerate(l1):
        for x, v in enumerate(l2.split(',')):
            output.append((x, y, z, v))
qid & accept id: (7490408, 7490431) query: How to unpack a list? soup:

soup wrap:

Your code will work exactly as you typed it.

def foo(*mylist):
    bar("first", *mylist)

def bar(*vals):
    print "|".join(vals)

foo("a","b")

will print:

first|a|b
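The same pattern works under Python 3, where only the print syntax changes:

```python
def foo(*mylist):
    bar("first", *mylist)

def bar(*vals):
    print("|".join(vals))

foo("a", "b")   # prints first|a|b
```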
qid & accept id: (7501557, 10318323) query: Convert property to django model field soup:

soup wrap:

If you want to load from a legacy fixture, you could build an intermediate model/table, convert the file, or customize the dumpdata command. Fooling dumpdata is also possible, as shown below, but it is something of a hack...

class VirtualField(object):
    rel = None

    def contribute_to_class(self, cls, name):
        self.attname = self.name = name
        # cls._meta.add_virtual_field(self)
        get_field = cls._meta.get_field
        cls._meta.get_field = lambda name, many_to_many=True: self if name == self.name else get_field(name, many_to_many)
        models.signals.pre_init.connect(self.pre_init, sender=cls) #, weak=False)
        models.signals.post_init.connect(self.post_init, sender=cls) #, weak=False)
        setattr(cls, name, self)

    def pre_init(self, signal, sender, args, kwargs, **_kwargs):
        sender._meta._field_name_cache.append(self)

    def post_init(self, signal, sender, **kwargs):
        sender._meta._field_name_cache[:] = sender._meta._field_name_cache[:-1]

    def __get__(self, instance, instance_type=None):
        if instance is None:
            return self
        return instance.field1 + '/' + instance.field2

    def __set__(self, instance, value):
        if instance is None:
             raise AttributeError(u"%s must be accessed via instance" % self.related.opts.object_name)
        instance.field1, instance.field2 = value.split('/')

    def to_python(self, value):
        return value

class A(models.Model):
     field1 = models.TextField()
     field2 = models.TextField()
     virtual_field = VirtualField()

# legacy.json
[{"pk": 1, "model": "so.a", "fields": {"virtual_field": "A/B"}}, {"pk": 2, "model": "so.a", "fields": {"virtual_field": "199/200"}}]

$ ./manage.py loaddata legacy.json
Installed 2 object(s) from 1 fixture(s)

Or you could add a customized serializer to the public serializers and override its Deserializer function to work with the properties that you have. Mainly, the override tweaks these two lines of Deserializer inside django/core/serializers/python.py:

field = Model._meta.get_field(field_name)
# and
yield base.DeserializedObject(Model(**data), m2m_data)
qid & accept id: (7508774, 7508795) query: Beautiful Soup - how to fix broken tags soup:

soup wrap:

Edit (working):

I grabbed a complete (at least it should be complete) list of all html tags from w3 to match against. Try it out:

fixedString = re.sub(">\s*(\!--|\!DOCTYPE|\
                           a|abbr|acronym|address|applet|area|\
                           b|base|basefont|bdo|big|blockquote|body|br|button|\
                           caption|center|cite|code|col|colgroup|\
                           dd|del|dfn|dir|div|dl|dt|\
                           em|\
                           fieldset|font|form|frame|frameset|\
                           head|h1|h2|h3|h4|h5|h6|hr|html|\
                           i|iframe|img|input|ins|\
                           kbd|\
                           label|legend|li|link|\
                           map|menu|meta|\
                           noframes|noscript|\
                           object|ol|optgroup|option|\
                           p|param|pre|\
                           q|\
                           s|samp|script|select|small|span|strike|strong|style|sub|sup|\
                           table|tbody|td|textarea|tfoot|th|thead|title|tr|tt|\
                           u|ul|\
                           var)>", "><\g<1>>", s)
bs = BeautifulSoup(fixedString)

Produces:

>>> print s


td>LABEL1INPUT1


LABEL2INPUT2


>>> print re.sub(">\s*(\!--|\!DOCTYPE|\
                       a|abbr|acronym|address|applet|area|\
                       b|base|basefont|bdo|big|blockquote|body|br|button|\
                       caption|center|cite|code|col|colgroup|\
                       dd|del|dfn|dir|div|dl|dt|\
                       em|\
                       fieldset|font|form|frame|frameset|\
                       head|h1|h2|h3|h4|h5|h6|hr|html|\
                       i|iframe|img|input|ins|\
                       kbd|\
                       label|legend|li|link|\
                       map|menu|meta|\
                       noframes|noscript|\
                       object|ol|optgroup|option|\
                       p|param|pre|\
                       q|\
                       s|samp|script|select|small|span|strike|strong|style|sub|sup|\
                       table|tbody|td|textarea|tfoot|th|thead|title|tr|tt|\
                       u|ul|\
                       var)>", "><\g<1>>", s)

LABEL1INPUT1


LABEL2INPUT2


This one should match broken ending tags as well:

re.sub(">\s*(/?)(\!--|\!DOCTYPE|\a|abbr|acronym|address|applet|area|\
                 b|base|basefont|bdo|big|blockquote|body|br|button|\
                 caption|center|cite|code|col|colgroup|\
                 dd|del|dfn|dir|div|dl|dt|\
                 em|\
                 fieldset|font|form|frame|frameset|\
                 head|h1|h2|h3|h4|h5|h6|hr|html|\
                 i|iframe|img|input|ins|\
                 kbd|\
                 label|legend|li|link|\
                 map|menu|meta|\
                 noframes|noscript|\
                 object|ol|optgroup|option|\
                 p|param|pre|\
                 q|\
                 s|samp|script|select|small|span|strike|strong|style|sub|sup|\
                 table|tbody|td|textarea|tfoot|th|thead|title|tr|tt|\
                 u|ul|\
                 var)>", "><\g<1>\g<2>>", s)
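A scaled-down, self-contained illustration of the same substitution with just a few tags (the full tag list above works the same way; the sample string is made up):

```python
import re

s = '<tr>td>LABEL1</td></tr>'   # broken opening tag: td> instead of <td>
fixed = re.sub(r">\s*(/?)(td|tr|table)>", r"><\g<1>\g<2>>", s)
print(fixed)  # -> '<tr><td>LABEL1</td></tr>'
```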
qid & accept id: (7522721, 7522895) query: Django Multiple Caches - How to choose which cache the session goes in? soup:

soup wrap:

The cached_db and cache backends don't support it, but it's easy to create your own:

from django.contrib.sessions.backends.cache import SessionStore as CachedSessionStore
from django.core.cache import get_cache
from django.conf import settings

class SessionStore(CachedSessionStore):
    """
    A cache-based session store.
    """
    def __init__(self, session_key=None):
        self._cache = get_cache(settings.SESSION_CACHE_ALIAS)
        super(SessionStore, self).__init__(session_key)

No need for a cached_db backend since Redis is persistent anyway :)
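Wiring the custom backend into settings.py might look like this (the module path and cache alias names here are hypothetical):

```python
# settings.py (hypothetical module path and alias names)
SESSION_ENGINE = 'myproject.sessions'   # the module holding the SessionStore above
SESSION_CACHE_ALIAS = 'sessions'        # which entry of CACHES the sessions use

CACHES = {
    'default': {'BACKEND': 'django.core.cache.backends.locmem.LocMemCache'},
    'sessions': {'BACKEND': 'django.core.cache.backends.locmem.LocMemCache'},
}
```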


When using Memcached and cached_db, it's a bit more complex because of how that SessionStore is implemented. We just replace it completely:

from django.conf import settings
from django.contrib.sessions.backends.db import SessionStore as DBStore
from django.core.cache import get_cache

class SessionStore(DBStore):
    """
    Implements cached, database backed sessions.  Now with control over the cache!
    """

    def __init__(self, session_key=None):
        super(SessionStore, self).__init__(session_key)
        self.cache = get_cache(getattr(settings, 'SESSION_CACHE_ALIAS', 'default'))

    def load(self):
        data = self.cache.get(self.session_key, None)
        if data is None:
            data = super(SessionStore, self).load()
            self.cache.set(self.session_key, data, settings.SESSION_COOKIE_AGE)
        return data

    def exists(self, session_key):
        return super(SessionStore, self).exists(session_key)

    def save(self, must_create=False):
        super(SessionStore, self).save(must_create)
        self.cache.set(self.session_key, self._session, settings.SESSION_COOKIE_AGE)

    def delete(self, session_key=None):
        super(SessionStore, self).delete(session_key)
        self.cache.delete(session_key or self.session_key)

    def flush(self):
        """
        Removes the current session data from the database and regenerates the
        key.
        """
        self.clear()
        self.delete(self.session_key)
        self.create()
qid & accept id: (7537439, 7537466) query: How to increment a variable on a for loop in jinja template? soup:

soup wrap:

You could use set to increment a counter:

{% set count = 1 %}
{% for i in p %}
  {{ count }}
  {% set count = count + 1 %}
{% endfor %}

Or you could use loop.index:

{% for i in p %}
  {{ loop.index }}
{% endfor %}

Check the template designer documentation.
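A minimal sketch (assuming the jinja2 package is installed) rendering the loop.index variant:

```python
from jinja2 import Template

# loop.index is 1-based; loop.index0 would be the 0-based variant
template = Template("{% for i in p %}{{ loop.index }} {% endfor %}")
print(template.render(p=["a", "b", "c"]))  # renders "1 2 3 "
```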

qid & accept id: (7586936, 7587060) query: Batch Scripting: Running python script on all directories in a folder soup:

soup wrap:

Use the /d operator on for:

for /D %%d in (*) do script.py %%d

This assumes you only need to execute one command per directory (script.py %%d); if you need to execute more, use parentheses (). Also I'm guessing there's an execution engine needed first, but not sure what it is for you.

A multi-line example:

for /D %%d in (%1) do (
   echo processing %%d
   script.py %%d
)
qid & accept id: (7681301, 7681336) query: Search for a key in a nested Python dictionary soup:

soup wrap:

You're close.

idnum = 11
# The loop and 'if' are good
# You just had the 'break' in the wrong place
for id, idnumber in A.iteritems():
    if idnum in idnumber.keys(): # you can skip '.keys()', it's the default
       calculate = some_function_of(idnumber[idnum])
       break # if we find it we're done looking - leave the loop
    # otherwise we continue to the next dictionary
else:
    # this is the for loop's 'else' clause
    # if we don't find it at all, we end up here
    # because we never broke out of the loop
    calculate = your_default_value
    # or whatever you want to do if you don't find it

If you need to know how many 11s there are as keys in the inner dicts, you can:

idnum = 11
print sum(idnum in idnumber for idnumber in A.itervalues())

This works because a key can only be in each dict once, so you just have to test whether the key exists. in returns True or False, which are equal to 1 and 0, so the sum is the number of occurrences of idnum.
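A minimal self-contained demo of the counting idiom (Python 3 spelling, with a made-up A):

```python
# A maps outer ids to inner dicts; count how many inner dicts contain idnum
A = {
    1: {11: 'x', 12: 'y'},
    2: {13: 'z'},
    3: {11: 'w'},
}
idnum = 11
count = sum(idnum in idnumber for idnumber in A.values())
print(count)  # -> 2
```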

qid & accept id: (7700545, 7700625) query: How to pick the largest number in a matrix of lists in python? soup:
soup wrap:
max((cell[k], x, y)
    for (y, row) in enumerate(m)
    for (x, cell) in enumerate(row))[1:]

Also, you can assign the result directly to a couple of variables:

(_, x, y) = max((cell[k], x, y)
                for (y, row) in enumerate(m)
                for (x, cell) in enumerate(row))

This is O(n²) for an n×n matrix, btw.
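A quick self-contained demo with a made-up 2x2 matrix of one-element lists (k picks which element of each cell to compare):

```python
k = 0
m = [[[3], [7]],
     [[5], [1]]]  # the largest cell[k] is 7, at column x=1, row y=0
(_, x, y) = max((cell[k], x, y)
                for (y, row) in enumerate(m)
                for (x, cell) in enumerate(row))
print(x, y)  # -> 1 0
```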

qid & accept id: (7734028, 7736087) query: different foreground colors for each line in wxPython wxTextCtrl soup:

soup wrap:

There are several methods in wxPython to get colored text.

  • wx.TextCtrl with wx.TE_RICH, wx.TE_RICH2 styles
  • wx.stc.StyledTextCtrl
  • wx.richtext.RichTextCtrl
  • wx.HtmlWindow (inserting color tags in your text)
  • wx.ListCtrl

You can get examples of all of them in the wxPython demo

For example, you can change foreground and background colors in any part of a wx.TextCtrl:

rt = wx.TextCtrl(self, -1,"My Text....",size=(200, 100),style=wx.TE_MULTILINE|wx.TE_RICH2)
rt.SetInsertionPoint(0)
rt.SetStyle(2, 5, wx.TextAttr("red", "blue"))

wx.richtext is also easy to use to write lines with different colors:

rtc = wx.richtext.RichTextCtrl(self, style=wx.VSCROLL|wx.HSCROLL|wx.NO_BORDER)
rtc.BeginTextColour((255, 0, 0))
rtc.WriteText("this color is red")
rtc.EndTextColour()
rtc.Newline()

As indicated in another answer, the use of a wx.ListCtrl can be a very straightforward method if you work with lines of text (instead of multiline text).

qid & accept id: (7741455, 8021955) query: Creating a boost::python::object from a std::function soup:

soup wrap:

Use boost::python::make_function, and provide a signature because the default one doesn't handle std::function.

For example, we want to wrap the return type of:

std::function<std::string(int, int)> get_string_function(const std::string& name)
{
    return [=](int x, int y)
    {
        return name + "(x=" + std::to_string(x) + ", y=" + std::to_string(y) + ")";
    };
}

We could define a wrapper and def using it:

boost::python::object get_string_function_pywrapper(const std::string& name)
{
    auto func = get_string_function(name);
    auto call_policies = boost::python::default_call_policies();
    typedef boost::mpl::vector<std::string, int, int> func_sig;
    return boost::python::make_function(func, call_policies, func_sig());
}

BOOST_PYTHON_MODULE(s)
{
    boost::python::def("get_string_function", get_string_function_pywrapper);
}

The Python side can now use the result as we want:

>>> import s
>>> s.get_string_function("Coord")

>>> _(1, 4)
'Coord(x=1, y=4)'
qid & accept id: (7786737, 7837535) query: How to use PyBrain? soup:

soup wrap:

It seems that this is a supervised learning problem. In this type of problem you NEED to provide some answers BEFORE training your NN.

You can try the following approach:

  1. Create a simple maze for your car.
  2. Drive your car manually in this maze.
  3. Collect your turning information

Let's assume you have the following car.

  • rf = rangefinder
  • rf_f = rangefinder_forward
  • rf_r = rangefinder_right
  • rf_l = rangefinder_left
  • rf_60 = rangefinder_60 degree
  • rf_320 = rangefinder_320 degree

Below is your rf diagram

  320   f   60
   \   |  / 
    \  | /
     \ |/  
 l--------------r
       |
       |
       |

Your training set should look like the one below.

rf_f , rf_l , rf_r, rf_60, rf_320 , turn
0     0      0    0    0     0       0    // we go directly, no obstacles detected
0     0      0    0    0     0       0     // we go directly, no obstacles detected
1.0   0      0    0    0     0       0    // We see a wall in forward far away. 
0.9   1      0    0    0     0       0.2  // We see a wall in forward and left, 
                                             therefore turn right slightly etc.
0.8   0.8      0    0    0     0     0.4  // We see a wall in forward and left, 
                                         therefore turn right slightly etc.

After you have given such a training dataset to your NN you may train it.

qid & accept id: (7821265, 10612571) query: PYMongo : Parsing|Serializing query output of a collection soup:

soup wrap:

I have solved this by adding __setitem__ to the class. Then I do:

result = as_class()
for key,value in dict_expr.items():
        result.__setitem__(key,value)

and in my class, __setitem__ looks like this:

def __setitem__(self, key, value):
    try:
        attr = getattr(class_obj, key)
        if attr is not None:
            if isinstance(value, dict):
                for child_key, child_value in value.items():
                    attr.__setitem__(child_key, child_value)
                setattr(class_obj, key, attr)
            else:
                setattr(class_obj, key, value)
    except AttributeError:
        pass
qid & accept id: (7835030, 7839576) query: Obtaining Client IP address from a WSGI app using Eventlet soup:

soup wrap:

What you want is in the wsgi environ, specifically environ['REMOTE_ADDR'].

However, if there is a proxy involved, then REMOTE_ADDR will be the address of the proxy, and the client address will be included (most likely) in HTTP_X_FORWARDED_FOR.

Here's a function that should do what you want, for most cases (all credit to Sævar):

def get_client_address(environ):
    try:
        return environ['HTTP_X_FORWARDED_FOR'].split(',')[-1].strip()
    except KeyError:
        return environ['REMOTE_ADDR']
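A quick self-contained check of the fallback behavior, with made-up addresses:

```python
def get_client_address(environ):
    try:
        return environ['HTTP_X_FORWARDED_FOR'].split(',')[-1].strip()
    except KeyError:
        return environ['REMOTE_ADDR']

# X-Forwarded-For can carry a comma-separated chain; this picks the last entry
assert get_client_address({'HTTP_X_FORWARDED_FOR': '203.0.113.7, 10.0.0.1',
                           'REMOTE_ADDR': '10.0.0.1'}) == '10.0.0.1'
# without the header, we fall back to REMOTE_ADDR
assert get_client_address({'REMOTE_ADDR': '192.0.2.9'}) == '192.0.2.9'
```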

You can easily see what is included in the wsgi environ by writing a simple wsgi app and pointing a browser at it, for example:

from eventlet import wsgi
import eventlet

from pprint import pformat

def show_env(env, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return ['%s\r\n' % pformat(env)]

wsgi.server(eventlet.listen(('', 8090)), show_env)

And combining the two ...

from eventlet import wsgi
import eventlet

from pprint import pformat

def get_client_address(environ):
    try:
        return environ['HTTP_X_FORWARDED_FOR'].split(',')[-1].strip()
    except KeyError:
        return environ['REMOTE_ADDR']

def show_env(env, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return ['%s\r\n\r\nClient Address: %s\r\n' % (pformat(env), get_client_address(env))]

wsgi.server(eventlet.listen(('', 8090)), show_env)
qid & accept id: (7877282, 7885108) query: How to send image generated by PIL to browser? soup:

soup wrap:

First, you can save the image to a tempfile and remove the local file (if you have one):

from tempfile import NamedTemporaryFile
from shutil import copyfileobj
from os import remove

tempFileObj = NamedTemporaryFile(mode='w+b',suffix='jpg')
pilImage = open('/tmp/myfile.jpg','rb')
copyfileobj(pilImage,tempFileObj)
pilImage.close()
remove('/tmp/myfile.jpg')
tempFileObj.seek(0,0)
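A self-contained sketch of the same tempfile dance using plain bytes instead of a PIL image (the file name is made up):

```python
from tempfile import NamedTemporaryFile
from shutil import copyfileobj
from os import remove

# create a throwaway source file standing in for /tmp/myfile.jpg
with open('demo_src.jpg', 'wb') as f:
    f.write(b'\xff\xd8fake-jpeg-bytes')

tempFileObj = NamedTemporaryFile(mode='w+b', suffix='.jpg')
with open('demo_src.jpg', 'rb') as src:
    copyfileobj(src, tempFileObj)
remove('demo_src.jpg')        # the original can go; the data lives on in the temp file
tempFileObj.seek(0, 0)
assert tempFileObj.read(2) == b'\xff\xd8'   # JPEG magic bytes survived the copy
```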

Second, set the temp file to the response (as per this stackoverflow question):

from flask import send_file

@app.route('/path')
def view_method():
    response = send_file(tempFileObj, as_attachment=True, attachment_filename='myfile.jpg')
    return response
qid & accept id: (7886024, 7886060) query: related to List (want to insert into database) soup:

soup wrap:

How about this?

>>> query = 'INSERT INTO (%s) VALUES (%s)' % (','.join([str(i) for i in list1]),
                                              ','.join([str(i) for i in list2]))
>>> print query
INSERT INTO (name,age,sex) VALUES (test,10,female)

The str call is needed so that numbers in the lists are handled as well.

Edit: I feel like you could add some effort into this yourself, but anyway. To add quotes, I'd change it to this:

>>> list1 = ['name', 'age', 'sex']
>>> list2 = ['test', 10, 'female']
>>> f = lambda l: ','.join(["'%s'" % str(s) for s in l])
>>> print 'INSERT INTO (%s) VALUES (%s)' % (f(list1), f(list2))
INSERT INTO ('name','age','sex') VALUES ('test','10','female')
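If the target is a real DB-API connection (sqlite3 here, with a made-up people table), a safer variant keeps the column names from list1 but passes the values as query parameters instead of formatting them into the string:

```python
import sqlite3

list1 = ['name', 'age', 'sex']
list2 = ['test', 10, 'female']

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE people (name TEXT, age INTEGER, sex TEXT)')

# ? placeholders avoid quoting and injection issues for the values
query = 'INSERT INTO people (%s) VALUES (%s)' % (','.join(list1),
                                                 ','.join('?' * len(list2)))
conn.execute(query, list2)
print(conn.execute('SELECT name, age, sex FROM people').fetchone())  # -> ('test', 10, 'female')
```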
qid & accept id: (7907848, 7908229) query: How to create linux users via my own GUI application in Python? soup:

soup wrap:

You can call something like

useradd -m -p PASSWORD

where PASSWORD is what you get as a result of the crypt() function defined in unistd.h.

As you've found out yourself, in the case of Python it looks like this

import os 
import crypt 

password = "testpassword"
encPass = crypt.crypt(password, "salt")
os.system("useradd -p " + encPass + " someuser")
qid & accept id: (7918240, 7929193) query: Navigate trough lxml categories soup:

soup wrap:

You achieved the parsing, as you can see if you do the following:

>>> tree
<lxml.etree._ElementTree object at 0x...>

Now you can go through this element using lxml._ElementTree functions, documented here: http://lxml.de/tutorial.html

Here are some basics, with a simple file I got from my local network:

>>> tree.getroot()
<Element html at 0x...>
>>> tree.getroot().tag
'html'
>>> tree.getroot().text
>>> for child in tree.getroot().getchildren():
    print child.tag, child.getchildren()
head
body
>>> for child in tree.getroot().getchildren():
    print child.tag, [sub_child.tag for sub_child in child.getchildren()]
head ['title']
body ['h1', 'p', 'hr', 'address']
qid & accept id: (7927670, 7928523) query: How to define a chi2 value function for arbitrary function? soup:

soup wrap:

Since PyMinuit uses introspection, you have to use introspection, too. make_chi_squared() could be implemented like this:

import inspect

chi_squared_template = """
def chi_squared(%(params)s):
    return (((f(data_x, %(params)s) - data_y) / errors) ** 2).sum()
"""

def make_chi_squared(f, data_x, data_y, errors):
    params = ", ".join(inspect.getargspec(f).args[1:])
    exec chi_squared_template % {"params": params}
    return chi_squared

Example usage:

import numpy

def f(x, a1, a2, a3, a4):
    return a1 + a2*x + a3*x**2 + a4*x**3

data_x = numpy.arange(50)
errors = numpy.random.randn(50) * 0.3
data_y = data_x**3 + errors

chi_squared = make_chi_squared(f, data_x, data_y, errors)
print inspect.getargspec(chi_squared).args

printing

['a1', 'a2', 'a3', 'a4']
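
For what it's worth, the same trick ports to Python 3, where exec is a function and inspect.getargspec is gone; here is a sketch using inspect.signature, with a small assumed two-parameter model so the generated function can be sanity-checked:

```python
import inspect
import numpy as np

# Same template as in the answer; the parameter list is spliced in twice.
chi_squared_template = """
def chi_squared(%(params)s):
    return (((f(data_x, %(params)s) - data_y) / errors) ** 2).sum()
"""

def make_chi_squared(f, data_x, data_y, errors):
    # Skip the first model argument (x); the rest are the fit parameters.
    params = ", ".join(list(inspect.signature(f).parameters)[1:])
    namespace = {"f": f, "data_x": data_x, "data_y": data_y, "errors": errors}
    exec(chi_squared_template % {"params": params}, namespace)
    return namespace["chi_squared"]

def f(x, a1, a2):            # assumed toy model: a straight line
    return a1 + a2 * x

data_x = np.arange(5, dtype=float)
errors = np.ones(5)
data_y = 2.0 + 3.0 * data_x  # exact data for a1=2, a2=3

chi_squared = make_chi_squared(f, data_x, data_y, errors)
print(list(inspect.signature(chi_squared).parameters))  # ['a1', 'a2']
print(chi_squared(2.0, 3.0))                            # 0.0 at the true parameters
```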
qid & accept id: (7933596, 7934577) query: Django dynamic model fields soup:

soup wrap:

As of today, there are four available approaches, two of them requiring a certain storage backend:

  1. Django-eav (the original package is no longer maintained but has some thriving forks)

    This solution is based on the Entity-Attribute-Value data model; essentially, it uses several tables to store dynamic attributes of objects. The great parts about this solution are that it:

    • uses several pure and simple Django models to represent dynamic fields, which makes it simple to understand and database-agnostic;
    • allows you to effectively attach/detach dynamic attribute storage to Django model with simple commands like:

      eav.unregister(Encounter)
      eav.register(Patient)
      
    • Nicely integrates with Django admin;

    • At the same time being really powerful.

    Downsides:

    • Not very efficient. This is more of a criticism of the EAV pattern itself, which requires manually merging the data from a column format to a set of key-value pairs in the model.
    • Harder to maintain. Maintaining data integrity requires a multi-column unique key constraint, which may be inefficient on some databases.
    • You will need to select one of the forks, since the official package is no longer maintained and there is no clear leader.

    The usage is pretty straightforward:

    import eav
    from app.models import Patient, Encounter
    
    eav.register(Encounter)
    eav.register(Patient)
    Attribute.objects.create(name='age', datatype=Attribute.TYPE_INT)
    Attribute.objects.create(name='height', datatype=Attribute.TYPE_FLOAT)
    Attribute.objects.create(name='weight', datatype=Attribute.TYPE_FLOAT)
    Attribute.objects.create(name='city', datatype=Attribute.TYPE_TEXT)
    Attribute.objects.create(name='country', datatype=Attribute.TYPE_TEXT)
    
    self.yes = EnumValue.objects.create(value='yes')
    self.no = EnumValue.objects.create(value='no')
    self.unknown = EnumValue.objects.create(value='unknown')
    ynu = EnumGroup.objects.create(name='Yes / No / Unknown')
    ynu.enums.add(self.yes)
    ynu.enums.add(self.no)
    ynu.enums.add(self.unknown)
    
    Attribute.objects.create(name='fever', datatype=Attribute.TYPE_ENUM,\
                                           enum_group=ynu)
    
    # When you register a model within EAV,
    # you can access all of EAV attributes:
    
    Patient.objects.create(name='Bob', eav__age=12,
                               eav__fever=no, eav__city='New York',
                               eav__country='USA')
    # You can filter queries based on their EAV fields:
    
    query1 = Patient.objects.filter(Q(eav__city__contains='Y'))
    query2 = Q(eav__city__contains='Y') |  Q(eav__fever=no)
    
  2. Hstore, JSON or JSONB fields in PostgreSQL

    PostgreSQL supports several more complex data types. Most are supported via third-party packages, but in recent years Django has adopted them into django.contrib.postgres.fields.

    HStoreField:

    Django-hstore was originally a third-party package, but Django 1.8 added HStoreField as a built-in, along with several other PostgreSQL-supported field types.

    This approach is good in the sense that it lets you have the best of both worlds: dynamic fields and a relational database. However, hstore is not ideal performance-wise, especially if you are going to end up storing thousands of items in one field. It also only supports strings for values.

    #app/models.py
    from django.contrib.postgres.fields import HStoreField
    class Something(models.Model):
        name = models.CharField(max_length=32)
        data = models.HStoreField(db_index=True)
    

    In Django's shell you can use it like this:

    >>> instance = Something.objects.create(
                     name='something',
                     data={'a': '1', 'b': '2'}
               )
    >>> instance.data['a']
    '1'        
    >>> empty = Something.objects.create(name='empty')
    >>> empty.data
    {}
    >>> empty.data['a'] = '1'
    >>> empty.save()
    >>> Something.objects.get(name='something').data['a']
    '1'
    

    You can issue indexed queries against hstore fields:

    # equivalence
    Something.objects.filter(data={'a': '1', 'b': '2'})
    
    # subset by key/value mapping
    Something.objects.filter(data__a='1')
    
    # subset by list of keys
    Something.objects.filter(data__has_keys=['a', 'b'])
    
    # subset by single key
    Something.objects.filter(data__has_key='a')    
    

    JSONField:

    JSON/JSONB fields support any JSON-encodable data type, not just key/value pairs, but also tend to be faster and (for JSONB) more compact than Hstore. Several packages implement JSON/JSONB fields including django-pgfields, but as of Django 1.9, JSONField is a built-in using JSONB for storage. JSONField is similar to HStoreField, and may perform better with large dictionaries. It also supports types other than strings, such as integers, booleans and nested dictionaries.

    #app/models.py
    from django.contrib.postgres.fields import JSONField
    class Something(models.Model):
        name = models.CharField(max_length=32)
        data = JSONField(db_index=True)
    

    Creating in the shell:

    >>> instance = Something.objects.create(
                     name='something',
                     data={'a': 1, 'b': 2, 'nested': {'c':3}}
               )
    

    Indexed queries are nearly identical to HStoreField, except nesting is possible. Complex indexes may require manual creation (or a scripted migration).

    >>> Something.objects.filter(data__a=1)
    >>> Something.objects.filter(data__nested__c=3)
    >>> Something.objects.filter(data__has_key='a')
    
  3. Django MongoDB

    Or other NoSQL Django adaptations -- with them you can have fully dynamic models.

    NoSQL Django libraries are great, but keep in mind that they are not 100% Django-compatible. For example, to migrate to Django-nonrel from standard Django you will need to replace ManyToMany with ListField, among other things.

    Check out this Django MongoDB example:

    from djangotoolbox.fields import DictField
    
    class Image(models.Model):
        exif = DictField()
    ...
    
    >>> image = Image.objects.create(exif=get_exif_data(...))
    >>> image.exif
    {u'camera_model' : 'Spamcams 4242', 'exposure_time' : 0.3, ...}
    

    You can even create embedded lists of any Django models:

    class Container(models.Model):
        stuff = ListField(EmbeddedModelField())
    
    class FooModel(models.Model):
        foo = models.IntegerField()
    
    class BarModel(models.Model):
        bar = models.CharField()
    ...
    
    >>> Container.objects.create(
        stuff=[FooModel(foo=42), BarModel(bar='spam')]
    )
    
  4. Django-mutant: Dynamic models based on syncdb and South-hooks

    Django-mutant implements fully dynamic Foreign Key and m2m fields. It is inspired by impressive but somewhat hackish solutions by Will Hardy and Michael Hall.

    All of these are based on Django South hooks, which, according to Will Hardy's talk at DjangoCon 2011 (watch it!) are nevertheless robust and tested in production (relevant source code).

    First to implement this was Michael Hall.

    Yes, this is magic: with these approaches you can achieve fully dynamic Django apps, models and fields with any relational database backend. But at what cost? Will the stability of the application suffer under heavy use? These are questions to consider. You also need to maintain a proper lock to make simultaneous database-altering requests safe.

    If you are using Michael Hall's lib, your code will look like this:

    from dynamo import models
    
    test_app, created = models.DynamicApp.objects.get_or_create(
                          name='dynamo'
                        )
    test, created = models.DynamicModel.objects.get_or_create(
                      name='Test',
                      verbose_name='Test Model',
                      app=test_app
                   )
    foo, created = models.DynamicModelField.objects.get_or_create(
                      name = 'foo',
                      verbose_name = 'Foo Field',
                      model = test,
                      field_type = 'dynamiccharfield',
                      null = True,
                      blank = True,
                      unique = False,
                      help_text = 'Test field for Foo',
                   )
    bar, created = models.DynamicModelField.objects.get_or_create(
                      name = 'bar',
                      verbose_name = 'Bar Field',
                      model = test,
                      field_type = 'dynamicintegerfield',
                      null = True,
                      blank = True,
                      unique = False,
                      help_text = 'Test field for Bar',
                   )
    
qid & accept id: (7950124, 7950135) query: strip ' from all members in a list soup:

soup wrap:

It looks like you want to interpret the strings as integers. Use int to do this:

chkseq = [int(line) for line in open("sequence.txt")] 

It can also be written using map instead of a list comprehension:

chkseq = map(int, open("sequence.txt"))
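
If, on the other hand, the goal really is just to strip the quote characters from each member (per the question title), str.strip does that per element; a small sketch with assumed sample data:

```python
items = ["'12'", "'7'", "'30'"]

# Remove surrounding single quotes from every member of the list.
stripped = [s.strip("'") for s in items]
print(stripped)  # ['12', '7', '30']

# From there, int() conversion works as described above.
numbers = [int(s) for s in stripped]
print(numbers)   # [12, 7, 30]
```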
qid & accept id: (7953623, 7954508) query: How to modify the metavar for a positional argument in pythons argparse? soup:

soup wrap:

How about:

import argparse
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description = "Print a range.")

    parser.add_argument("start", type = int, help = "Specify start.", )
    parser.add_argument("stop", type = int, help = "Specify stop.", )
    parser.add_argument("step", type = int, help = "Specify step.", )

    args=parser.parse_args()
    print(args)

which yields

% test.py -h
usage: test.py [-h] start stop step

Print a range.

positional arguments:
  start       Specify start.
  stop        Specify stop.
  step        Specify step.

optional arguments:
  -h, --help  show this help message and exit
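
To directly answer the metavar part of the question: positional arguments also accept a metavar= keyword, which changes only how the argument is displayed in usage and help text, not the attribute name on the parsed namespace. A minimal sketch:

```python
import argparse

parser = argparse.ArgumentParser(description='Print a range.')
# metavar changes only the name shown in usage/help; parsing still
# stores the value as args.start.
parser.add_argument('start', type=int, metavar='N', help='Specify start.')

print(parser.format_usage())   # e.g. "usage: prog [-h] N"
args = parser.parse_args(['5'])
print(args.start)  # 5
```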
qid & accept id: (8017432, 8017470) query: Most efficient way to index words in a document? soup:

soup wrap:

Use a database for storing the values.

  1. First, add all the sentences to one table (they should have IDs). You may call it e.g. sentences.
  2. Second, create a table with the words contained within all the sentences (call it e.g. words, give each word an ID), saving the connection between sentences' table records and words' table records within a separate table (call it e.g. sentences_words; it should have two columns, preferably word_id and sentence_id).
  3. When searching for sentences containing all the mentioned words, your job will be simplified:

    1. You should first find the records from the words table where the words are exactly the ones you search for. The query could look like this:

      SELECT `id` FROM `words` WHERE `word` IN ('word1', 'word2', 'word3');
      
    2. Second, you should find the sentence_id values from the sentences_words table that have the required word_id values (corresponding to the words from the words table). The initial query could look like this:

      SELECT `sentence_id`, `word_id` FROM `sentences_words`
      WHERE `word_id` IN ([here goes list of words' ids]);
      

      which could be simplified to this:

      SELECT `sentence_id`, `word_id` FROM `sentences_words`
      WHERE `word_id` IN (
          SELECT `id` FROM `words` WHERE `word` IN ('word1', 'word2', 'word3')
      );
      
    3. Filter the result within Python to return only sentence_id values that have all the required word_id IDs you need.
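
The filtering step above can be sketched in Python by grouping the returned (sentence_id, word_id) rows and keeping only sentences whose distinct-word count equals the number of search terms; here with an in-memory sqlite3 stand-in for the schema described above (the table contents are made up for the demo):

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT);
CREATE TABLE sentences_words (sentence_id INTEGER, word_id INTEGER);
INSERT INTO words VALUES (1, 'word1'), (2, 'word2'), (3, 'word3');
-- sentence 10 contains all three words, sentence 20 only two
INSERT INTO sentences_words VALUES (10, 1), (10, 2), (10, 3), (20, 1), (20, 2);
""")

search = ('word1', 'word2', 'word3')
rows = conn.execute("""
    SELECT sentence_id, word_id FROM sentences_words
    WHERE word_id IN (SELECT id FROM words WHERE word IN (?, ?, ?))
""", search).fetchall()

# Keep only sentences that matched every search term.
hits = defaultdict(set)
for sentence_id, word_id in rows:
    hits[sentence_id].add(word_id)
result = [sid for sid, ids in hits.items() if len(ids) == len(search)]
print(result)  # [10]
```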

This is basically a solution based on storing a large amount of data in the form best suited for it - the database.

EDIT:

  1. If you will only ever search for two words, you can do even more (almost everything) on the DBMS' side.
  2. Considering you also need the position difference, you should store the position of the word within a third column of the sentences_words table (let's call it just position), and when searching for the appropriate words, you should calculate the difference of this value associated with both words.
qid & accept id: (8019287, 8019418) query: Replace given line in files in Python soup:

soup wrap:

I used this solution: Search and replace a line in a file in Python

from tempfile import mkstemp
from shutil import move
from os import remove, close

def replace_3_line(file):
    new_3rd_line = 'new_3_line\n'
    #Create temp file
    fh, abs_path = mkstemp()
    new_file = open(abs_path,'w')
    old_file = open(file)
    counter = 0
    for line in old_file:
        counter = counter + 1
        if counter == 3:
            new_file.write(new_3rd_line)
        else:
            new_file.write(line)
    #close temp file
    new_file.close()
    close(fh)
    old_file.close()
    #Remove original file
    remove(file)
    #Move new file
    move(abs_path, file)

replace_3_line('tmp.ann')

But it does not work with files that contain non-English characters.

Traceback (most recent call last):
  File "D:\xxx\replace.py", line 27, in <module>
    replace_3_line('tmp.ann')
  File "D:\xxx\replace.py", line 12, in replace_3_line
    for line in old_file:
  File "C:\Python31\lib\encodings\cp1251.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 32: character maps to <undefined>

That is bad. Where's Python's Unicode handling? (The file is UTF-8, Python 3.)

File is:

фвыафыв
sdadf
试试
阿斯达а
阿斯顿飞
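
The traceback shows the file being decoded with the Windows default codec (cp1251) rather than UTF-8. In Python 3 the usual fix is to pass encoding='utf-8' explicitly to both open() calls; a sketch of the same line replacement with explicit encodings:

```python
import os
from tempfile import mkstemp
from shutil import move

def replace_3_line(path, new_3rd_line='new_3_line\n'):
    fh, abs_path = mkstemp()
    os.close(fh)  # reopen by name so we can choose the encoding
    with open(path, encoding='utf-8') as old_file, \
         open(abs_path, 'w', encoding='utf-8') as new_file:
        for counter, line in enumerate(old_file, start=1):
            new_file.write(new_3rd_line if counter == 3 else line)
    os.remove(path)
    move(abs_path, path)

# Demo with the non-English sample from the question:
with open('tmp.ann', 'w', encoding='utf-8') as f:
    f.write('фвыафыв\nsdadf\n试试\n阿斯达а\n阿斯顿飞\n')
replace_3_line('tmp.ann')
print(open('tmp.ann', encoding='utf-8').read().splitlines()[2])  # new_3_line
```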
qid & accept id: (8072755, 8167319) query: How do I get Python2.x `map` functionality in Python3.x? soup:

soup wrap:

You must roll your own -- but it's easy:

from itertools import zip_longest, starmap

def map2x(func, *iterables):
    zipped = zip_longest(*iterables)
    if func is None:
        return zipped
    return starmap(func, zipped)

A simple example:

a=['a1']
b=['b1','b2','b3']
c=['c1','c2']

print(list(map2x(None, a, b, c)))

which gives us:

[('a1', 'b1', 'c1'), (None, 'b2', 'c2'), (None, 'b3', None)]
qid & accept id: (8087485, 8088872) query: transpose/rotate a block of a matrix in python soup:

soup wrap:

Building on Sven Marnach's idea to use np.rot90, here is a version which rotates the quadrant clockwise (as requested?). In the key step

block3[:] = np.rot90(block3.copy(),-1)

a copy() is used on the right-hand side (RHS). Without the copy(), the underlying data is modified while values are being assigned to block3, which muddles the values used in subsequent assignments and scatters duplicated values about block3.

I don't see a way to do this operation without a copy.

import numpy as np
a = np.arange(36).reshape(6, 6)
print(a)
# [[ 0  1  2  3  4  5]
#  [ 6  7  8  9 10 11]
#  [12 13 14 15 16 17]
#  [18 19 20 21 22 23]
#  [24 25 26 27 28 29]
#  [30 31 32 33 34 35]]
block3 = a[3:6, 0:3]

# To rotate counterclockwise
block3[:] = np.rot90(block3.copy())
print(a)
# [[ 0  1  2  3  4  5]
#  [ 6  7  8  9 10 11]
#  [12 13 14 15 16 17]
#  [20 26 32 21 22 23]
#  [19 25 31 27 28 29]
#  [18 24 30 33 34 35]]

# To rotate clockwise
a = np.arange(36).reshape(6, 6)
block3 = a[3:6, 0:3]
block3[:] = np.rot90(block3.copy(),-1)
print(a)
# [[ 0  1  2  3  4  5]
#  [ 6  7  8  9 10 11]
#  [12 13 14 15 16 17]
#  [30 24 18 21 22 23]
#  [31 25 19 27 28 29]
#  [32 26 20 33 34 35]]
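For reference, the sign of the k argument to np.rot90 controls the direction — a quick sanity check on a 2x2 block:

```python
import numpy as np

b = np.array([[1, 2],
              [3, 4]])
# k=1 (the default) rotates counterclockwise, k=-1 clockwise:
print(np.rot90(b))      # [[2 4]
                        #  [1 3]]
print(np.rot90(b, -1))  # [[3 1]
                        #  [4 2]]
```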
qid & accept id: (8096798, 8097092) query: Python: Find a Sentence between some website-tags using regex soup:

If you must do it with regular expressions, try something like this:

\n
a = re.finditer('(.+?)', html)\nfor m in a: \n    print m.group(1)\n
\n

Just for the reference, this code does the same, but in a far more robust way:

\n
doc = BeautifulSoup(html)\nfor a in doc.findAll('a', 'question-hyperlink'):\n    print a.text\n
\n soup wrap:

If you must do it with regular expressions, try something like this:

a = re.finditer('(.+?)', html)
for m in a: 
    print m.group(1)

Just for the reference, this code does the same, but in a far more robust way:

doc = BeautifulSoup(html)
for a in doc.findAll('a', 'question-hyperlink'):
    print a.text
qid & accept id: (8097844, 8097928) query: Executing different queries using mysql-python soup:

I think this is what you're looking for.

\n
def connect_and_get_data(query, data):\n    ...\n    cursor.execute(query, data)\n    ...\n\ndef get_data_about_first_amazing_topic(useful_string):\n    query = "SELECT ... FROM ... WHERE ... AND some_field=%s"\n    connect_and_get_data(query, ("one","two","three"))\n    ...\n
\n

But, if you're going to be making several queries quickly, it would be better to reuse your connection, since making too many connections can waste time.

\n
...\nCONNECTION = MySQLdb.connect(host=..., port=...,\n                             user=..., passwd=..., db=...,\n                             cursorclass=MySQLdb.cursors.DictCursor,\n                             charset = "utf8")\ncursor = CONNECTION.cursor()\ncursor.execute("SELECT ... FROM ... WHERE ... AND some_field=%s", ("first", "amazing", "topic"))\nfirst_result = cursor.fetchall()\n\ncursor.execute("SELECT ... FROM ... WHERE ... AND some_field=%s", (("first", "amazing", "topic")))\nsecond_result = cursor.fetchall()\n\ncursor.close()\n...\n
\n

This will make your code perform much better.

\n soup wrap:

I think this is what you're looking for.

def connect_and_get_data(query, data):
    ...
    cursor.execute(query, data)
    ...

def get_data_about_first_amazing_topic(useful_string):
    query = "SELECT ... FROM ... WHERE ... AND some_field=%s"
    connect_and_get_data(query, ("one","two","three"))
    ...

But, if you're going to be making several queries quickly, it would be better to reuse your connection, since making too many connections can waste time.

...
CONNECTION = MySQLdb.connect(host=..., port=...,
                             user=..., passwd=..., db=...,
                             cursorclass=MySQLdb.cursors.DictCursor,
                             charset = "utf8")
cursor = CONNECTION.cursor()
cursor.execute("SELECT ... FROM ... WHERE ... AND some_field=%s", ("first", "amazing", "topic"))
first_result = cursor.fetchall()

cursor.execute("SELECT ... FROM ... WHERE ... AND some_field=%s", (("first", "amazing", "topic")))
second_result = cursor.fetchall()

cursor.close()
...

This will make your code perform much better.
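The connection-reuse pattern can be tried without a MySQL server by swapping in the stdlib sqlite3 module (note its placeholder is ? rather than %s) — a rough sketch:

```python
import sqlite3

# One connection and cursor, reused across several queries:
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE topics (name TEXT)")
cur.executemany("INSERT INTO topics VALUES (?)", [("first",), ("amazing",)])

cur.execute("SELECT name FROM topics WHERE name = ?", ("first",))
first_result = cur.fetchall()

cur.execute("SELECT name FROM topics WHERE name = ?", ("amazing",))
second_result = cur.fetchall()

cur.close()
conn.close()
print(first_result, second_result)  # [('first',)] [('amazing',)]
```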

qid & accept id: (8137056, 8167348) query: How to input data from a web page to Python script most efficiently soup:

I now managed it with the exec() command.

\n
\n\n\n
\n\n\n
\n\n\n \n
\n

test.php:

\n
\n
\n soup wrap:

I now managed it with the exec() command.




test.php:


qid & accept id: (8147559, 8148597) query: how to get cookie in template webpy soup:
import web\ntemplate_globals = {\n    "cookies": web.cookies,\n}\nrender = web.template.render('templates/', globals=template_globals, base='layout', cache=False)\n
\n

You can also do things like this:

\n
render_partial = web.template.render('templates/', globals=template_globals)\ntemplate_globals.update(render=render_partial)\n
\n

And then you will be able to render partial templates in your templates

\n soup wrap:
import web
template_globals = {
    "cookies": web.cookies,
}
render = web.template.render('templates/', globals=template_globals, base='layout', cache=False)

You can also do things like this:

render_partial = web.template.render('templates/', globals=template_globals)
template_globals.update(render=render_partial)

And then you will be able to render partial templates in your templates

qid & accept id: (8180404, 8180490) query: python + auto ssh proccess to get date info soup:

The easiest way would be to just configure passwordless logins. Basically, create a local ssh key pair with

\n
ssh-keygen -t rsa\n
\n

and put the public key into $HOME/.ssh/authorized_keys at 103.116.140.151. If you don't care about the key of the remote host, add the -oStrictHostKeyChecking=no ssh option.

\n

Alternatively, use an SSH library such as Paramiko:

\n
import paramiko\nssh = paramiko.SSHClient()\n# Uncomment the following line for the equivalent of -oStrictHostKeyChecking=no\n#ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())\nssh.connect('103.116.140.151', username='user', password='diana_123')\nstdin, stdout, stderr = ssh.exec_command("date")\ndate = stdout.read()\nprint(date)\n
\n soup wrap:

The easiest way would be to just configure passwordless logins. Basically, create a local ssh key pair with

ssh-keygen -t rsa

and put the public key into $HOME/.ssh/authorized_keys at 103.116.140.151. If you don't care about the key of the remote host, add the -oStrictHostKeyChecking=no ssh option.

Alternatively, use an SSH library such as Paramiko:

import paramiko
ssh = paramiko.SSHClient()
# Uncomment the following line for the equivalent of -oStrictHostKeyChecking=no
#ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect('103.116.140.151', username='user', password='diana_123')
stdin, stdout, stderr = ssh.exec_command("date")
date = stdout.read()
print(date)
qid & accept id: (8249165, 8249212) query: setting a condition for a mixed list soup:

Something like this, using a list comprehension:

\n
>>> input = [10, ["ETSc", "Juniper Hairstreak"], ["ETSc", "Spotted Turtle"], ["ETSc", "Blanding's Turtle"], "IWWH"]\n>>> output = [elt[0] + " (" + elt[1] + ")" if type(elt) == list and elt[0] == "ETSc" else str(elt) for elt in input]\n>>> output\n['10', 'ETSc (Juniper Hairstreak)', 'ETSc (Spotted Turtle)', "ETSc (Blanding's Turtle)", 'IWWH']\n
\n
\n

As @julio commented, you could make this more readable using a function:

\n
def xform(elt):\n    if type(elt) == list and len(elt) > 1 and elt[0] == "ETSc":\n        return elt[0] + " (" + elt[1] + ")"\n    else:\n        return str(elt)\n\noutput = [xform(elt) for elt in input]\n
\n soup wrap:

Something like this, using a list comprehension:

>>> input = [10, ["ETSc", "Juniper Hairstreak"], ["ETSc", "Spotted Turtle"], ["ETSc", "Blanding's Turtle"], "IWWH"]
>>> output = [elt[0] + " (" + elt[1] + ")" if type(elt) == list and elt[0] == "ETSc" else str(elt) for elt in input]
>>> output
['10', 'ETSc (Juniper Hairstreak)', 'ETSc (Spotted Turtle)', "ETSc (Blanding's Turtle)", 'IWWH']

As @julio commented, you could make this more readable using a function:

def xform(elt):
    if type(elt) == list and len(elt) > 1 and elt[0] == "ETSc":
        return elt[0] + " (" + elt[1] + ")"
    else:
        return str(elt)

output = [xform(elt) for elt in input]
qid & accept id: (8302519, 9073760) query: Suppressing the output in libsvm (python) soup:

Use the -q parameter option\n

\n
import svmutil\nparam = svmutil.svm_parameter('-q')\n...\n
\n

or\n

\n
import svmutil\nx = [[0.2, 0.1], [0.7, 0.6]]\ny = [0, 1]\nsvmutil.svm_train(y, x, '-q')\n
\n soup wrap:

Use the -q parameter option

import svmutil
param = svmutil.svm_parameter('-q')
...

or

import svmutil
x = [[0.2, 0.1], [0.7, 0.6]]
y = [0, 1]
svmutil.svm_train(y, x, '-q')
qid & accept id: (8329204, 8338373) query: Tipfy & Jinja: Creating a logout URL for every page soup:

I do something similar with Jinja / GAE and I use a BaseHandler + a template that I include. BaseHandler:

\n
class BaseHandler(webapp2.RequestHandler):\n    ...\n    def render_jinja(self, name, **data):\n        data['logout_url']=users.create_logout_url(self.request.uri)\n        template = jinja_environment.get_template('templates/'+name+'.html')\n        self.response.out.write(template.render(data))\n
\n

Then I can inherit the basehandler for eg form handlers:

\n
class FileUploadFormHandler(BaseHandler):\n    def get(self):\n        ...\n        self.render_jinja('contact_jinja', form=form, ...\n
\n soup wrap:

I do something similar with Jinja / GAE and I use a BaseHandler + a template that I include. BaseHandler:

class BaseHandler(webapp2.RequestHandler):
    ...
    def render_jinja(self, name, **data):
        data['logout_url']=users.create_logout_url(self.request.uri)
        template = jinja_environment.get_template('templates/'+name+'.html')
        self.response.out.write(template.render(data))

Then I can inherit the basehandler for eg form handlers:

class FileUploadFormHandler(BaseHandler):
    def get(self):
        ...
        self.render_jinja('contact_jinja', form=form, ...
qid & accept id: (8345190, 8345569) query: regex - how to recognise a pattern until a second one is found soup:

try this

\n
show_p=re.compile("(.*)\.s(\d*)e(\d*)")\nshow_p.match(x).groups()\n
\n

where x is your string

\n

Edit** (I forgot to include the extension, here is the revision)

\n
show_p=re.compile("^(.*)\.s(\d*)e(\d*).*?([^\.]*)$")\nshow_p.match(x).groups()\n
\n
\n

And Here is the test result

\n
>>> show_p=re.compile("(.*)\.s(\d*)e(\d*).*?([^\.]*)$")\n>>> x="tv_show.s01e01.episode_name.avi"\n>>> show_p.match(x).groups()\n('tv_show', '01', '01', 'avi')\n>>> x="tv_show.s2e1.episode_name.avi"\n>>> show_p.match(x).groups()\n('tv_show', '2', '1', 'avi')\n>>> x='some.other.tv.show.s04e05.episode_name.avi'\n>>> show_p.match(x).groups()\n('some.other.tv.show', '04', '05', 'avi')\n>>>  \n
\n soup wrap:

Try this:

show_p=re.compile("(.*)\.s(\d*)e(\d*)")
show_p.match(x).groups()

where x is your string

Edit: I forgot to include the extension; here is the revision.

show_p=re.compile("^(.*)\.s(\d*)e(\d*).*?([^\.]*)$")
show_p.match(x).groups()

And here is the test result:

>>> show_p=re.compile("(.*)\.s(\d*)e(\d*).*?([^\.]*)$")
>>> x="tv_show.s01e01.episode_name.avi"
>>> show_p.match(x).groups()
('tv_show', '01', '01', 'avi')
>>> x="tv_show.s2e1.episode_name.avi"
>>> show_p.match(x).groups()
('tv_show', '2', '1', 'avi')
>>> x='some.other.tv.show.s04e05.episode_name.avi'
>>> show_p.match(x).groups()
('some.other.tv.show', '04', '05', 'avi')
>>>  
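The same pattern also works with named groups, which makes the result self-describing — for example:

```python
import re

# Same structure as above, but each group carries a name:
show_p = re.compile(r"^(?P<show>.*)\.s(?P<season>\d+)e(?P<episode>\d+).*?(?P<ext>[^.]*)$")
m = show_p.match("some.other.tv.show.s04e05.episode_name.avi")
print(m.groupdict())
# {'show': 'some.other.tv.show', 'season': '04', 'episode': '05', 'ext': 'avi'}
```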
qid & accept id: (8355262, 8355692) query: formatting files with sed/ python/ etc soup:

This is what you are desiring?

\n
>>> x='$GETR("wp","1")$Yes$GETR("","2")$No$NOTE()$'\n>>> if x.count("$GETR")>1:\n    x=x.replace("$GETR","\n\t$GETR").replace("","\n")\n\n\n>>> print x\n\n    $GETR("wp","1")$Yes\n    $GETR("","2")$No$NOTE()$\n\n>>> x='$GETR("","2")$No$NOTE()$'\n>>> if x.count("$GETR")>1:\n    x=x.replace("$GETR","\n\t$GETR").replace("","\n")\n\n\n>>> print x\n$GETR("","2")$No$NOTE()$\n
\n

In that case try this

\n
if x.count("$GETR")>=1:x=x.replace("$GETR","\n\t$GETR").replace("","\n")\nif x.count("$GETR") == 1: x=x.replace("$GETR","$GETC")\n\n\n>>> x='$GETR("","2")$No$NOTE()$'\n>>> if x.count("$GETR")>=1:x=x.replace("$GETR","\n\t$GETR").replace("","\n")\n>>> if x.count("$GETR") == 1: x=x.replace("$GETR","$GETC")\n>>> print x\n\n    $GETC("","2")$No$NOTE()$\n\n>>> x='$GETR("wp","1")$Yes$GETR("","2")$No$NOTE()$'\n>>> if x.count("$GETR")>=1:x=x.replace("$GETR","\n\t$GETR").replace("","\n")\n>>> if x.count("$GETR") == 1: x=x.replace("$GETR","$GETC")\n>>> print x\n\n    $GETR("wp","1")$Yes\n    $GETR("","2")$No$NOTE()$\n\n>>> \n
\n soup wrap:

Is this what you are after?

>>> x='$GETR("wp","1")$Yes$GETR("","2")$No$NOTE()$'
>>> if x.count("$GETR")>1:
    x=x.replace("$GETR","\n\t$GETR").replace("","\n")


>>> print x

    $GETR("wp","1")$Yes
    $GETR("","2")$No$NOTE()$

>>> x='$GETR("","2")$No$NOTE()$'
>>> if x.count("$GETR")>1:
    x=x.replace("$GETR","\n\t$GETR").replace("","\n")


>>> print x
$GETR("","2")$No$NOTE()$

In that case try this

if x.count("$GETR")>=1:x=x.replace("$GETR","\n\t$GETR").replace("","\n")
if x.count("$GETR") == 1: x=x.replace("$GETR","$GETC")


>>> x='$GETR("","2")$No$NOTE()$'
>>> if x.count("$GETR")>=1:x=x.replace("$GETR","\n\t$GETR").replace("","\n")
>>> if x.count("$GETR") == 1: x=x.replace("$GETR","$GETC")
>>> print x

    $GETC("","2")$No$NOTE()$

>>> x='$GETR("wp","1")$Yes$GETR("","2")$No$NOTE()$'
>>> if x.count("$GETR")>=1:x=x.replace("$GETR","\n\t$GETR").replace("","\n")
>>> if x.count("$GETR") == 1: x=x.replace("$GETR","$GETC")
>>> print x

    $GETR("wp","1")$Yes
    $GETR("","2")$No$NOTE()$

>>> 
qid & accept id: (8405977, 8565493) query: Rendering requested type in Tornado soup:

First, set up the handlers to count on a restful style URI. We use 2 chunks of regex looking for an ID and a potential request format (ie html, xml, json etc)

\n
class TaskServer(tornado.web.Application):\n    def __init__(self, newHandlers = [], debug = None):\n        request_format = "(\.[a-zA-Z]+$)?"\n        baseHandlers = [\n            (r"/jobs" + request_format, JobsHandler),\n            (r"/jobs/", JobsHandler),\n            (r"/jobs/new" + request_format, NewJobsHandler),\n            (r"/jobs/([0-9]+)/edit" + request_format, EditJobsHandler)\n        ]\n        for handler in newHandlers:\n            baseHandlers.append(handler)\n\n\n    tornado.web.Application.__init__(self, baseHandlers, debug = debug)\n
\n

Now, in the handler define a reusable function parseRestArgs (I put mine in a BaseHandler but pasted it here for ease of understanding/to save space) that splits out ID's and request formats. Since you should be expecting id's in a particular order, I stick them in a list.

\n

The get function can be abstracted more but it shows the basic idea of splitting out your logic into different request formats...

\n
class JobsHandler(BaseHandler):\n    def parseRestArgs(self, args):\n        idList = []\n        extension = None\n        if len(args) and not args[0] is None:\n            for arg in range(len(args)):\n                match = re.match("[0-9]+", args[arg])\n                if match:\n                    slave_id = int(match.groups()[0])\n\n            match = re.match("(\.[a-zA-Z]+$)", args[-1])\n            if match:\n                extension = match.groups()[0][1:]\n\n        return idList, extension\n\n    def get(self, *args):\n        ### Read\n        job_id, extension = self.parseRestArgs(args)\n\n        if len(job_id):\n            if extension == None or "html":\n               #self.render(html) # Show with some ID voodoo\n               pass\n            elif extension == 'json':\n                #self.render(json) # Show with some ID voodoo\n                pass\n            else:\n                raise tornado.web.HTTPError(404) #We don't do that sort of thing here...\n        else:\n            if extension == None or "html":\n                pass\n                # self.render(html) # Index- No ID given, show an index\n            elif extension == "json":\n                pass\n                # self.render(json) # Index- No ID given, show an index\n            else:\n                raise tornado.web.HTTPError(404) #We don't do that sort of thing here...\n
\n soup wrap:

First, set up the handlers to expect a REST-style URI. We use two chunks of regex: one looking for an ID and one for an optional request format (i.e. html, xml, json, etc.)

class TaskServer(tornado.web.Application):
    def __init__(self, newHandlers = [], debug = None):
        request_format = "(\.[a-zA-Z]+$)?"
        baseHandlers = [
            (r"/jobs" + request_format, JobsHandler),
            (r"/jobs/", JobsHandler),
            (r"/jobs/new" + request_format, NewJobsHandler),
            (r"/jobs/([0-9]+)/edit" + request_format, EditJobsHandler)
        ]
        for handler in newHandlers:
            baseHandlers.append(handler)


        tornado.web.Application.__init__(self, baseHandlers, debug = debug)

Now, in the handler define a reusable function parseRestArgs (I put mine in a BaseHandler but pasted it here for ease of understanding/to save space) that splits out IDs and request formats. Since you should be expecting IDs in a particular order, I stick them in a list.

The get function can be abstracted more but it shows the basic idea of splitting out your logic into different request formats...

class JobsHandler(BaseHandler):
    def parseRestArgs(self, args):
        idList = []
        extension = None
        if len(args) and not args[0] is None:
            for arg in range(len(args)):
                match = re.match("([0-9]+)", args[arg])
                if match:
                    idList.append(int(match.groups()[0]))

            match = re.match("(\.[a-zA-Z]+$)", args[-1])
            if match:
                extension = match.groups()[0][1:]

        return idList, extension

    def get(self, *args):
        ### Read
        job_id, extension = self.parseRestArgs(args)

        if len(job_id):
            if extension in (None, "html"):
               #self.render(html) # Show with some ID voodoo
               pass
            elif extension == 'json':
                #self.render(json) # Show with some ID voodoo
                pass
            else:
                raise tornado.web.HTTPError(404) #We don't do that sort of thing here...
        else:
            if extension in (None, "html"):
                pass
                # self.render(html) # Index- No ID given, show an index
            elif extension == "json":
                pass
                # self.render(json) # Index- No ID given, show an index
            else:
                raise tornado.web.HTTPError(404) #We don't do that sort of thing here...
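The request_format regex can be checked on its own, outside Tornado — a quick sketch:

```python
import re

request_format = r"(\.[a-zA-Z]+$)?"
pattern = re.compile(r"/jobs" + request_format)

# With an extension, the capture group holds it; without one, it is None:
print(pattern.match("/jobs.json").group(1))  # .json
print(pattern.match("/jobs").group(1))       # None
```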
qid & accept id: (8419817, 8419853) query: Remove single quotes from python list item soup:

Currently all of the values in your list are strings, and you want them to integers, here are the two most straightforward ways to do this:

\n
map(int, your_list)\n
\n

and

\n
[int(value) for value in your_list]\n
\n

See the documentation on map() and list comprehensions for more info.

\n

If you want to leave the items in your list as strings but display them without the single quotes, you can use the following:

\n
print('[' + ', '.join(your_list) + ']')\n
\n soup wrap:

Currently all of the values in your list are strings; if you want them to be integers, here are the two most straightforward ways to do this:

map(int, your_list)

and

[int(value) for value in your_list]

See the documentation on map() and list comprehensions for more info.

If you want to leave the items in your list as strings but display them without the single quotes, you can use the following:

print('[' + ', '.join(your_list) + ']')
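For instance (note that in Python 3, map() returns an iterator, so wrap it in list() to see the values):

```python
your_list = ['1', '2', '3']

print(list(map(int, your_list)))             # [1, 2, 3]
print([int(value) for value in your_list])   # [1, 2, 3]

# Keeping the items as strings but printing them without quotes:
print('[' + ', '.join(your_list) + ']')      # [1, 2, 3]
```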
qid & accept id: (8422308, 8433306) query: Waf: How to output a generated file? soup:

If you simply want to substitute an input file your versionfile.ver should look like this

\n
VERSION=@VERSION@\nDATADIR=@DATADIR@\n
\n

Now you can use the following task so the values will be substituted

\n
bld.new_task_gen (\n  features = "subst",\n  source= "versionfile.ver",\n  target= "versionfile.out",\n  VERSION = bld.env['VERSION'],\n  DATADIR = bld.env['DATADIR'])\n
\n

To be able to access version from bld you have to define it during configure

\n
conf.env['VERSION'] = '0.7.0'\n
\n

You can find this waf task in action here Output files of this tasks can than be used as input for other tasks.

\n

However when you want to pass on your source file through a python script or any command available you can use the following:

\n
lib_typelib = bld.new_task_gen(\n  name = 'versionfile',\n  source = 'versionfile.ver',\n  target = 'versionfile.out',\n  rule='/path/to/your/python/script ${SRC} -o ${TGT}')\n
\n

There is also a sample available here where in this case g-ir-compiler is used what in your case would be a python script.

\n soup wrap:

If you simply want to substitute values into an input file, your versionfile.ver should look like this:

VERSION=@VERSION@
DATADIR=@DATADIR@

Now you can use the following task so the values will be substituted

bld.new_task_gen (
  features = "subst",
  source= "versionfile.ver",
  target= "versionfile.out",
  VERSION = bld.env['VERSION'],
  DATADIR = bld.env['DATADIR'])

To be able to access the version from bld, you have to define it during configure:

conf.env['VERSION'] = '0.7.0'

You can find this waf task in action here. Output files of these tasks can then be used as input for other tasks.

However, if you want to pass your source file through a python script or any available command, you can use the following:

lib_typelib = bld.new_task_gen(
  name = 'versionfile',
  source = 'versionfile.ver',
  target = 'versionfile.out',
  rule='/path/to/your/python/script ${SRC} -o ${TGT}')

There is also a sample available here, where g-ir-compiler is used; in your case that would be a python script.

qid & accept id: (8431654, 8431743) query: retrieve the Package.Module.Class name from a (Python) class/type soup:

Using inspect.getmodule you can (sometimes) find the module in which an object was defined, e.g.

\n
>>> from collections import defaultdict\n>>> import inspect\n>>> inspect.getmodule(defaultdict)\n\n
\n

The module name can be found using __name__. Note that the defining module need not be the one you imported from due to re-exports:

\n
>>> from scipy.sparse import csr_matrix\n>>> inspect.getmodule(csr_matrix).__name__\n'scipy.sparse.csr'\n
\n soup wrap:

Using inspect.getmodule you can (sometimes) find the module in which an object was defined, e.g.

>>> from collections import defaultdict
>>> import inspect
>>> inspect.getmodule(defaultdict)

The module name can be found using __name__. Note that the defining module need not be the one you imported from due to re-exports:

>>> from scipy.sparse import csr_matrix
>>> inspect.getmodule(csr_matrix).__name__
'scipy.sparse.csr'
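To get the Package.Module.Class string the question asks for, this can be combined with the class's own attributes — a small sketch:

```python
import inspect
from collections import defaultdict

print(inspect.getmodule(defaultdict).__name__)  # collections

# The fully qualified name can also be built from the class itself:
full_name = f"{defaultdict.__module__}.{defaultdict.__qualname__}"
print(full_name)  # collections.defaultdict
```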
qid & accept id: (8470539, 8472855) query: How do I index n sets of 4 columns to plot multiple plots using matplotlib? soup:

Well if you like R's data.table, there have been a few (at least) attempts to re-create that functionality in NumPy--through additional classes in NumPy Core and through external Python libraries. The effort i find most promising is the datarray library by Fernando Perez. Here's how it works.

\n
>>> # create a NumPy array for use as our data set\n>>> import numpy as NP\n>>> D = NP.random.randint(0, 10, 40).reshape(8, 5)\n\n>>> # create some generic row and column names to pass to the constructor\n>>> row_ids = [ "row{0}".format(c) for c in range(D1.shape[0]) ]\n>>> rows = 'rows_id', row_ids\n\n>>> variables = [ "col{0}".format(c) for c in range(D1.shape[1]) ]\n>>> cols = 'variable', variables\n
\n

Instantiate the DataArray instance, by calling the constructor and passing in an ordinary NumPy array and a list of tuples--one tuple for each axis, and since ndim = 2 here, there are two tuples in the list each tuple is comprised of axis label (str) and a sequence of labels for that axes (list).

\n
>>> from datarray.datarray import DataArray as DA\n>>> D1 = DA(D, [rows, cols])\n\n>>> D1.axes\n      (Axis(name='rows', index=0, labels=['row0', 'row1', 'row2', 'row3', \n           'row4', 'row5', 'row6', 'row7']), Axis(name='cols', index=1, \n           labels=['col0', 'col1', 'col2', 'col3', 'col4']))\n\n>>> # now you can use R-like syntax to reference a NumPy data array by column:\n>>> D1[:,'col1']\n      DataArray([8, 5, 0, 7, 8, 9, 9, 4])\n      ('rows',)\n
\n soup wrap:

Well, if you like R's data.table, there have been a few (at least) attempts to re-create that functionality in NumPy--through additional classes in NumPy Core and through external Python libraries. The effort I find most promising is the datarray library by Fernando Perez. Here's how it works.

>>> # create a NumPy array for use as our data set
>>> import numpy as NP
>>> D = NP.random.randint(0, 10, 40).reshape(8, 5)

>>> # create some generic row and column names to pass to the constructor
>>> row_ids = [ "row{0}".format(c) for c in range(D.shape[0]) ]
>>> rows = 'rows_id', row_ids

>>> variables = [ "col{0}".format(c) for c in range(D.shape[1]) ]
>>> cols = 'variable', variables

Instantiate the DataArray by calling the constructor and passing in an ordinary NumPy array and a list of tuples--one tuple per axis (since ndim = 2 here, there are two tuples in the list). Each tuple is composed of an axis label (str) and a sequence of labels for that axis (list).

>>> from datarray.datarray import DataArray as DA
>>> D1 = DA(D, [rows, cols])

>>> D1.axes
      (Axis(name='rows', index=0, labels=['row0', 'row1', 'row2', 'row3', 
           'row4', 'row5', 'row6', 'row7']), Axis(name='cols', index=1, 
           labels=['col0', 'col1', 'col2', 'col3', 'col4']))

>>> # now you can use R-like syntax to reference a NumPy data array by column:
>>> D1[:,'col1']
      DataArray([8, 5, 0, 7, 8, 9, 9, 4])
      ('rows',)
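If pulling in datarray is not an option, plain NumPy structured arrays give column-by-name access too — a minimal sketch:

```python
import numpy as np

# A structured array with named columns, no external library needed:
D = np.array([(0, 8), (1, 5), (2, 0)],
             dtype=[('col0', 'i4'), ('col1', 'i4')])
print(D['col1'])  # [8 5 0]
```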
qid & accept id: (8530203, 8530500) query: Match multiple lines in a file using regular expression python soup:

As python re module documentation says you may add the MULTILINE flag to re.compile method. This will let you match entire file at once.

\n
import re\n\nregex = re.match(r'''(\n    ^\s*clns\s+routing$ |\n    ^\s*bfd\s+graceful-restart$ |\n    ^\s*ip\s+default-network$ |\n    ^\s*ip\s+default-gateway$ |\n    ^\s*ip\s+subnet-zero$ |\n    ^\s*ip\s+cef\s*$\n)+''', re.MULTILINE | re.VERBOSE)\n
\n

Notice that I've added VERBOSE flag to write regex with additional formatting to make regex look nicer. Also you should see that there are several ^ and $ symbols. That is how multiline regex allows you to match over multiple lines in one file.

\n

Additionally I must warn you that this regex will only help to match file just to be sure is entire file correctly formatted. If you want to parse data from this file you need to modify this regex a little to satisfy your needs.

\n

Second code variant

\n
import re\n\nregex = re.match(r'''(^\n    \s*\n    (clns|bfd|ip)\n    \s+\n    (routing|graceful-restart|default-network|default-gateway|subnet-zero|cef)\n$)+''', re.MULTILINE | re.VERBOSE)\n
\n soup wrap:

As the python re module documentation says, you may pass the MULTILINE flag to re.compile. This will let you match the entire file at once.

import re

regex = re.compile(r'''(
    ^\s*clns\s+routing$ |
    ^\s*bfd\s+graceful-restart$ |
    ^\s*ip\s+default-network$ |
    ^\s*ip\s+default-gateway$ |
    ^\s*ip\s+subnet-zero$ |
    ^\s*ip\s+cef\s*$
)+''', re.MULTILINE | re.VERBOSE)

Notice that I've added the VERBOSE flag so the regex can be written with additional formatting, making it look nicer. Also note that there are several ^ and $ symbols: that is how a multiline regex lets you match over multiple lines in one file.

Additionally, I must warn you that this regex only helps to check that the entire file is correctly formatted. If you want to parse data from the file, you will need to modify the regex a little to satisfy your needs.

Second code variant

import re

regex = re.compile(r'''(^
    \s*
    (clns|bfd|ip)
    \s+
    (routing|graceful-restart|default-network|default-gateway|subnet-zero|cef)
$)+''', re.MULTILINE | re.VERBOSE)
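The effect of MULTILINE is easy to see in isolation:

```python
import re

text = "ip subnet-zero\nip cef"

# Without MULTILINE, ^ and $ anchor only at the ends of the whole
# string; with it, they also anchor at every line boundary:
print(re.findall(r"^ip\s+\S+$", text))                # []
print(re.findall(r"^ip\s+\S+$", text, re.MULTILINE))  # ['ip subnet-zero', 'ip cef']
```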
qid & accept id: (8605189, 8605236) query: Sorting a list of list of tuples based on the sum of first field in the tuple in Python soup:
big_list = [\n  [\n    (20, 'Item A', 'Jan'),\n    (30, 'Item B', 'Jan'),\n    (12, 'Item C', 'Jan'),\n  ],\n  [\n    (22, 'Item A', 'Feb'),\n    (34, 'Item B', 'Feb'),\n    (15, 'Item C', 'Feb'),\n  ]]\n\ns = {}\nfor l in big_list:\n    for m in l:\n        s[m[1]] = s.get(m[1], 0) + m[0]\n
\n

gives us s - the sums we want to use to sort: {'Item A': 42, 'Item B': 64, 'Item C': 27}

\n

And finally:

\n
for l in big_list:\n    l.sort(key=lambda x: s[x[1]])\n
\n

changes big_list to:

\n
[[(12, 'Item C', 'Jan'), (20, 'Item A', 'Jan'), (30, 'Item B', 'Jan')],\n [(15, 'Item C', 'Feb'), (22, 'Item A', 'Feb'), (34, 'Item B', 'Feb')]]\n
\n

This solution works for lists within months in any order and also if some item does not appear in some month.

\n soup wrap:
big_list = [
  [
    (20, 'Item A', 'Jan'),
    (30, 'Item B', 'Jan'),
    (12, 'Item C', 'Jan'),
  ],
  [
    (22, 'Item A', 'Feb'),
    (34, 'Item B', 'Feb'),
    (15, 'Item C', 'Feb'),
  ]]

s = {}
for l in big_list:
    for m in l:
        s[m[1]] = s.get(m[1], 0) + m[0]

gives us s - the sums we want to use to sort: {'Item A': 42, 'Item B': 64, 'Item C': 27}

And finally:

for l in big_list:
    l.sort(key=lambda x: s[x[1]])

changes big_list to:

[[(12, 'Item C', 'Jan'), (20, 'Item A', 'Jan'), (30, 'Item B', 'Jan')],
 [(15, 'Item C', 'Feb'), (22, 'Item A', 'Feb'), (34, 'Item B', 'Feb')]]

This solution works for lists within months in any order and also if some item does not appear in some month.
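For what it's worth, the summing pass can also be written with collections.Counter — a sketch:

```python
from collections import Counter

big_list = [
    [(20, 'Item A', 'Jan'), (30, 'Item B', 'Jan'), (12, 'Item C', 'Jan')],
    [(22, 'Item A', 'Feb'), (34, 'Item B', 'Feb'), (15, 'Item C', 'Feb')],
]

# Sum the first field per item name across all months:
s = Counter()
for month in big_list:
    for total, name, _ in month:
        s[name] += total
print(dict(s))  # {'Item A': 42, 'Item B': 64, 'Item C': 27}

# Then sort each month's list by those sums:
for month in big_list:
    month.sort(key=lambda t: s[t[1]])
print(big_list[0])  # [(12, 'Item C', 'Jan'), (20, 'Item A', 'Jan'), (30, 'Item B', 'Jan')]
```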

qid & accept id: (8652136, 8652206) query: Dynamic module loading in python soup:

This should answer your question:

\n
references = map(__import__, modules)\n
\n

or if you prefer dictionary with modules' names as keys:

\n
references = dict(zip(modules, map(__import__, modules)))\n
\n

Does it answer your question?

\n soup wrap:

This should answer your question:

references = map(__import__, modules)

or, if you prefer a dictionary with module names as keys:

references = dict(zip(modules, map(__import__, modules)))

Does it answer your question?
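In Python 3, map() returns an iterator, and bare __import__ returns the top-level package for dotted names, so importlib is the safer spelling — a sketch:

```python
import importlib

modules = ["json", "math"]
# importlib.import_module handles dotted module paths correctly,
# unlike bare __import__:
references = {name: importlib.import_module(name) for name in modules}
print(references["math"].sqrt(9.0))  # 3.0
```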

qid & accept id: (8671702, 8671854) query: Passing list of parameters to SQL in psycopg2 soup:

Python tuples are converted to sql lists in psycopg2:

\n
cur.mogrify("SELECT * FROM table WHERE column IN %s;", ((1,2,3),))\n
\n

would output

\n
'SELECT * FROM table WHERE column IN (1,2,3);'\n
\n

For Python new comers: It is unfortunately important to use a tuple, not a list here. Second example:

\n
cur.mogrify("SELECT * FROM table WHERE column IN %s;", \n    tuple([row[0] for for in rows]))\n
\n soup wrap:

Python tuples are converted to sql lists in psycopg2:

cur.mogrify("SELECT * FROM table WHERE column IN %s;", ((1,2,3),))

would output

'SELECT * FROM table WHERE column IN (1,2,3);'

For Python newcomers: it is, unfortunately, important to use a tuple, not a list, here. Second example:

cur.mogrify("SELECT * FROM table WHERE column IN %s;", 
    (tuple(row[0] for row in rows),))
qid & accept id: (8682336, 8682379) query: How do I assign a variable to an object name? soup:

Instead of using a new variable for each customer you could store your object in a Python dictionary:

\n
d = dict()\n\nfor record in result:\n    objectname = 'Customer' + str(record[0])\n    customername = str(record[1])\n    d[objectname] = Customer(customername)\n\nprint d\n
\n

An example of objects stored in dictionaries

\n

I just couldn't help myself writing some code (more than I set out to do); it's addictive. Anyway, I wouldn't use objects for this kind of work. I would probably use an sqlite database (it can be kept in memory if you want). But this piece of code shows you (hopefully) how you can use dictionaries to store objects holding customer data:

\n
# Initiate customer dictionary\ncustomers = dict()\n\nclass Customer:\n    def __init__(self, fname, lname):\n        self.fname = fname\n        self.lname = lname\n        self.address = None\n        self.zip = None\n        self.state = None\n        self.city = None\n        self.phone = None\n\n    def add_address(self, address, zp, state, city):\n        self.address = address\n        self.zip = zp\n        self.state = state\n        self.city = city\n\n    def add_phone(self, number):\n        self.phone = number\n\n\n# Observe that these functions are not belonging to the class.    \ndef _print_layout(object):\n        print object.fname, object.lname\n        print '==========================='\n        print 'ADDRESS:'\n        print object.address\n        print object.zip\n        print object.state\n        print object.city\n        print '\nPHONE:'\n        print object.phone\n        print '\n'\n\ndef print_customer(customer_name):\n    _print_layout(customers[customer_name])\n\ndef print_customers():\n    for customer_name in customers.iterkeys():\n        _print_layout(customers[customer_name])\n\nif __name__ == '__main__':\n    # Add some customers to dictionary:\n    customers['Steve'] = Customer('Steve', 'Jobs')\n    customers['Niclas'] = Customer('Niclas', 'Nilsson')\n    # Add some more data\n    customers['Niclas'].add_address('Some road', '12312', 'WeDon\'tHaveStates', 'Hultsfred')\n    customers['Steve'].add_phone('123-543 234')\n\n    # Search one customer and print him\n    print 'Here are one customer searched:'\n    print 'ooooooooooooooooooooooooooooooo'\n    print_customer('Niclas')\n\n    # Print all the customers nicely\n    print '\n\nHere are all customers'\n    print 'oooooooooooooooooooooo'\n    print_customers()\n
\n soup wrap:

Instead of using a new variable for each customer you could store your object in a Python dictionary:

d = dict()

for record in result:
    objectname = 'Customer' + str(record[0])
    customername = str(record[1])
    d[objectname] = Customer(customername)

print d

An example of objects stored in dictionaries

I just couldn't help myself writing some code (more than I set out to do); it's addictive. Anyway, I wouldn't use objects for this kind of work. I would probably use an sqlite database (it can be kept in memory if you want). But this piece of code shows you (hopefully) how you can use dictionaries to store objects holding customer data:

# Initiate customer dictionary
customers = dict()

class Customer:
    def __init__(self, fname, lname):
        self.fname = fname
        self.lname = lname
        self.address = None
        self.zip = None
        self.state = None
        self.city = None
        self.phone = None

    def add_address(self, address, zp, state, city):
        self.address = address
        self.zip = zp
        self.state = state
        self.city = city

    def add_phone(self, number):
        self.phone = number


# Observe that these functions are not belonging to the class.    
def _print_layout(object):
        print object.fname, object.lname
        print '==========================='
        print 'ADDRESS:'
        print object.address
        print object.zip
        print object.state
        print object.city
        print '\nPHONE:'
        print object.phone
        print '\n'

def print_customer(customer_name):
    _print_layout(customers[customer_name])

def print_customers():
    for customer_name in customers.iterkeys():
        _print_layout(customers[customer_name])

if __name__ == '__main__':
    # Add some customers to dictionary:
    customers['Steve'] = Customer('Steve', 'Jobs')
    customers['Niclas'] = Customer('Niclas', 'Nilsson')
    # Add some more data
    customers['Niclas'].add_address('Some road', '12312', 'WeDon\'tHaveStates', 'Hultsfred')
    customers['Steve'].add_phone('123-543 234')

    # Search one customer and print him
    print 'Here are one customer searched:'
    print 'ooooooooooooooooooooooooooooooo'
    print_customer('Niclas')

    # Print all the customers nicely
    print '\n\nHere are all customers'
    print 'oooooooooooooooooooooo'
    print_customers()
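Condensed to its essence, the dictionary-of-objects pattern from the first snippet looks like this in modern Python (the `result` rows are invented for illustration):

```python
class Customer:
    def __init__(self, name):
        self.name = name

result = [(1, "Alice"), (2, "Bob")]   # hypothetical query result: (id, name) rows

# computed key names instead of computed variable names
d = {"Customer" + str(rec[0]): Customer(rec[1]) for rec in result}

print(d["Customer1"].name)
```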
qid & accept id: (8685308, 8687720) query: Allocate items according to an approximate ratio in Python soup:

Rather than try to get the fractions right, I'd just allocate the goals one at a time in the appropriate ratio. Here the 'allocate_goals' generator assigns a goal to each of the low-ratio goals, then to each of the high-ratio goals (repeating 3 times). Then it repeats. The caller, in allocate cuts off this infinite generator at the required number (the number of players) using itertools.islice.

\n
import collections\nimport itertools\nimport string\n\ndef allocate_goals(prop_low, prop_high):\n    prop_high3 = prop_high * 3\n    while True:\n        for g in prop_low:\n            yield g\n        for g in prop_high3:\n            yield g\n\ndef allocate(goals, players):\n    letters = string.ascii_uppercase[:goals]\n    high_count = goals // 2\n    prop_high, prop_low = letters[:high_count], letters[high_count:]\n    g = allocate_goals(prop_low, prop_high)\n    return collections.Counter(itertools.islice(g, players))\n\nfor goals in xrange(2, 9):\n    print goals, sorted(allocate(goals, 8).items())\n
\n

It produces this answer:

\n
2 [('A', 6), ('B', 2)]\n3 [('A', 4), ('B', 2), ('C', 2)]\n4 [('A', 3), ('B', 3), ('C', 1), ('D', 1)]\n5 [('A', 3), ('B', 2), ('C', 1), ('D', 1), ('E', 1)]\n6 [('A', 2), ('B', 2), ('C', 1), ('D', 1), ('E', 1), ('F', 1)]\n7 [('A', 2), ('B', 1), ('C', 1), ('D', 1), ('E', 1), ('F', 1), ('G', 1)]\n8 [('A', 1), ('B', 1), ('C', 1), ('D', 1), ('E', 1), ('F', 1), ('G', 1), ('H', 1)]\n
\n

The great thing about this approach (apart from, I think, that it's easy to understand) is that it's quick to turn it into a randomized version.

\n

Just replace allocate_goals with this:

\n
def allocate_goals(prop_low, prop_high):\n    all_goals = prop_low + prop_high * 3\n    while True:\n        yield random.choice(all_goals)\n
\n soup wrap:

Rather than trying to get the fractions right, I'd just allocate the goals one at a time in the appropriate ratio. Here the allocate_goals generator yields each of the low-ratio goals once, then each of the high-ratio goals three times, and then repeats. The caller, in allocate, cuts this infinite generator off at the required number of goals (the number of players) using itertools.islice.

import collections
import itertools
import string

def allocate_goals(prop_low, prop_high):
    prop_high3 = prop_high * 3
    while True:
        for g in prop_low:
            yield g
        for g in prop_high3:
            yield g

def allocate(goals, players):
    letters = string.ascii_uppercase[:goals]
    high_count = goals // 2
    prop_high, prop_low = letters[:high_count], letters[high_count:]
    g = allocate_goals(prop_low, prop_high)
    return collections.Counter(itertools.islice(g, players))

for goals in xrange(2, 9):
    print goals, sorted(allocate(goals, 8).items())

It produces this answer:

2 [('A', 6), ('B', 2)]
3 [('A', 4), ('B', 2), ('C', 2)]
4 [('A', 3), ('B', 3), ('C', 1), ('D', 1)]
5 [('A', 3), ('B', 2), ('C', 1), ('D', 1), ('E', 1)]
6 [('A', 2), ('B', 2), ('C', 1), ('D', 1), ('E', 1), ('F', 1)]
7 [('A', 2), ('B', 1), ('C', 1), ('D', 1), ('E', 1), ('F', 1), ('G', 1)]
8 [('A', 1), ('B', 1), ('C', 1), ('D', 1), ('E', 1), ('F', 1), ('G', 1), ('H', 1)]

The great thing about this approach (besides being, I think, easy to understand) is that it's quick to turn into a randomized version.

Just replace allocate_goals with this:

def allocate_goals(prop_low, prop_high):
    all_goals = prop_low + prop_high * 3
    while True:
        yield random.choice(all_goals)
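For reference, the deterministic version ports to Python 3 with only cosmetic changes (xrange becomes range, print becomes a function, and `yield from` replaces the inner loops); the allocation logic is unchanged:

```python
import collections
import itertools
import string

def allocate_goals(prop_low, prop_high):
    # each low-ratio goal once, then each high-ratio goal three times, forever
    prop_high3 = prop_high * 3
    while True:
        yield from prop_low
        yield from prop_high3

def allocate(goals, players):
    letters = string.ascii_uppercase[:goals]
    high_count = goals // 2
    prop_high, prop_low = letters[:high_count], letters[high_count:]
    g = allocate_goals(prop_low, prop_high)
    return collections.Counter(itertools.islice(g, players))

for goals in range(2, 9):
    print(goals, sorted(allocate(goals, 8).items()))
```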
qid & accept id: (8702772, 8702854) query: Django get list of models in application soup:

This is the best way to accomplish what you want to do:

\n
from django.db.models import get_app, get_models\n\napp = get_app('my_application_name')\nfor model in get_models(app):\n    # do something with the model\n
\n

In this example, model is the actual model, so you can do plenty of things with it:

\n
for model in get_models(app):\n    new_object = model() # Create an instance of that model\n    model.objects.filter(...) # Query the objects of that model\n    model._meta.db_table # Get the name of the model in the database\n    model._meta.verbose_name # Get a verbose name of the model\n    # ...\n
\n

UPDATE

\n

For newer versions of Django, check Sjoerd's answer below

\n soup wrap:

This is the best way to accomplish what you want to do:

from django.db.models import get_app, get_models

app = get_app('my_application_name')
for model in get_models(app):
    # do something with the model

In this example, model is the actual model, so you can do plenty of things with it:

for model in get_models(app):
    new_object = model() # Create an instance of that model
    model.objects.filter(...) # Query the objects of that model
    model._meta.db_table # Get the name of the model in the database
    model._meta.verbose_name # Get a verbose name of the model
    # ...

UPDATE

For newer versions of Django, check Sjoerd's answer below.

qid & accept id: (8714744, 8715756) query: Loop over time and over list elements with python -- one-dimensional lake temperature model simulation soup:

Let me first try to rephrase your problem statement

\n

Listn = [x+f(x):x ∈ Listn-1 , f ∈ fnlist]

\n

where

\n

fnlist=[f,g,h]

\n

so in python terms that boils down to

\n
funclist = [f,g,h]\nsomelist+=[[x+f(x) for x,f in zip(somelist[-1],funclist)]]\n
\n

on the other hand, if the same function is applied to all the values of the list like

\n

Listn = [x+f(x):x ∈ Listn-1]

\n
somelist+=[[x+f(x) for x in somelist[-1]]]\n
\n

Finally, if a single function depends on the time slice, advancing by a certain increment ∆t:

\n

Listn = [x+f(t):x ∈ Listn-1 , t ∈ T]

\n

where\n T = [t,t+∆t,t+2∆t,......]

\n

then first you need to generate the time sequence and you can use itertools.count for that purpose like

\n
itertools.count(someStartTime,delta)\n
\n

then

\n
somelist+=[[x+f(t) for x,t in zip(somelist[-1],itertools.count(someStartTime,delta))]]\n
\n

Note: f,g,h are python functions which can be defined as

\n
def f(n):\n    ........\n    return .....\n
\n soup wrap:

Let me first try to rephrase your problem statement

Listn = [x+f(x):x ∈ Listn-1 , f ∈ fnlist]

where

fnlist=[f,g,h]

so in Python terms that boils down to

funclist = [f,g,h]
somelist+=[[x+f(x) for x,f in zip(somelist[-1],funclist)]]

on the other hand, if the same function is applied to all the values of the list like

Listn = [x+f(x):x ∈ Listn-1]

somelist+=[[x+f(x) for x in somelist[-1]]]

Finally, if a single function depends on the time slice, advancing by a certain increment ∆t:

Listn = [x+f(t):x ∈ Listn-1 , t ∈ T]

where T = [t,t+∆t,t+2∆t,......]

then first you need to generate the time sequence and you can use itertools.count for that purpose like

itertools.count(someStartTime,delta)

then

somelist+=[[x+f(t) for x,t in zip(somelist[-1],itertools.count(someStartTime,delta))]]

Note: f, g, h are Python functions, which can be defined as

def f(n):
    ........
    return .....
qid & accept id: (8780912, 8783634) query: How can I perform a least-squares fitting over multiple data sets fast? soup:

The easiest thing to do is to linearize the problem. You're using a non-linear, iterative method, which will be slower than a linear least-squares solution.

\n

Basically, you have:

\n

y = height * exp(-(x - mu)^2 / (2 * sigma^2))

\n

To make this a linear equation, take the (natural) log of both sides:

\n
ln(y) = ln(height) - (x - mu)^2 / (2 * sigma^2)\n
\n

This then simplifies to the polynomial:

\n
ln(y) = -x^2 / (2 * sigma^2) + x * mu / sigma^2 - mu^2 / (2 * sigma^2) + ln(height)\n
\n

We can recast this in a bit simpler form:

\n
ln(y) = A * x^2 + B * x + C\n
\n

where:

\n
A = -1 / (2 * sigma^2)\nB = mu / sigma^2\nC = ln(height) - mu^2 / (2 * sigma^2)\n
\n

However, there's one catch. This will become unstable in the presence of noise in the "tails" of the distribution.

\n

Therefore, we need to use only the data near the "peaks" of the distribution. It's easy enough to only include data that falls above some threshold in the fitting. In this example, I'm only including data that's greater than 20% of the maximum observed value for a given gaussian curve that we're fitting.

\n

Once we've done this, though, it's rather fast. Solving for 262144 different gaussian curves takes only ~1 minute (be sure to remove the plotting portion of the code if you run it on something that large...). It's also quite easy to parallelize, if you want...

\n
import numpy as np\nimport matplotlib.pyplot as plt\nimport matplotlib as mpl\nimport itertools\n\ndef main():\n    x, data = generate_data(256, 6)\n    model = [invert(x, y) for y in data.T]\n    sigma, mu, height = [np.array(item) for item in zip(*model)]\n    prediction = gaussian(x, sigma, mu, height)\n\n    plot(x, data, linestyle='none', marker='o')\n    plot(x, prediction, linestyle='-')\n    plt.show()\n\ndef invert(x, y):\n    # Use only data within the "peak" (20% of the max value...)\n    key_points = y > (0.2 * y.max())\n    x = x[key_points]\n    y = y[key_points]\n\n    # Fit a 2nd order polynomial to the log of the observed values\n    A, B, C = np.polyfit(x, np.log(y), 2)\n\n    # Solve for the desired parameters...\n    sigma = np.sqrt(-1 / (2.0 * A))\n    mu = B * sigma**2\n    height = np.exp(C + 0.5 * mu**2 / sigma**2)\n    return sigma, mu, height\n\ndef generate_data(numpoints, numcurves):\n    np.random.seed(3)\n    x = np.linspace(0, 500, numpoints)\n\n    height = 100 * np.random.random(numcurves)\n    mu = 200 * np.random.random(numcurves) + 200\n    sigma = 100 * np.random.random(numcurves) + 0.1\n    data = gaussian(x, sigma, mu, height)\n\n    noise = 5 * (np.random.random(data.shape) - 0.5)\n    return x, data + noise\n\ndef gaussian(x, sigma, mu, height):\n    data = -np.subtract.outer(x, mu)**2 / (2 * sigma**2)\n    return height * np.exp(data)\n\ndef plot(x, ydata, ax=None, **kwargs):\n    if ax is None:\n        ax = plt.gca()\n    colorcycle = itertools.cycle(mpl.rcParams['axes.color_cycle'])\n    for y, color in zip(ydata.T, colorcycle):\n        ax.plot(x, y, color=color, **kwargs)\n\nmain()\n
\n


\n

The only thing we'd need to change for a parallel version is the main function. (We also need a dummy function because multiprocessing.Pool.imap can't supply additional arguments to its function...) It would look something like this:

\n
def parallel_main():\n    import multiprocessing\n    p = multiprocessing.Pool()\n    x, data = generate_data(256, 262144)\n    args = itertools.izip(itertools.repeat(x), data.T)\n    model = p.imap(parallel_func, args, chunksize=500)\n    sigma, mu, height = [np.array(item) for item in zip(*model)]\n    prediction = gaussian(x, sigma, mu, height)\n\ndef parallel_func(args):\n    return invert(*args)\n
\n

Edit: In cases where the simple polynomial fitting isn't working well, try weighting the problem by the y-values, as mentioned in the link/paper that @tslisten shared (and Stefan van der Walt implemented, though my implementation is a bit different).

\n
import numpy as np\nimport matplotlib.pyplot as plt\nimport matplotlib as mpl\nimport itertools\n\ndef main():\n    def run(x, data, func, threshold=0):\n        model = [func(x, y, threshold=threshold) for y in data.T]\n        sigma, mu, height = [np.array(item) for item in zip(*model)]\n        prediction = gaussian(x, sigma, mu, height)\n\n        plt.figure()\n        plot(x, data, linestyle='none', marker='o', markersize=4)\n        plot(x, prediction, linestyle='-', lw=2)\n\n    x, data = generate_data(256, 6, noise=100)\n    threshold = 50\n\n    run(x, data, weighted_invert, threshold=threshold)\n    plt.title('Weighted by Y-Value')\n\n    run(x, data, invert, threshold=threshold)\n    plt.title('Un-weighted Linear Inverse'\n\n    plt.show()\n\ndef invert(x, y, threshold=0):\n    mask = y > threshold\n    x, y = x[mask], y[mask]\n\n    # Fit a 2nd order polynomial to the log of the observed values\n    A, B, C = np.polyfit(x, np.log(y), 2)\n\n    # Solve for the desired parameters...\n    sigma, mu, height = poly_to_gauss(A,B,C)\n    return sigma, mu, height\n\ndef poly_to_gauss(A,B,C):\n    sigma = np.sqrt(-1 / (2.0 * A))\n    mu = B * sigma**2\n    height = np.exp(C + 0.5 * mu**2 / sigma**2)\n    return sigma, mu, height\n\ndef weighted_invert(x, y, weights=None, threshold=0):\n    mask = y > threshold\n    x,y = x[mask], y[mask]\n    if weights is None:\n        weights = y\n    else:\n        weights = weights[mask]\n\n    d = np.log(y)\n    G = np.ones((x.size, 3), dtype=np.float)\n    G[:,0] = x**2\n    G[:,1] = x\n\n    model,_,_,_ = np.linalg.lstsq((G.T*weights**2).T, d*weights**2)\n    return poly_to_gauss(*model)\n\ndef generate_data(numpoints, numcurves, noise=None):\n    np.random.seed(3)\n    x = np.linspace(0, 500, numpoints)\n\n    height = 7000 * np.random.random(numcurves)\n    mu = 1100 * np.random.random(numcurves) \n    sigma = 100 * np.random.random(numcurves) + 0.1\n    data = gaussian(x, sigma, mu, height)\n\n    if noise is None:\n  
      noise = 0.1 * height.max()\n    noise = noise * (np.random.random(data.shape) - 0.5)\n    return x, data + noise\n\ndef gaussian(x, sigma, mu, height):\n    data = -np.subtract.outer(x, mu)**2 / (2 * sigma**2)\n    return height * np.exp(data)\n\ndef plot(x, ydata, ax=None, **kwargs):\n    if ax is None:\n        ax = plt.gca()\n    colorcycle = itertools.cycle(mpl.rcParams['axes.color_cycle'])\n    for y, color in zip(ydata.T, colorcycle):\n        #kwargs['color'] = kwargs.get('color', color)\n        ax.plot(x, y, color=color, **kwargs)\n\nmain()\n
\n


\n

If that's still giving you trouble, then try iteratively reweighting the least-squares problem (the final "best" recommended method in the link @tslisten mentioned). Keep in mind that this will be considerably slower, however.

\n
def iterative_weighted_invert(x, y, threshold=None, numiter=5):\n    last_y = y\n    for _ in range(numiter):\n        model = weighted_invert(x, y, weights=last_y, threshold=threshold)\n        last_y = gaussian(x, *model)\n    return model\n
\n soup wrap:

The easiest thing to do is to linearize the problem. You're using a non-linear, iterative method, which will be slower than a linear least-squares solution.

Basically, you have:

y = height * exp(-(x - mu)^2 / (2 * sigma^2))

To make this a linear equation, take the (natural) log of both sides:

ln(y) = ln(height) - (x - mu)^2 / (2 * sigma^2)

This then simplifies to the polynomial:

ln(y) = -x^2 / (2 * sigma^2) + x * mu / sigma^2 - mu^2 / (2 * sigma^2) + ln(height)

We can recast this in a bit simpler form:

ln(y) = A * x^2 + B * x + C

where:

A = -1 / (2 * sigma^2)
B = mu / sigma^2
C = ln(height) - mu^2 / (2 * sigma^2)
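As a quick numerical check, fitting a 2nd-order polynomial to ln(y) on noise-free data and applying the same inversion the code uses (sigma = sqrt(-1/(2A)), mu = B*sigma^2, height = exp(C + mu^2/(2 sigma^2))) recovers the original parameters; the parameter values below are arbitrary:

```python
import numpy as np

sigma, mu, height = 20.0, 100.0, 5.0
x = np.linspace(50.0, 150.0, 101)
y = height * np.exp(-(x - mu)**2 / (2 * sigma**2))

# fit ln(y) with a quadratic and invert the A, B, C coefficients
A, B, C = np.polyfit(x, np.log(y), 2)
sigma_est = np.sqrt(-1.0 / (2.0 * A))
mu_est = B * sigma_est**2
height_est = np.exp(C + 0.5 * mu_est**2 / sigma_est**2)

print(sigma_est, mu_est, height_est)   # ~ (20.0, 100.0, 5.0)
```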

However, there's one catch. This will become unstable in the presence of noise in the "tails" of the distribution.

Therefore, we need to use only the data near the "peaks" of the distribution. It's easy enough to only include data that falls above some threshold in the fitting. In this example, I'm only including data that's greater than 20% of the maximum observed value for a given gaussian curve that we're fitting.

Once we've done this, though, it's rather fast. Solving for 262144 different gaussian curves takes only ~1 minute (be sure to remove the plotting portion of the code if you run it on something that large...). It's also quite easy to parallelize, if you want...

import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
import itertools

def main():
    x, data = generate_data(256, 6)
    model = [invert(x, y) for y in data.T]
    sigma, mu, height = [np.array(item) for item in zip(*model)]
    prediction = gaussian(x, sigma, mu, height)

    plot(x, data, linestyle='none', marker='o')
    plot(x, prediction, linestyle='-')
    plt.show()

def invert(x, y):
    # Use only data within the "peak" (20% of the max value...)
    key_points = y > (0.2 * y.max())
    x = x[key_points]
    y = y[key_points]

    # Fit a 2nd order polynomial to the log of the observed values
    A, B, C = np.polyfit(x, np.log(y), 2)

    # Solve for the desired parameters...
    sigma = np.sqrt(-1 / (2.0 * A))
    mu = B * sigma**2
    height = np.exp(C + 0.5 * mu**2 / sigma**2)
    return sigma, mu, height

def generate_data(numpoints, numcurves):
    np.random.seed(3)
    x = np.linspace(0, 500, numpoints)

    height = 100 * np.random.random(numcurves)
    mu = 200 * np.random.random(numcurves) + 200
    sigma = 100 * np.random.random(numcurves) + 0.1
    data = gaussian(x, sigma, mu, height)

    noise = 5 * (np.random.random(data.shape) - 0.5)
    return x, data + noise

def gaussian(x, sigma, mu, height):
    data = -np.subtract.outer(x, mu)**2 / (2 * sigma**2)
    return height * np.exp(data)

def plot(x, ydata, ax=None, **kwargs):
    if ax is None:
        ax = plt.gca()
    colorcycle = itertools.cycle(mpl.rcParams['axes.color_cycle'])
    for y, color in zip(ydata.T, colorcycle):
        ax.plot(x, y, color=color, **kwargs)

main()


The only thing we'd need to change for a parallel version is the main function. (We also need a dummy function because multiprocessing.Pool.imap can't supply additional arguments to its function...) It would look something like this:

def parallel_main():
    import multiprocessing
    p = multiprocessing.Pool()
    x, data = generate_data(256, 262144)
    args = itertools.izip(itertools.repeat(x), data.T)
    model = p.imap(parallel_func, args, chunksize=500)
    sigma, mu, height = [np.array(item) for item in zip(*model)]
    prediction = gaussian(x, sigma, mu, height)

def parallel_func(args):
    return invert(*args)

Edit: In cases where the simple polynomial fitting isn't working well, try weighting the problem by the y-values, as mentioned in the link/paper that @tslisten shared (and Stefan van der Walt implemented, though my implementation is a bit different).

import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
import itertools

def main():
    def run(x, data, func, threshold=0):
        model = [func(x, y, threshold=threshold) for y in data.T]
        sigma, mu, height = [np.array(item) for item in zip(*model)]
        prediction = gaussian(x, sigma, mu, height)

        plt.figure()
        plot(x, data, linestyle='none', marker='o', markersize=4)
        plot(x, prediction, linestyle='-', lw=2)

    x, data = generate_data(256, 6, noise=100)
    threshold = 50

    run(x, data, weighted_invert, threshold=threshold)
    plt.title('Weighted by Y-Value')

    run(x, data, invert, threshold=threshold)
    plt.title('Un-weighted Linear Inverse')

    plt.show()

def invert(x, y, threshold=0):
    mask = y > threshold
    x, y = x[mask], y[mask]

    # Fit a 2nd order polynomial to the log of the observed values
    A, B, C = np.polyfit(x, np.log(y), 2)

    # Solve for the desired parameters...
    sigma, mu, height = poly_to_gauss(A,B,C)
    return sigma, mu, height

def poly_to_gauss(A,B,C):
    sigma = np.sqrt(-1 / (2.0 * A))
    mu = B * sigma**2
    height = np.exp(C + 0.5 * mu**2 / sigma**2)
    return sigma, mu, height

def weighted_invert(x, y, weights=None, threshold=0):
    mask = y > threshold
    x,y = x[mask], y[mask]
    if weights is None:
        weights = y
    else:
        weights = weights[mask]

    d = np.log(y)
    G = np.ones((x.size, 3), dtype=np.float)
    G[:,0] = x**2
    G[:,1] = x

    model,_,_,_ = np.linalg.lstsq((G.T*weights**2).T, d*weights**2)
    return poly_to_gauss(*model)

def generate_data(numpoints, numcurves, noise=None):
    np.random.seed(3)
    x = np.linspace(0, 500, numpoints)

    height = 7000 * np.random.random(numcurves)
    mu = 1100 * np.random.random(numcurves) 
    sigma = 100 * np.random.random(numcurves) + 0.1
    data = gaussian(x, sigma, mu, height)

    if noise is None:
        noise = 0.1 * height.max()
    noise = noise * (np.random.random(data.shape) - 0.5)
    return x, data + noise

def gaussian(x, sigma, mu, height):
    data = -np.subtract.outer(x, mu)**2 / (2 * sigma**2)
    return height * np.exp(data)

def plot(x, ydata, ax=None, **kwargs):
    if ax is None:
        ax = plt.gca()
    colorcycle = itertools.cycle(mpl.rcParams['axes.color_cycle'])
    for y, color in zip(ydata.T, colorcycle):
        #kwargs['color'] = kwargs.get('color', color)
        ax.plot(x, y, color=color, **kwargs)

main()


If that's still giving you trouble, then try iteratively reweighting the least-squares problem (the final "best" recommended method in the link @tslisten mentioned). Keep in mind that this will be considerably slower, however.

def iterative_weighted_invert(x, y, threshold=None, numiter=5):
    last_y = y
    for _ in range(numiter):
        model = weighted_invert(x, y, weights=last_y, threshold=threshold)
        last_y = gaussian(x, *model)
    return model
qid & accept id: (8892307, 8902261) query: Filtering a model in Django based on a condition upon the latest child record soup:

My approach is this: build two lists, the first with (id_store, last_success_date) tuples and the second with (id_store, last_date) tuples:

\n
l_succ = stores.objects.filter( \n                       order__success = True \n                  ).annotate(\n                       last_success=Max('order__date')\n                  ).values_list(\n                       'id', 'last_success'\n                  )\n#l_succ = [ (1, '1/1/2011'), (2, '31/12/2010'), ... ] <-l_succ result\n\nl_last = stores.objects.annotate(\n                       last_date=Max('order__date')\n                  ).values_list(\n                       'id', 'last_date'\n                  )\n#l_last = [ (1, '1/1/2011'), (2, '3/1/2011'), ... ]   <-l_last result\n
\n

Then take the ids of the stores whose last date and last success date are equal, and you have the result:

\n
store_success_ids =  [ k[0] for k in l_succ if k in l_last ]\n#store_success_ids = [1, 5, ... ]          <-store_success_ids result\n#Cast l_last to dictionary to do lookups if you have a lot of stores.\n\nresult = Store.objects.filter( pk__in = store_success_ids)        \n
\n

It seems an elegant solution: only four lines of code for a complex query (but with a simple requirement). Disclaimer: it is not tested.

\n soup wrap:

My approach is this: build two lists, the first with (id_store, last_success_date) tuples and the second with (id_store, last_date) tuples:

l_succ = stores.objects.filter( 
                       order__success = True 
                  ).annotate(
                       last_success=Max('order__date')
                  ).values_list(
                       'id', 'last_success'
                  )
#l_succ = [ (1, '1/1/2011'), (2, '31/12/2010'), ... ] <-l_succ result

l_last = stores.objects.annotate(
                       last_date=Max('order__date')
                  ).values_list(
                       'id', 'last_date'
                  )
#l_last = [ (1, '1/1/2011'), (2, '3/1/2011'), ... ]   <-l_last result

Then take the ids of the stores whose last date and last success date are equal, and you have the result:

store_success_ids =  [ k[0] for k in l_succ if k in l_last ]
#store_success_ids = [1, 5, ... ]          <-store_success_ids result
#Cast l_last to dictionary to do lookups if you have a lot of stores.

result = Store.objects.filter( pk__in = store_success_ids)        

It seems an elegant solution: only four lines of code for a complex query (but with a simple requirement). Disclaimer: it is not tested.
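The intersection step at the end is plain Python and easy to sanity-check with the sample tuples from the comments:

```python
l_succ = [(1, '1/1/2011'), (2, '31/12/2010')]   # (id, last_success_date)
l_last = [(1, '1/1/2011'), (2, '3/1/2011')]     # (id, last_date)

# keep the stores whose most recent order was a success
store_success_ids = [k[0] for k in l_succ if k in l_last]
print(store_success_ids)   # -> [1]
```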

qid & accept id: (8916209, 8916343) query: How to build a nested list from a flat one in Python? soup:
def nested(flat, level=0):\n    for k, it in itertools.groupby(flat, lambda x: x.split("-")[level]):\n        yield next(it)\n        remainder = list(nested(it, level + 1))\n        if remainder:\n            yield remainder\n
\n

Example:

\n
>>> list(nested(flat, 0))\n['1', ['1-1', ['1-1-1'], '1-2'], '2', ['2-1', '2-2'], '3']\n
\n soup wrap:
def nested(flat, level=0):
    for k, it in itertools.groupby(flat, lambda x: x.split("-")[level]):
        yield next(it)
        remainder = list(nested(it, level + 1))
        if remainder:
            yield remainder

Example:

>>> list(nested(flat, 0))
['1', ['1-1', ['1-1-1'], '1-2'], '2', ['2-1', '2-2'], '3']
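Wrapped up with its import and a sample input matching the example output, the generator runs as advertised:

```python
import itertools

def nested(flat, level=0):
    # group consecutive entries by their prefix at the given depth
    for k, it in itertools.groupby(flat, lambda x: x.split("-")[level]):
        yield next(it)                         # the group's head element
        remainder = list(nested(it, level + 1))
        if remainder:
            yield remainder                    # nested sublist, if any

flat = ['1', '1-1', '1-1-1', '1-2', '2', '2-1', '2-2', '3']
print(list(nested(flat)))
```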
qid & accept id: (8937566, 8937925) query: Python: multidimensional array masking soup:

You can use .ravel() on the array (A) before indexing it, and then .reshape() after.

\n

Alternatively, since you know A.shape, you can use np.unravel_index on the other array (B) before indexing.

\n

Example 1:

\n
>>> import numpy as np\n>>> A = np.ones((5,5), dtype=int)\n>>> B = [1, 3, 7, 23]\n>>> A\narray([[1, 1, 1, 1, 1],\n       [1, 1, 1, 1, 1],\n       [1, 1, 1, 1, 1],\n       [1, 1, 1, 1, 1],\n       [1, 1, 1, 1, 1]])\n>>> A_ = A.ravel()\n>>> A_[B] = 0\n>>> A_.reshape(A.shape)\narray([[1, 0, 1, 0, 1],\n       [1, 1, 0, 1, 1],\n       [1, 1, 1, 1, 1],\n       [1, 1, 1, 1, 1],\n       [1, 1, 1, 0, 1]])\n
\n

Example 2:

\n
>>> b_row, b_col = np.vstack([np.unravel_index(b, A.shape) for b in B]).T\n>>> A[b_row, b_col] = 0\n>>> A\narray([[1, 0, 1, 0, 1],\n       [1, 1, 0, 1, 1],\n       [1, 1, 1, 1, 1],\n       [1, 1, 1, 1, 1],\n       [1, 1, 1, 0, 1]])\n
\n
\n

Discovered later: you can use numpy.put

\n
>>> import numpy as np\n>>> A = np.ones((5,5), dtype=int)\n>>> B = [1, 3, 7, 23]\n>>> A.put(B, [0]*len(B))\n>>> A\narray([[1, 0, 1, 0, 1],\n       [1, 1, 0, 1, 1],\n       [1, 1, 1, 1, 1],\n       [1, 1, 1, 1, 1],\n       [1, 1, 1, 0, 1]])\n
\n soup wrap:

You can use .ravel() on the array (A) before indexing it, and then .reshape() after.

Alternatively, since you know A.shape, you can use np.unravel_index on the other array (B) before indexing.

Example 1:

>>> import numpy as np
>>> A = np.ones((5,5), dtype=int)
>>> B = [1, 3, 7, 23]
>>> A
array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])
>>> A_ = A.ravel()
>>> A_[B] = 0
>>> A_.reshape(A.shape)
array([[1, 0, 1, 0, 1],
       [1, 1, 0, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 0, 1]])

Example 2:

>>> b_row, b_col = np.vstack([np.unravel_index(b, A.shape) for b in B]).T
>>> A[b_row, b_col] = 0
>>> A
array([[1, 0, 1, 0, 1],
       [1, 1, 0, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 0, 1]])

Discovered later: you can use numpy.put

>>> import numpy as np
>>> A = np.ones((5,5), dtype=int)
>>> B = [1, 3, 7, 23]
>>> A.put(B, [0]*len(B))
>>> A
array([[1, 0, 1, 0, 1],
       [1, 1, 0, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 0, 1]])
qid & accept id: (8948166, 8950594) query: How to pivot/cross-tab data in Python 3? soup:

Here is some simple code. Providing row/column/grand totals is left as an exercise for the reader.

\n
class CrossTab(object):\n\n    def __init__(\n        self,\n        missing=0, # what to return for an empty cell.\n                   # Alternatives: '', 0.0, None, 'NULL'\n        ):\n        self.missing = missing\n        self.col_key_set = set()\n        self.cell_dict = {}\n        self.headings_OK = False\n\n    def add_item(self, row_key, col_key, value):\n        self.col_key_set.add(col_key)\n        try:\n            self.cell_dict[row_key][col_key] += value\n        except KeyError:\n            try:\n                self.cell_dict[row_key][col_key] = value\n            except KeyError:\n                self.cell_dict[row_key] = {col_key: value}\n\n    def _process_headings(self):\n        if self.headings_OK:\n            return\n        self.row_headings = list(sorted(self.cell_dict.keys()))\n        self.col_headings = list(sorted(self.col_key_set))\n        self.headings_OK = True\n\n    def get_col_headings(self):\n        self._process_headings()\n        return self.col_headings\n\n    def generate_row_info(self):\n        self._process_headings()\n        for row_key in self.row_headings:\n            row_dict = self.cell_dict[row_key]\n            row_vals = [\n                row_dict.get(col_key, self.missing)\n                for col_key in self.col_headings\n                ]\n            yield row_key, row_vals\n\nif __name__ == "__main__":\n\n    data = [["apples", 2, "New York"], \n      ["peaches", 6, "New York"],\n      ["apples", 6, "New York"],\n      ["peaches", 1, "Vermont"]]  \n\n    ctab = CrossTab(missing='uh-oh')\n    for s in data:\n        ctab.add_item(row_key=s[2], col_key=s[0], value=s[1])\n    print()\n    print('Column headings:', ctab.get_col_headings())\n    for row_heading, row_values in ctab.generate_row_info():\n        print(repr(row_heading), row_values)\n
\n

Output:

\n
Column headings: ['apples', 'peaches']\n'New York' [8, 6]\n'Vermont' ['uh-oh', 1]\n
\n

See also this answer.

\n

And this one, which I'd forgotten about.

\n soup wrap:

Here is some simple code. Providing row/column/grand totals is left as an exercise for the reader.

class CrossTab(object):

    def __init__(
        self,
        missing=0, # what to return for an empty cell.
                   # Alternatives: '', 0.0, None, 'NULL'
        ):
        self.missing = missing
        self.col_key_set = set()
        self.cell_dict = {}
        self.headings_OK = False

    def add_item(self, row_key, col_key, value):
        self.col_key_set.add(col_key)
        try:
            self.cell_dict[row_key][col_key] += value
        except KeyError:
            try:
                self.cell_dict[row_key][col_key] = value
            except KeyError:
                self.cell_dict[row_key] = {col_key: value}

    def _process_headings(self):
        if self.headings_OK:
            return
        self.row_headings = list(sorted(self.cell_dict.keys()))
        self.col_headings = list(sorted(self.col_key_set))
        self.headings_OK = True

    def get_col_headings(self):
        self._process_headings()
        return self.col_headings

    def generate_row_info(self):
        self._process_headings()
        for row_key in self.row_headings:
            row_dict = self.cell_dict[row_key]
            row_vals = [
                row_dict.get(col_key, self.missing)
                for col_key in self.col_headings
                ]
            yield row_key, row_vals

if __name__ == "__main__":

    data = [["apples", 2, "New York"], 
      ["peaches", 6, "New York"],
      ["apples", 6, "New York"],
      ["peaches", 1, "Vermont"]]  

    ctab = CrossTab(missing='uh-oh')
    for s in data:
        ctab.add_item(row_key=s[2], col_key=s[0], value=s[1])
    print()
    print('Column headings:', ctab.get_col_headings())
    for row_heading, row_values in ctab.generate_row_info():
        print(repr(row_heading), row_values)

Output:

Column headings: ['apples', 'peaches']
'New York' [8, 6]
'Vermont' ['uh-oh', 1]
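The same tallying can be sketched more compactly with the standard library's collections.defaultdict — a minimal stdlib-only alternative to the class above, shown here only to illustrate the aggregation step:

```python
from collections import defaultdict

data = [["apples", 2, "New York"],
        ["peaches", 6, "New York"],
        ["apples", 6, "New York"],
        ["peaches", 1, "Vermont"]]

cells = defaultdict(int)          # (row_key, col_key) -> summed value
rows, cols = set(), set()
for col_key, value, row_key in data:
    cells[(row_key, col_key)] += value
    rows.add(row_key)
    cols.add(col_key)

col_headings = sorted(cols)
print('Column headings:', col_headings)
for r in sorted(rows):
    # .get() supplies the 'missing' marker for empty cells
    print(repr(r), [cells.get((r, c), 'uh-oh') for c in col_headings])
```

This prints the same rows as the CrossTab example above.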

See also this answer.

And this one, which I'd forgotten about.

qid & accept id: (8948773, 8949032) query: What is the proper way to perform a contextual search against NoSQL key-value pairs? soup:

I can give you a Mongo shell example.

\n

From the basic tutorial on MongoDB site:

\n
j = { name : "mongo" };\nt = { x : 3 };\ndb.things.save(j);\ndb.things.save(t);\n
\n

So you now have a collection called things and have stored two documents in it.

\n

Suppose you now want to do the equivalent of

\n
SELECT * FROM things WHERE name like 'mon%'\n
\n

In SQL, this would have returned you the "mongo" record.

\n

In Mongo Shell, you can do this:

\n
db.things.find({name:{$regex:'mon'}}).forEach(printjson);\n
\n

This returns the "mongo" document.

\n

Hope this helps.

\n

Atish

\n soup wrap:

I can give you a Mongo shell example.

From the basic tutorial on MongoDB site:

j = { name : "mongo" };
t = { x : 3 };
db.things.save(j);
db.things.save(t);

So you now have a collection called things and have stored two documents in it.

Suppose you now want to do the equivalent of

SELECT * FROM things WHERE name like 'mon%'

In SQL, this would return the "mongo" record.

In Mongo Shell, you can do this:

db.things.find({name:{$regex:'mon'}}).forEach(printjson);

This returns the "mongo" document.
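One caveat: a bare $regex of 'mon' matches anywhere in the string, which corresponds to SQL's LIKE '%mon%'; to mirror LIKE 'mon%' exactly you would anchor the pattern as '^mon'. The difference is easy to check with Python's re module, no MongoDB required:

```python
import re

names = ["mongo", "lemonade"]

# unanchored: matches 'mon' anywhere, like SQL  LIKE '%mon%'
anywhere = [n for n in names if re.search(r'mon', n)]
# anchored: prefix match only, like SQL  LIKE 'mon%'
prefix = [n for n in names if re.search(r'^mon', n)]

print(anywhere)  # → ['mongo', 'lemonade']
print(prefix)    # → ['mongo']
```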

Hope this helps.

Atish

qid & accept id: (8950809, 8950825) query: Read each word and rest of line in Python? soup:
line = '20 30 i love you'.split()\na = int(line[0])\nb = int(line[1])\nword_list = line[2:]\n
\n

Make a string out of the word_list if you want it as a single string instead of a list of words.

\n
text = ' '.join(word_list)\n
\n soup wrap:
line = '20 30 i love you'.split()
a = int(line[0])
b = int(line[1])
word_list = line[2:]

Make a string out of the word_list if you want it as a single string instead of a list of words.

text = ' '.join(word_list)
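Putting the whole parse together (note the space in ' '.join, which keeps the words separated rather than mashing them into one token):

```python
line = '20 30 i love you'.split()
a, b = int(line[0]), int(line[1])   # the two leading numbers
text = ' '.join(line[2:])           # the rest of the line as one string
print(a, b, text)  # → 20 30 i love you
```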
qid & accept id: (9013354, 9013410) query: Insert data from one sorted array into another sorted array soup:

If the numbers in the first column of a are in sorted order, then you could use

\n
a[a[:,0].searchsorted(b[:,0]),1] = b[:,1]\n
\n

For example:

\n
import numpy as np\n\na = np.array([(1,0,0,0,0),\n              (2,0,0,0,0),\n              (3,0,0,0,0),\n              (4,0,0,0,0),\n              (5,0,0,0,0),\n              (6,0,0,0,0),\n              (7,0,0,0,0),\n              (8,0,0,0,0),\n              ])\n\nb = np.array([(3, 1),\n              (5, 18),\n              (7, 2)])\n\na[a[:,0].searchsorted(b[:,0]),1] = b[:,1]\nprint(a)\n
\n

yields

\n
[[ 1  0  0  0  0]\n [ 2  0  0  0  0]\n [ 3  1  0  0  0]\n [ 4  0  0  0  0]\n [ 5 18  0  0  0]\n [ 6  0  0  0  0]\n [ 7  2  0  0  0]\n [ 8  0  0  0  0]]\n
\n

(I changed your example a bit to show that the values in b's first column do not have to be contiguous.)

\n
\n

If a[:,0] is not in sorted order, then you could use np.argsort to workaround this:

\n
a = np.array( [(1,0,0,0,0),\n               (2,0,0,0,0),\n               (5,0,0,0,0),\n               (3,0,0,0,0),\n               (4,0,0,0,0),\n               (6,0,0,0,0),\n               (7,0,0,0,0),\n               (8,0,0,0,0),\n               ])\n\nb = np.array([(3, 1),\n              (5, 18),\n              (7, 2)])\n\nperm = np.argsort(a[:,0])\na[:,1][perm[a[:,0][perm].searchsorted(b[:,0])]] = b[:,1]\nprint(a)\n
\n

yields

\n
[[ 1  0  0  0  0]\n [ 2  0  0  0  0]\n [ 5 18  0  0  0]\n [ 3  1  0  0  0]\n [ 4  0  0  0  0]\n [ 6  0  0  0  0]\n [ 7  2  0  0  0]\n [ 8  0  0  0  0]]\n
\n soup wrap:

If the numbers in the first column of a are in sorted order, then you could use

a[a[:,0].searchsorted(b[:,0]),1] = b[:,1]

For example:

import numpy as np

a = np.array([(1,0,0,0,0),
              (2,0,0,0,0),
              (3,0,0,0,0),
              (4,0,0,0,0),
              (5,0,0,0,0),
              (6,0,0,0,0),
              (7,0,0,0,0),
              (8,0,0,0,0),
              ])

b = np.array([(3, 1),
              (5, 18),
              (7, 2)])

a[a[:,0].searchsorted(b[:,0]),1] = b[:,1]
print(a)

yields

[[ 1  0  0  0  0]
 [ 2  0  0  0  0]
 [ 3  1  0  0  0]
 [ 4  0  0  0  0]
 [ 5 18  0  0  0]
 [ 6  0  0  0  0]
 [ 7  2  0  0  0]
 [ 8  0  0  0  0]]

(I changed your example a bit to show that the values in b's first column do not have to be contiguous.)
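For intuition about why this works: searchsorted returns, for each value of b[:,0], the index at which it would be inserted into a[:,0] — and for values already present, that is exactly their row index:

```python
import numpy as np

a0 = np.array([1, 2, 3, 4, 5, 6, 7, 8])
# insertion points for 3, 5, 7 are their own row indices
idx = a0.searchsorted([3, 5, 7])
print(idx)  # → [2 4 6]
```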


If a[:,0] is not in sorted order, then you could use np.argsort to work around this:

a = np.array( [(1,0,0,0,0),
               (2,0,0,0,0),
               (5,0,0,0,0),
               (3,0,0,0,0),
               (4,0,0,0,0),
               (6,0,0,0,0),
               (7,0,0,0,0),
               (8,0,0,0,0),
               ])

b = np.array([(3, 1),
              (5, 18),
              (7, 2)])

perm = np.argsort(a[:,0])
a[:,1][perm[a[:,0][perm].searchsorted(b[:,0])]] = b[:,1]
print(a)

yields

[[ 1  0  0  0  0]
 [ 2  0  0  0  0]
 [ 5 18  0  0  0]
 [ 3  1  0  0  0]
 [ 4  0  0  0  0]
 [ 6  0  0  0  0]
 [ 7  2  0  0  0]
 [ 8  0  0  0  0]]
qid & accept id: (9017260, 9018858) query: Remove rows from data: overlapping time intervals? soup:

Try the intervals package:

\n
library(intervals)\n\nf <- function(dd) with(dd, {\n    r <- reduce(Intervals(cbind(start, end)))\n    data.frame(username = username[1],\n         machine = machine[1],\n         start = structure(r[, 1], class = class(start)),\n         end = structure(r[, 2], class = class(end)))\n})\n\ndo.call("rbind", by(d, d[1:2], f))\n
\n

With the sample data this reduces the 15 rows to the following 13 rows (by combining rows 1 and 2 and rows 12 and 13 in the original data frame):

\n
   username          machine               start                 end\n1     user1 D5599.domain.com 2011-01-03 02:44:18 2011-01-03 03:09:16\n2     user1 D5599.domain.com 2011-01-03 07:07:36 2011-01-03 07:56:17\n3     user1 D5599.domain.com 2011-01-05 08:03:17 2011-01-05 08:23:15\n4     user1 D5599.domain.com 2011-02-14 07:33:39 2011-02-14 07:40:16\n5     user1 D5599.domain.com 2011-02-23 06:54:30 2011-02-23 06:58:23\n6     user1 D5599.domain.com 2011-03-21 04:10:18 2011-03-21 04:32:22\n7     user1 D5645.domain.com 2011-06-09 03:12:41 2011-06-09 03:58:59\n8     user1 D5682.domain.com 2011-01-03 05:03:45 2011-01-03 05:29:43\n9     USER2 D5682.domain.com 2011-01-12 07:26:05 2011-01-12 07:32:53\n10    USER2 D5682.domain.com 2011-01-17 08:06:19 2011-01-17 08:44:22\n11    USER2 D5682.domain.com 2011-01-18 08:07:30 2011-01-18 08:42:43\n12    USER2 D5682.domain.com 2011-01-25 08:20:55 2011-01-25 08:24:38\n13    USER2 D5682.domain.com 2011-02-14 07:59:23 2011-02-14 08:14:47\n
\n soup wrap:

Try the intervals package:

library(intervals)

f <- function(dd) with(dd, {
    r <- reduce(Intervals(cbind(start, end)))
    data.frame(username = username[1],
         machine = machine[1],
         start = structure(r[, 1], class = class(start)),
         end = structure(r[, 2], class = class(end)))
})

do.call("rbind", by(d, d[1:2], f))

With the sample data this reduces the 15 rows to the following 13 rows (by combining rows 1 and 2 and rows 12 and 13 in the original data frame):

   username          machine               start                 end
1     user1 D5599.domain.com 2011-01-03 02:44:18 2011-01-03 03:09:16
2     user1 D5599.domain.com 2011-01-03 07:07:36 2011-01-03 07:56:17
3     user1 D5599.domain.com 2011-01-05 08:03:17 2011-01-05 08:23:15
4     user1 D5599.domain.com 2011-02-14 07:33:39 2011-02-14 07:40:16
5     user1 D5599.domain.com 2011-02-23 06:54:30 2011-02-23 06:58:23
6     user1 D5599.domain.com 2011-03-21 04:10:18 2011-03-21 04:32:22
7     user1 D5645.domain.com 2011-06-09 03:12:41 2011-06-09 03:58:59
8     user1 D5682.domain.com 2011-01-03 05:03:45 2011-01-03 05:29:43
9     USER2 D5682.domain.com 2011-01-12 07:26:05 2011-01-12 07:32:53
10    USER2 D5682.domain.com 2011-01-17 08:06:19 2011-01-17 08:44:22
11    USER2 D5682.domain.com 2011-01-18 08:07:30 2011-01-18 08:42:43
12    USER2 D5682.domain.com 2011-01-25 08:20:55 2011-01-25 08:24:38
13    USER2 D5682.domain.com 2011-02-14 07:59:23 2011-02-14 08:14:47
qid & accept id: (9020831, 9021069) query: Run multiple subprocesses in foreach loop? One at the time? soup:

Calling

\n
self.rsyncRun.communicate()\n
\n

will block the main process until the rsyncRun process has finished.

\n
\n

If you do not want the main process to block, then spawn a thread to handle the calls to subprocess.Popen:

\n
import threading\n\ndef worker():\n    for share in shares.split(', '):\n        ...\n        rsyncRun = subprocess.Popen(...)\n        out, err = rsyncRun.communicate()\n\nt = threading.Thread(target = worker)\nt.daemon = True\nt.start()\nt.join()\n
\n soup wrap:

Calling

self.rsyncRun.communicate()

will block the main process until the rsyncRun process has finished.


If you do not want the main process to block, then spawn a thread to handle the calls to subprocess.Popen:

import threading

def worker():
    for share in shares.split(', '):
        ...
        rsyncRun = subprocess.Popen(...)
        out, err = rsyncRun.communicate()

t = threading.Thread(target = worker)
t.daemon = True
t.start()
t.join()  # caution: join() blocks the main process until the worker finishes; defer or omit it to keep the main process free
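A runnable sketch of the same pattern, with the rsync command swapped for a harmless placeholder subprocess (the share names here are made up for illustration):

```python
import subprocess
import sys
import threading

shares = 'share1, share2'   # placeholder share list
results = []

def worker():
    # run one subprocess at a time; communicate() waits for each to finish
    for share in shares.split(', '):
        proc = subprocess.Popen(
            [sys.executable, '-c', 'import sys; print(sys.argv[1])', share],
            stdout=subprocess.PIPE)
        out, err = proc.communicate()
        results.append(out.decode().strip())

t = threading.Thread(target=worker)
t.daemon = True
t.start()
t.join()  # wait only when you actually need the results
print(results)  # → ['share1', 'share2']
```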
qid & accept id: (9091299, 9091862) query: How to let js make a request from python and preserve the loaded site in place when answered by python soup:

For example you can make with jQuery like this,\nin controller you return rendered template:

\n
def some_html():\n    return render('my_template.tpl')\n
\n

and in the client side you can use jQuery

\n
\n
\n

,where result_from_server its can be id of wrapper div like

\n
\n
\n

and /some_html, url for call your some_html() function.

\n

Very good resurce for quick start with jQuery jqapi.com

\n soup wrap:

For example, you can do this with jQuery; in the controller you return a rendered template:

def some_html():
    return render('my_template.tpl')

and on the client side you can use jQuery, for example:

$('#result_from_server').load('/some_html');

where result_from_server is the id of a wrapper div like

<div id="result_from_server"></div>

and /some_html is the URL that calls your some_html() function.

A very good resource for a quick start with jQuery is jqapi.com

qid & accept id: (9151104, 9151126) query: How to iterate through a list of lists in python? soup:

The simplest solution for doing exactly what you specified is:

\n
documents = [sub_list[0] for sub_list in documents]\n
\n

This is basically equivalent to the iterative version:

\n
temp = []\nfor sub_list in documents:\n    temp.append(sub_list[0])\ndocuments = temp\n
\n

This is however not really a general way of iterating through a multidimensional list with an arbitrary number of dimensions, since nested list comprehensions / nested for loops can get ugly; however you should be safe doing it for 2 or 3-d lists.

\n

If you do decide to you need to flatten more than 3 dimensions, I'd recommend implementing a recursive traversal function which flattens all non-flat layers.

\n soup wrap:

The simplest solution for doing exactly what you specified is:

documents = [sub_list[0] for sub_list in documents]

This is basically equivalent to the iterative version:

temp = []
for sub_list in documents:
    temp.append(sub_list[0])
documents = temp

This is however not really a general way of iterating through a multidimensional list with an arbitrary number of dimensions, since nested list comprehensions / nested for loops can get ugly; however you should be safe doing it for 2 or 3-d lists.

If you decide you need to flatten more than 3 dimensions, I'd recommend implementing a recursive traversal function which flattens all non-flat layers.
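A minimal sketch of such a recursive flattener, assuming plain nested lists:

```python
def flatten(items):
    """Recursively flatten arbitrarily nested lists into one flat list."""
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.extend(flatten(item))  # recurse into nested lists
        else:
            flat.append(item)
    return flat

print(flatten([1, [2, [3, 4]], [[5], 6]]))  # → [1, 2, 3, 4, 5, 6]
```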

qid & accept id: (9232944, 9233087) query: How to save big (not huge) dictonaries in Python? soup:

If you have a dictionary where the keys are strings and the values are arrays, like this:

\n
>>> import numpy\n>>> arrs = {'a': numpy.array([1,2]),\n            'b': numpy.array([3,4]),\n            'c': numpy.array([5,6])}\n
\n

You can use numpy.savez to save them, by key, to a single .npz archive file (numpy.savez_compressed writes a compressed one):

\n
>>> numpy.savez('file.npz', **arrs)\n
\n

To load it back:

\n
>>> npzfile = numpy.load('file.npz')\n>>> npzfile\n\n>>> npzfile['a']\narray([1, 2])\n>>> npzfile['b']\narray([3, 4])\n>>> npzfile['c']\narray([5, 6])\n
\n soup wrap:

If you have a dictionary where the keys are strings and the values are arrays, like this:

>>> import numpy
>>> arrs = {'a': numpy.array([1,2]),
            'b': numpy.array([3,4]),
            'c': numpy.array([5,6])}

You can use numpy.savez to save them, by key, to a single .npz archive file (numpy.savez_compressed writes a compressed one):

>>> numpy.savez('file.npz', **arrs)

To load it back:

>>> npzfile = numpy.load('file.npz')
>>> npzfile

>>> npzfile['a']
array([1, 2])
>>> npzfile['b']
array([3, 4])
>>> npzfile['c']
array([5, 6])
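A quick round-trip check of the same pattern as a plain script, written to a temporary directory so it leaves no files behind:

```python
import os
import tempfile
import numpy as np

arrs = {'a': np.array([1, 2]),
        'b': np.array([3, 4]),
        'c': np.array([5, 6])}

path = os.path.join(tempfile.mkdtemp(), 'file.npz')
np.savez(path, **arrs)            # use np.savez_compressed for a smaller file

npzfile = np.load(path)
for key in arrs:
    # every array survives the save/load round trip unchanged
    assert np.array_equal(npzfile[key], arrs[key])
print(sorted(npzfile.files))  # → ['a', 'b', 'c']
```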
qid & accept id: (9242450, 9242870) query: Borda Count using python? soup:
import itertools\nimport collections\n\ndef borda(ballot):\n    n = len([c for c in ballot if c.isalpha()]) - 1\n    score = itertools.count(n, step = -1)\n    result = {}\n    for group in [item.split('=') for item in ballot.split('>')]:\n        s = sum(next(score) for item in group)/float(len(group))\n        for pref in group:\n            result[pref] = s\n    return result\n\ndef tally(ballots):\n    result = collections.defaultdict(int)\n    for ballot in ballots:\n        for pref,score in borda(ballot).iteritems():\n            result[pref]+=score\n    result = dict(result)\n    return result\n\nballots = ['A>B>C>D>E',\n           'A>B>C=D=E',\n           'A>B=C>D>E', \n           ]\n\nprint(tally(ballots))\n
\n

yields

\n
{'A': 12.0, 'C': 5.5, 'B': 8.5, 'E': 1.0, 'D': 3.0}\n
\n soup wrap:
import itertools
import collections

def borda(ballot):
    n = len([c for c in ballot if c.isalpha()]) - 1
    score = itertools.count(n, step = -1)
    result = {}
    for group in [item.split('=') for item in ballot.split('>')]:
        s = sum(next(score) for item in group)/float(len(group))
        for pref in group:
            result[pref] = s
    return result

def tally(ballots):
    result = collections.defaultdict(int)
    for ballot in ballots:
        for pref,score in borda(ballot).iteritems():
            result[pref]+=score
    result = dict(result)
    return result

ballots = ['A>B>C>D>E',
           'A>B>C=D=E',
           'A>B=C>D>E', 
           ]

print(tally(ballots))

yields

{'A': 12.0, 'C': 5.5, 'B': 8.5, 'E': 1.0, 'D': 3.0}
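To see the tie handling in isolation, here is the borda function again as a self-contained Python 3 snippet, run on the third ballot (note that iteritems() in tally is Python 2; use items() on Python 3). With five candidates the positions are worth 4, 3, 2, 1, 0, and tied candidates split the average of the positions they span:

```python
import itertools

def borda(ballot):
    # scores count down from n-1 for n candidates; ties split evenly
    n = len([c for c in ballot if c.isalpha()]) - 1
    score = itertools.count(n, step=-1)
    result = {}
    for group in [item.split('=') for item in ballot.split('>')]:
        s = sum(next(score) for _ in group) / float(len(group))
        for pref in group:
            result[pref] = s
    return result

# B and C tie and split (3 + 2) / 2 = 2.5 each
print(borda('A>B=C>D>E'))  # → {'A': 4.0, 'B': 2.5, 'C': 2.5, 'D': 1.0, 'E': 0.0}
```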
qid & accept id: (9252871, 9252944) query: Calling variables from other files in Python soup:

A better approach is to not rely on globals from another module, and simply pass the name into the file2.action() function:

\n

file1.py

\n
import file2\n\nusername = "steven"\n\nfile2.action(username)\n
\n

file2.py

\n
def action(name):\n    print name \n
\n soup wrap:

A better approach is to not rely on globals from another module, and simply pass the name into the file2.action() function:

file1.py

import file2

username = "steven"

file2.action(username)

file2.py

def action(name):
    print name 
qid & accept id: (9288169, 9288207) query: Python word length function example needed soup:

I think the list approach is quite viable -- you're almost there already.

\n

Your text.split() already produces an array of words, so you can do:

\n
words = text.split()\ntotalwords = len(words)\n
\n

Then, you could select the first 20 as you say (if there's too many words), and join the array back together.

\n

To join, look at str.join.

\n

As an example:

\n
'||'.join(['eggs','and','ham'])\n# returns 'eggs||and||ham'\n
\n soup wrap:

I think the list approach is quite viable -- you're almost there already.

Your text.split() already produces an array of words, so you can do:

words = text.split()
totalwords = len(words)

Then, you could select the first 20 as you say (if there are too many words), and join the array back together.

To join, look at str.join.

As an example:

'||'.join(['eggs','and','ham'])
# returns 'eggs||and||ham'
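Putting those two steps together as a short sketch (the sample text here is made up for illustration):

```python
text = "the quick brown fox jumps over the lazy dog " * 3  # sample text
words = text.split()
totalwords = len(words)

# keep at most the first 20 words, then join them back into one string
snippet = ' '.join(words[:20])

print(totalwords)            # → 27
print(len(snippet.split()))  # → 20
```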
qid & accept id: (9364754, 9365542) query: Remembering Scroll value of a QTreeWidget in PyQt soup:

You could scroll to the actual previous values, like you are asking, but are you sure your results will always be the same size? Those numbers could be meaningless in terms of taking you to the right spot again. But just for reference, you would have to access the scroll bar, take its value, then perform your repopulation, and then scroll that value again:

\n
bar = treeWidget.verticalScrollBar()\nyScroll = bar.value()\n# repopulate here ...\nbar.setValue(yScroll)  # restore the saved position via the scroll bar\n
\n

But a more useful approach would be to find the item that is current in view or of interest, then repopulate your tree, and then tell the tree to scroll to that actual item. Then it won't matter where in the tree the item now exists (if the data structure has changed significantly).

\n

First save the current item by some criteria:

\n
item = treeWidget.currentItem() # one way\nitem = treeWidget.itemAt(centerOfTree) # another way\n\n# either save the text value or whatever the custom \n# identifying value is of your item\ntext = item.text()\n
\n

Once you have that data value, be it the text value or some other custom data value, you can repopulate your tree, then look up that item again.

\n
# this is assuming the item is both present, \n# and referencing it by its string value\n# (findItems requires match flags and a column; QtCore comes from your Qt binding)\nnewItem = treeWidget.findItems(text, QtCore.Qt.MatchExactly, 0)[0]\ntreeWidget.scrollToItem(newItem)\n
\n

You can modify this to suit your actual type of items. You may be storing some other custom value on the items to find them again.

\n soup wrap:

You could scroll to the actual previous values, like you are asking, but are you sure your results will always be the same size? Those numbers could be meaningless in terms of taking you to the right spot again. But just for reference, you would have to access the scroll bar, take its value, then perform your repopulation, and then scroll that value again:

bar = treeWidget.verticalScrollBar()
yScroll = bar.value()
# repopulate here ...
bar.setValue(yScroll)  # restore via the scroll bar; scrollContentsBy() is an internal handler

But a more useful approach would be to find the item that is current in view or of interest, then repopulate your tree, and then tell the tree to scroll to that actual item. Then it won't matter where in the tree the item now exists (if the data structure has changed significantly).

First save the current item by some criteria:

item = treeWidget.currentItem() # one way
item = treeWidget.itemAt(centerOfTree) # another way

# either save the text value or whatever the custom 
# identifying value is of your item
text = item.text()

Once you have that data value, be it the text value or some other custom data value, you can repopulate your tree, then look up that item again.

# this is assuming the item is both present, 
# and referencing it by its string value
# (findItems requires match flags and a column; QtCore comes from your Qt binding)
newItem = treeWidget.findItems(text, QtCore.Qt.MatchExactly, 0)[0]
treeWidget.scrollToItem(newItem)

You can modify this to suit your actual type of items. You may be storing some other custom value on the items to find them again.

qid & accept id: (9394051, 9394126) query: Get non-contiguous columns from a list of lists soup:
>>> a = [[1,2,3],[4,5,6]]\n>>> from operator import itemgetter\n>>> map(itemgetter(0,2), a)\n[(1, 3), (4, 6)]\n>>> \n
\n

or as a list comprehension

\n
>>> [itemgetter(0,2)(i) for i in a]\n[(1, 3), (4, 6)]\n
\n soup wrap:
>>> a = [[1,2,3],[4,5,6]]
>>> from operator import itemgetter
>>> map(itemgetter(0,2), a)
[(1, 3), (4, 6)]
>>> 

or as a list comprehension

>>> [itemgetter(0,2)(i) for i in a]
[(1, 3), (4, 6)]
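A note for Python 3: map returns a lazy iterator there, so wrap it in list() to materialize the rows:

```python
from operator import itemgetter

a = [[1, 2, 3], [4, 5, 6]]

# list() is needed on Python 3, where map returns a lazy iterator
cols = list(map(itemgetter(0, 2), a))
print(cols)  # → [(1, 3), (4, 6)]
```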
qid & accept id: (9406400, 9406905) query: How can I use a pre-made color map for my heat map in matplotlib? soup:

It looks like you are simply calling get_cmap wrong. Try:

\n
from pylab import imshow, show, get_cmap\nfrom numpy import random\n\nZ = random.random((50,50))   # Test data\n\nimshow(Z, cmap=get_cmap("Spectral"), interpolation='nearest')\nshow()\n
\n


\n

What are the named colormaps?

\n

Running the code:

\n
from pylab import cm\nprint cm.datad.keys()\n
\n

Gives a list of colormaps, any of which can be substituted for "Spectral":

\n
['Spectral', 'summer', 'RdBu', 'Set1', 'Set2', 'Set3', 'brg_r', 'Dark2', 'hot', 'PuOr_r', 'afmhot_r', 'terrain_r', 'PuBuGn_r', 'RdPu', 'gist_ncar_r', 'gist_yarg_r', 'Dark2_r', 'YlGnBu', 'RdYlBu', 'hot_r', 'gist_rainbow_r', 'gist_stern', 'gnuplot_r', 'cool_r', 'cool', 'gray', 'copper_r', 'Greens_r', 'GnBu', 'gist_ncar', 'spring_r', 'gist_rainbow', 'RdYlBu_r', 'gist_heat_r', 'OrRd_r', 'bone', 'gist_stern_r', 'RdYlGn', 'Pastel2_r', 'spring', 'terrain', 'YlOrRd_r', 'Set2_r', 'winter_r', 'PuBu', 'RdGy_r', 'spectral', 'flag_r', 'jet_r', 'RdPu_r', 'Purples_r', 'gist_yarg', 'BuGn', 'Paired_r', 'hsv_r', 'bwr', 'YlOrRd', 'Greens', 'PRGn', 'gist_heat', 'spectral_r', 'Paired', 'hsv', 'Oranges_r', 'prism_r', 'Pastel2', 'Pastel1_r', 'Pastel1', 'gray_r', 'PuRd_r', 'Spectral_r', 'gnuplot2_r', 'BuPu', 'YlGnBu_r', 'copper', 'gist_earth_r', 'Set3_r', 'OrRd', 'PuBu_r', 'ocean_r', 'brg', 'gnuplot2', 'jet', 'bone_r', 'gist_earth', 'Oranges', 'RdYlGn_r', 'PiYG', 'YlGn', 'binary_r', 'gist_gray_r', 'Accent', 'BuPu_r', 'gist_gray', 'flag', 'seismic_r', 'RdBu_r', 'BrBG', 'Reds', 'BuGn_r', 'summer_r', 'GnBu_r', 'BrBG_r', 'Reds_r', 'RdGy', 'PuRd', 'Accent_r', 'Blues', 'Greys', 'autumn', 'PRGn_r', 'Greys_r', 'pink', 'binary', 'winter', 'gnuplot', 'pink_r', 'prism', 'YlOrBr', 'rainbow_r', 'rainbow', 'PiYG_r', 'YlGn_r', 'Blues_r', 'YlOrBr_r', 'seismic', 'Purples', 'bwr_r', 'autumn_r', 'ocean', 'Set1_r', 'PuOr', 'PuBuGn', 'afmhot']\n
\n soup wrap:

It looks like you are simply calling get_cmap wrong. Try:

from pylab import imshow, show, get_cmap
from numpy import random

Z = random.random((50,50))   # Test data

imshow(Z, cmap=get_cmap("Spectral"), interpolation='nearest')
show()


What are the named colormaps?

Running the code:

from pylab import cm
print cm.datad.keys()

Gives a list of colormaps, any of which can be substituted for "Spectral":

['Spectral', 'summer', 'RdBu', 'Set1', 'Set2', 'Set3', 'brg_r', 'Dark2', 'hot', 'PuOr_r', 'afmhot_r', 'terrain_r', 'PuBuGn_r', 'RdPu', 'gist_ncar_r', 'gist_yarg_r', 'Dark2_r', 'YlGnBu', 'RdYlBu', 'hot_r', 'gist_rainbow_r', 'gist_stern', 'gnuplot_r', 'cool_r', 'cool', 'gray', 'copper_r', 'Greens_r', 'GnBu', 'gist_ncar', 'spring_r', 'gist_rainbow', 'RdYlBu_r', 'gist_heat_r', 'OrRd_r', 'bone', 'gist_stern_r', 'RdYlGn', 'Pastel2_r', 'spring', 'terrain', 'YlOrRd_r', 'Set2_r', 'winter_r', 'PuBu', 'RdGy_r', 'spectral', 'flag_r', 'jet_r', 'RdPu_r', 'Purples_r', 'gist_yarg', 'BuGn', 'Paired_r', 'hsv_r', 'bwr', 'YlOrRd', 'Greens', 'PRGn', 'gist_heat', 'spectral_r', 'Paired', 'hsv', 'Oranges_r', 'prism_r', 'Pastel2', 'Pastel1_r', 'Pastel1', 'gray_r', 'PuRd_r', 'Spectral_r', 'gnuplot2_r', 'BuPu', 'YlGnBu_r', 'copper', 'gist_earth_r', 'Set3_r', 'OrRd', 'PuBu_r', 'ocean_r', 'brg', 'gnuplot2', 'jet', 'bone_r', 'gist_earth', 'Oranges', 'RdYlGn_r', 'PiYG', 'YlGn', 'binary_r', 'gist_gray_r', 'Accent', 'BuPu_r', 'gist_gray', 'flag', 'seismic_r', 'RdBu_r', 'BrBG', 'Reds', 'BuGn_r', 'summer_r', 'GnBu_r', 'BrBG_r', 'Reds_r', 'RdGy', 'PuRd', 'Accent_r', 'Blues', 'Greys', 'autumn', 'PRGn_r', 'Greys_r', 'pink', 'binary', 'winter', 'gnuplot', 'pink_r', 'prism', 'YlOrBr', 'rainbow_r', 'rainbow', 'PiYG_r', 'YlGn_r', 'Blues_r', 'YlOrBr_r', 'seismic', 'Purples', 'bwr_r', 'autumn_r', 'ocean', 'Set1_r', 'PuOr', 'PuBuGn', 'afmhot']
qid & accept id: (9410760, 9422332) query: Redirect stdout to logger in Python soup:

You might be able to make use of the suggestion in this post, summarised below:

\n
import logging\n\nclass LoggerWriter:\n    def __init__(self, logger, level):\n        self.logger = logger\n        self.level = level\n\n    def write(self, message):\n        if message != '\n':\n            self.logger.log(self.level, message)\n\ndef main():\n    logging.basicConfig(level=logging.DEBUG)\n    logger = logging.getLogger("demo")\n    info_fp = LoggerWriter(logger, logging.INFO)\n    debug_fp = LoggerWriter(logger, logging.DEBUG)\n    print >> info_fp, "An INFO message"\n    print >> debug_fp, "A DEBUG message"\n\nif __name__ == "__main__":\n    main()\n
\n

When run, the script prints:

\n
INFO:demo:An INFO message\nDEBUG:demo:A DEBUG message\n
\n soup wrap:

You might be able to make use of the suggestion in this post, summarised below:

import logging

class LoggerWriter:
    def __init__(self, logger, level):
        self.logger = logger
        self.level = level

    def write(self, message):
        if message != '\n':
            self.logger.log(self.level, message)

def main():
    logging.basicConfig(level=logging.DEBUG)
    logger = logging.getLogger("demo")
    info_fp = LoggerWriter(logger, logging.INFO)
    debug_fp = LoggerWriter(logger, logging.DEBUG)
    print >> info_fp, "An INFO message"
    print >> debug_fp, "A DEBUG message"

if __name__ == "__main__":
    main()

When run, the script prints:

INFO:demo:An INFO message
DEBUG:demo:A DEBUG message
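For actually redirecting stdout on Python 3, the same LoggerWriter idea works with a couple of additions — a hedged sketch, adding a flush() method because print() calls it:

```python
import logging
import sys

class LoggerWriter:
    def __init__(self, logger, level):
        self.logger = logger
        self.level = level

    def write(self, message):
        # print() sends the trailing newline as a separate write; skip blanks
        if message.strip():
            self.logger.log(self.level, message.strip())

    def flush(self):
        # print() calls flush(); nothing to buffer since log() writes at once
        pass

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("demo")
sys.stdout = LoggerWriter(logger, logging.INFO)
print("An INFO message")          # routed through logger at INFO level
sys.stdout = sys.__stdout__       # restore normal stdout afterwards
```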
qid & accept id: (9416934, 9417798) query: Speeding up linear interpolation of many pixel locations in NumPy soup:

Thanks to @JoeKington for the suggestion. Here's the best I can come up with using scipy.ndimage.map_coordinates

\n
# rest as before\nfrom scipy import ndimage\ntic = time.time()\nnew_result = np.zeros(im.shape)\ncoords = np.array([yy,xx,np.zeros(im.shape[:2])])\nfor d in range(im.shape[2]):\n    new_result[:,:,d] = ndimage.map_coordinates(im,coords,order=1)\n    coords[2] += 1\ntoc = time.time()\nprint "interpolation time:",toc-tic\n
\n

Update: Added the tweaks suggested in the comments and tried one or two other things. This is the fastest version:

\n
tic = time.time()\nnew_result = np.zeros(im.shape)\ncoords = np.array([yy,xx])\nfor d in range(im.shape[2]):\n    ndimage.map_coordinates(im[:,:,d],\n                            coords,order=1,\n                            prefilter=False,\n                            output=new_result[:,:,d] )\ntoc = time.time()\n\nprint "interpolation time:",toc-tic\n
\n

Example running time:

\n
 original version: 0.463063955307\n   better version: 0.204537153244\n     best version: 0.121845006943\n
\n soup wrap:

Thanks to @JoeKington for the suggestion. Here's the best I can come up with using scipy.ndimage.map_coordinates

# rest as before
from scipy import ndimage
tic = time.time()
new_result = np.zeros(im.shape)
coords = np.array([yy,xx,np.zeros(im.shape[:2])])
for d in range(im.shape[2]):
    new_result[:,:,d] = ndimage.map_coordinates(im,coords,order=1)
    coords[2] += 1
toc = time.time()
print "interpolation time:",toc-tic

Update: Added the tweaks suggested in the comments and tried one or two other things. This is the fastest version:

tic = time.time()
new_result = np.zeros(im.shape)
coords = np.array([yy,xx])
for d in range(im.shape[2]):
    ndimage.map_coordinates(im[:,:,d],
                            coords,order=1,
                            prefilter=False,
                            output=new_result[:,:,d] )
toc = time.time()

print "interpolation time:",toc-tic

Example running time:

 original version: 0.463063955307
   better version: 0.204537153244
     best version: 0.121845006943
qid & accept id: (9416947, 9417088) query: Python Class Based Decorator with parameters that can decorate a method or a function soup:

You don't need to mess around with descriptors. It's enough to create a wrapper function inside the __call__() method and return it. Standard Python functions can always act as either a method or a function, depending on context:

\n
import functools\n\nclass MyDecorator(object):\n    def __init__(self, argument):\n        self.arg = argument\n\n    def __call__(self, fn):\n        @functools.wraps(fn)\n        def decorated(*args, **kwargs):\n            print "In my decorator before call, with arg %s" % self.arg\n            result = fn(*args, **kwargs)\n            print "In my decorator after call, with arg %s" % self.arg\n            return result\n        return decorated\n
\n

A bit of explanation about what's going on when this decorator is used like this:

\n
@MyDecorator("some other func!")\ndef some_other_function():\n    print "in some other function!"\n
\n

The first line creates an instance of MyDecorator and passes "some other func!" as an argument to __init__(). Let's call this instance my_decorator. Next, the undecorated function object -- let's call it bare_func -- is created and passed to the decorator instance, so my_decorator(bare_func) is executed. This will invoke MyDecorator.__call__(), which will create and return a wrapper function. Finally this wrapper function is assigned to the name some_other_function.

\n soup wrap:

You don't need to mess around with descriptors. It's enough to create a wrapper function inside the __call__() method and return it. Standard Python functions can always act as either a method or a function, depending on context:

import functools

class MyDecorator(object):
    def __init__(self, argument):
        self.arg = argument

    def __call__(self, fn):
        @functools.wraps(fn)
        def decorated(*args, **kwargs):
            print "In my decorator before call, with arg %s" % self.arg
            result = fn(*args, **kwargs)
            print "In my decorator after call, with arg %s" % self.arg
            return result
        return decorated

A bit of explanation about what's going on when this decorator is used like this:

@MyDecorator("some other func!")
def some_other_function():
    print "in some other function!"

The first line creates an instance of MyDecorator and passes "some other func!" as an argument to __init__(). Let's call this instance my_decorator. Next, the undecorated function object -- let's call it bare_func -- is created and passed to the decorator instance, so my_decorator(bare_func) is executed. This will invoke MyDecorator.__call__(), which will create and return a wrapper function. Finally this wrapper function is assigned to the name some_other_function.

qid & accept id: (9419848, 9420513) query: Python - read BeautifulSoup snippet by row? (or other ways of scraping the data I want) soup:

soup wrap:

Assuming address contains your raw address.

Some address and street
City, State, ZIP (some) phone-number

Then you can replace the line break with a comma before finally splitting on the commas. This is not ideal, but in scenarios like this, where there is no clear separation between elements (spans, ids, etc.), it all comes down to positional checking.

address.find("br").replaceWith(",")
addressComponents = address.text.split(",")

That gives you the following four components in the addressComponents list.

Some address and street
City
 State
 ZIP
              (some) phone-number

As there is no line break between the ZIP and the phone number, a newline character separates them instead. So to split the final component:

addressSplit = addressComponents[3].split("\n")
print addressSplit[0] # Zip code
print addressSplit[1].strip() # Phone number
qid & accept id: (9456233, 9456315) query: Search and sort through dictionary in Python soup:

soup wrap:

If you want to sort the dictionary based on the integer value you can do the following.

d = {'secondly': 2, 'pardon': 6, 'saves': 1, 'knelt': 1}
a = sorted(d.iteritems(), key=lambda x:x[1], reverse=True)

The a will contain a list of tuples:

[('pardon', 6), ('secondly', 2), ('saves', 1), ('knelt', 1)]

You can limit this to the top 50 with a[:50] and then search through the keys with your search pattern.
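For example (using items(), the Python 3 spelling of iteritems(); the prefix check below is just a stand-in for whatever search pattern you have):

```python
d = {'secondly': 2, 'pardon': 6, 'saves': 1, 'knelt': 1}
a = sorted(d.items(), key=lambda x: x[1], reverse=True)
top = a[:50]  # the list is shorter than 50, so this is the whole list

# search the keys of the top entries for a pattern
matches = [word for word, count in top if word.startswith('s')]
```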

qid & accept id: (9460992, 12950794) query: progress bar properties python2.72 pywinauto soup:

soup wrap:

According to the documentation, the Windows Progress common control has the following additional methods:

GetPosition()
GetState()
GetStep()
SetPosition(pos)
StepIt()

If you need the text, use the following construction:

window['Progress1'].Texts()

Also, you can easily view the properties and methods available through a GUI tool for pywinauto.

qid & accept id: (9465236, 9469400) query: Python - Twisted, Proxy and modifying content soup:

soup wrap:

To create a ProxyFactory that can modify server response headers and content, you could override the ProxyClient.handle*() methods:

from twisted.python import log
from twisted.web import http, proxy

class ProxyClient(proxy.ProxyClient):
    """Mangle returned header, content here.

    Use `self.father` methods to modify request directly.
    """
    def handleHeader(self, key, value):
        # change response header here
        log.msg("Header: %s: %s" % (key, value))
        proxy.ProxyClient.handleHeader(self, key, value)

    def handleResponsePart(self, buffer):
        # change response part here
        log.msg("Content: %s" % (buffer[:50],))
        # make all content upper case
        proxy.ProxyClient.handleResponsePart(self, buffer.upper())

class ProxyClientFactory(proxy.ProxyClientFactory):
    protocol = ProxyClient

class ProxyRequest(proxy.ProxyRequest):
    protocols = dict(http=ProxyClientFactory)

class Proxy(proxy.Proxy):
    requestFactory = ProxyRequest

class ProxyFactory(http.HTTPFactory):
    protocol = Proxy

I've got this solution by looking at the source of twisted.web.proxy. I don't know how idiomatic it is.

To run it as a script or via twistd, add at the end:

portstr = "tcp:8080:interface=localhost" # serve on localhost:8080

if __name__ == '__main__': # $ python proxy_modify_request.py
    import sys
    from twisted.internet import endpoints, reactor

    def shutdown(reason, reactor, stopping=[]):
        """Stop the reactor."""
        if stopping: return
        stopping.append(True)
        if reason:
            log.msg(reason.value)
        reactor.callWhenRunning(reactor.stop)

    log.startLogging(sys.stdout)
    endpoint = endpoints.serverFromString(reactor, portstr)
    d = endpoint.listen(ProxyFactory())
    d.addErrback(shutdown, reactor)
    reactor.run()
else: # $ twistd -ny proxy_modify_request.py
    from twisted.application import service, strports

    application = service.Application("proxy_modify_request")
    strports.service(portstr, ProxyFactory()).setServiceParent(application)

Usage

$ twistd -ny proxy_modify_request.py

In another terminal:

$ curl -x localhost:8080 http://example.com
qid & accept id: (9487389, 9487424) query: python remove element from list while traversing it soup:

soup wrap:

Don't change the length of a list while iterating over it. It won't work.

>>> l = range(10)
>>> for i in l:
...     l.remove(i)
... 
>>> l
[1, 3, 5, 7, 9]

See? The problem is that when you remove an item, the following items are all shifted back by one, but the location of the index remains the same. The effect is that the item after the removed item gets skipped. Depending on what you're doing, a list comprehension is preferable.

>>> l = range(10)
>>> for i in l:
...     if i in [2, 3, 5, 6, 8, 9]:
...         l.remove(i)
... 
>>> l
[0, 1, 3, 4, 6, 7, 9]
>>> [i for i in range(10) if not i in [2, 3, 5, 6, 8, 9]]
[0, 1, 4, 7]
qid & accept id: (9504638, 9504674) query: Evaluate multiple variables in one 'if' statement? soup:

soup wrap:

You should never test a boolean variable with == True (or == False). Instead, either write:

if not (var1 or var2 or var3 or var4):

or use any (and in related problems its cousin all):

if not any((var1, var2, var3, var4)):

or use Python's chained comparisons:

if var1 == var2 == var3 == var4 == False:
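All three spellings agree when every variable is False; a quick sanity check:

```python
var1 = var2 = var3 = var4 = False

check1 = not (var1 or var2 or var3 or var4)
check2 = not any((var1, var2, var3, var4))
check3 = var1 == var2 == var3 == var4 == False
```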
qid & accept id: (9538171, 9538336) query: Acquiring the Minimum array out of Multiple Arrays by order in Python soup:

soup wrap:

Using the plain Python (non-numpy) .sort() or sorted() on a list of lists (not numpy arrays) automatically does this, e.g.

a = [[1,2,3],[2,3,1],[3,2,1],[1,3,2]]
a.sort()

gives

[[1,2,3],[1,3,2],[2,3,1],[3,2,1]]

The numpy sort only sorts within each subarray, so it seems the best way is to convert to a Python list first. Assuming you have an array of arrays you want to pick the minimum of, you could get the minimum as

sorted(a.tolist())[0]

As someone pointed out you could also do min(a.tolist()) which uses the same type of comparisons as sort, and would be faster for large arrays (linear vs n log n asymptotic run time).
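With a plain list of lists (no numpy needed) the equivalence is easy to check, since min uses the same lexicographic comparison as sort:

```python
a = [[1, 2, 3], [2, 3, 1], [3, 2, 1], [1, 3, 2]]
# sorting and taking the first element agrees with min()
smallest = min(a)
assert smallest == sorted(a)[0]
```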

qid & accept id: (9542738, 9542768) query: Python: Find in list soup:

soup wrap:

As for your first question: that code is perfectly fine and should work if item equals one of the elements inside myList. Maybe you are trying to find a string that does not exactly match one of the items, or maybe you are using a float value, which suffers from inaccuracy.

As for your second question: there are actually several possible ways of "finding" things in lists.

Checking if something is inside

This is the use case you describe: Checking whether something is inside a list or not. As you know, you can use the in operator for that:

3 in [1, 2, 3] # => True

Filtering a collection

That is, finding all elements in a sequence that meet a certain condition. You can use list comprehension or generator expressions for that:

matches = [x for x in lst if fulfills_some_condition(x)]
matches = (x for x in lst if x > 6)

The latter will return a generator which you can imagine as a sort of lazy list that will only be built as soon as you iterate through it. By the way, the first one is exactly equivalent to

matches = filter(fulfills_some_condition, lst)

in Python 2. Here you can see higher-order functions at work. In Python 3, filter doesn't return a list, but a generator-like object.
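A quick Python 3 illustration of that difference:

```python
lst = [3, 8, 1, 9, 2]
matches = filter(lambda x: x > 6, lst)
# In Python 3, filter() returns a lazy iterator, not a list,
# so wrap it in list() when you need the materialized result:
result = list(matches)
```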

Finding the first occurrence

If you only want the first thing that matches a condition (but you don't know what it is yet), it's fine to use a for loop (possibly using the else clause as well, which is not really well-known). You can also use

next(x for x in lst if ...)

which will return the first match or raise a StopIteration if none is found. Alternatively, you can use

next((x for x in lst if ...), [default value])
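The for loop with an else clause mentioned above can be sketched like this; the else branch runs only when the loop finishes without hitting break (names here are just illustrative):

```python
def first_over(lst, threshold):
    for x in lst:
        if x > threshold:
            found = x
            break
    else:
        # no break occurred: nothing matched
        found = None
    return found
```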

Finding the location of an item

For lists, there's also the index method that can sometimes be useful if you want to know where a certain element is in the list:

[1,2,3].index(2) # => 1
[1,2,3].index(4) # => ValueError

However, note that if you have duplicates, .index always returns the lowest index:......

[1,2,3,2].index(2) # => 1

If there are duplicates and you want all the indexes then you can use enumerate() instead:

[i for i,x in enumerate([1,2,3,2]) if x==2] # => [1, 3]
qid & accept id: (9575384, 9575493) query: Combine variable and for each loop python soup:

soup wrap:

No, "pythonic" refers to what normal Python code would look like (see gnibbler's answer: that is what "pythonic" is about).

If you want something that will do exactly what you want, you can do:

def zipMap(func, iterable):
    for x in iterable:
        yield x,func(x)

Then:

for x,y in zipMap(get_handler, the_list):
    ...

Do note that this doesn't save you any typing at all. The only way it would save you typing is if you were using it for currying:

def withHandler(iterable):
    for x in iterable:
        yield x,get_handler(x)

In which case it does save you typing:

for x,y in withHandler(the_list):
    ...

Thus it might be reasonable if you happened to use it a lot. It would not be considered "pythonic" though.

qid & accept id: (9630668, 9631548) query: Convert date to second from a reference - Python soup:

soup wrap:

calendar.timegm is a good approach. Just pass it the utctimetuple() output from your datetime object:

from datetime import datetime
import pytz
import calendar

dt = datetime.now(pytz.utc)
secs = calendar.timegm(dt.utctimetuple())
print dt, secs

prints

2012-03-09 09:17:14.698500+00:00 1331284634

just to test it against the epoch:

print calendar.timegm(datetime(1970, 1, 1, 0, 0, 0, tzinfo=pytz.utc).utctimetuple())

prints 0.

qid & accept id: (9668867, 9669484) query: read snippet of file with regular expressions from text file in python soup:

soup wrap:

I don't think you actually need a regular expression at all; you can just use endswith. Here's how I would implement it. It's not extensible, but it does what you want:

matching = False
found = []
with open('fileinput.txt', 'r') as f:
    it = iter(f)
    for line in it:
        if matching:
            if line.strip() == '':
                break
            else:
                found.append(line)
        elif line.rstrip().endswith('PATTERN:'):  # strip the trailing newline first
            for _ in range(6):
                next(it)
            matching = True

Since you know that START comes a fixed number of lines after PATTERN, there's no need to search for it; the code simply skips those lines with next(). The matching lines are stored in found, and you can print them out nicely with

for line in found:
    print line
qid & accept id: (9670866, 9671028) query: Dynamic field calculations in Django soup:

soup wrap:

You could do something like:

class MyModel(models.Model):
    ...
    @property
    def priority(self):
        return (1 + (date.today() - self.reset_date) / self.days_to_expiration) * self.importance

Then, you can access it like any other attribute on your model

priority = my_model.priority
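The @property pattern is plain Python, so it can be tried outside Django too. A minimal sketch with hypothetical names mirroring the fields above:

```python
from datetime import date, timedelta

class Task(object):
    def __init__(self, reset_date, days_to_expiration, importance):
        self.reset_date = reset_date
        self.days_to_expiration = days_to_expiration
        self.importance = importance

    @property
    def priority(self):
        # recomputed on every attribute access, never stored anywhere
        elapsed = (date.today() - self.reset_date).days
        return (1 + elapsed / float(self.days_to_expiration)) * self.importance

t = Task(reset_date=date.today() - timedelta(days=5),
         days_to_expiration=10, importance=2)
```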
qid & accept id: (9671165, 9671502) query: Open txt file, skip first lines and then monitor a given column of data soup:

soup wrap:

You can try this:

inputFile = open(path,'r')
for n, line in enumerate(inputFile):
    if n > given_number:
       variableX = line.split(' ')[5]
inputFile.close()

Edit based on the new information provided:

Since you have a header, then the data, and then one extra line, you can skip the header lines and process only the ones that have the right number of columns.

inputFile = open(path,'r')
head_lines = 4
for n, line in enumerate(inputFile):
    if n > head_lines:
       cols = line.split()
       if len(cols) == 9:               
           variableX = cols[7]
           # do whatever you need with variableX
inputFile.close()
qid & accept id: (9706041, 9706105) query: finding index of an item closest to the value in a list that's not entirely sorted soup:

soup wrap:

Try the following:

min(range(len(a)), key=lambda i: abs(a[i]-11.5))

For example:

>>> a = [25.75443, 26.7803, 25.79099, 24.17642, 24.3526, 22.79056, 20.84866, 19.49222, 18.38086, 18.0358, 16.57819, 15.71255, 14.79059, 13.64154, 13.09409, 12.18347, 11.33447, 10.32184, 9.544922, 8.813385, 8.181152, 6.983734, 6.048035, 5.505096, 4.65799]
>>> min(range(len(a)), key=lambda i: abs(a[i]-11.5))
16

Or to get the index and the value:

>>> min(enumerate(a), key=lambda x: abs(x[1]-11.5))
(16, 11.33447)
qid & accept id: (9761554, 9764301) query: How to get a list of the elements in TreeView? PyGtk soup:

soup wrap:

I'd say you get the model:

model = self.treeview.get_model()

And then you have tons of different ways to access your data/items depending on what you want and how the model looks... For more on that, check http://pygtk.org

You could get the first row by doing:

model[0]

You could also iterate through it...

qid & accept id: (9761562, 9761614) query: How many factors in an integer soup:

soup wrap:

The % (modulus) operator gives you the remainder of a division. If that remainder is 0, then the second number is a factor of the first. So just loop through all the numbers from 1 to n and check if they're factors; if so, add them to the list with append:

def factors(n):
    result = []

    for i in range(1, n + 1):
        if n % i == 0:
            result.append(i)

    return result


Or, more concisely using lambdas:

def factors(n):
    return filter(lambda i: n % i == 0, range(1, n + 1))


qid & accept id: (9787427, 9788226) query: What would be a good regexp for identifying the "original message" prefix in gmail? soup:

soup wrap:

The following regex will match Gmail's prefix in a pretty safe manner. It ensures that there are 3 commas and the literal text On ... wrote

On([^,]+,){3}.*?wrote:

If the regex should match in a case-insensitive way, then don't forget to add the modifier.

if re.search("On([^,]+,){3}.*?wrote:", subject, re.IGNORECASE):
    # Successful match
else:
    # Match attempt failed
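For instance, against a hypothetical Gmail quote line:

```python
import re

subject = "On Mon, Mar 12, 2012 at 3:14 PM, Alice <alice@example.com> wrote:"
# three comma-terminated chunks after "On", then anything up to "wrote:"
match = re.search("On([^,]+,){3}.*?wrote:", subject, re.IGNORECASE)
```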

Kind Regards, Buckley

Match the characters “On” literally «On»
Match the regular expression below and capture its match into backreference number 1 «([^,]+,){3}»
   Exactly 3 times «{3}»
   Note: You repeated the capturing group itself.  The group will capture only the last iteration.  Put a capturing group around the repeated group to capture all iterations. «{3}»
   Match any character that is NOT a “,” «[^,]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
   Match the character “,” literally «,»
Match any single character that is not a line break character «.*?»
   Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the characters “wrote:” literally «wrote:»

Created with RegexBuddy
qid & accept id: (9849828, 9850366) query: running through a loop and find a condition that match soup:

soup wrap:

Print '#' if the point is red, else print '.'. If the sequence red, not-red is encountered, print '.' for the rest of the array:

prev = None
it = iter(data)
for point in it:
    if point == 'red':
       print '#',
    else:
       print '.',
       if prev == 'red': # encountered ['red', 'blank']
          break
    prev = point

for point in it:
    print '.',
print

Example

blank blank red red blank red blank red red
. . # # . . . . .
qid & accept id: (9857382, 9858152) query: Django Form with extra information soup:

soup wrap:

The best way I know to do this is to initialize the fields before you pass the form to the template, by passing an initial dictionary to the form or by passing an instance object to the form.

You should then make sure that the fields are disabled, or you should make them hidden fields and then display the fields as regular text.

Most importantly, if you're passing data to the client that will then be sent back in a form, you should make sure that the data coming in is the same as the data that went out (for security's sake). Do this with a clean_[field] method on the Form. It should look like the following.

class MyForm(forms.ModelForm):
    class Meta:
        model = MyModel
    def clean_date_created(self):
        if self.cleaned_data['date_created'] != self.instance.date_created:
            raise ValidationError, 'date_created has been tampered'
        return self.cleaned_data['date_created']

[Edit/Addendum] Alternatively, you can pass the data directly to your template to render separately, and then tack on the data to your form after you get it back into your view. It should go something like this:

def receive_form(request, ...):
    ...
    f = MyForm(request.POST, instance=a)
    new_model_instance = f.save(commit=False)
    # re-attach the value you rendered separately (variable name is illustrative)
    new_model_instance.date_created = date_created
    new_model_instance.save()
qid & accept id: (9897007, 10396105) query: Split string elements of a list with multiple separators/conditions. Any good Python library? soup:

soup wrap:

Not sure if you're still having problems with this but here's an answer that I believe would work for you:

#location_regexes.py
import re
paren_pattern = re.compile(r"([^(]+, )?([^(]+?),? \(([^)]+)\)")

def parse_city_state(locations_list):
    city_list = []
    state_list = []
    coordinate_pairs = []
    for location in locations_list:
        if '(' in location:
            r = re.match(paren_pattern, location)
            city_list.append(r.group(2))
            state_list.append(r.group(3))
        elif location[0].isdigit() or location[0] == '-':
            coordinate_pairs.append(location.split(', '))
        else:
            city_list.append(location.split(', ', 1)[0])
            state_list.append(location.split(', ', 1)[1])
    return city_list, state_list, coordinate_pairs

#to demonstrate output
if __name__ == "__main__":
    locations = ['Washington, DC', 'Miami, FL', 'New York, NY',
                'Kaslo/Nelson area (Canada), BC', 'Plymouth (UK/England)',
                'Mexico, DF - outskirts-, (Mexico),', '38.206471, -111.165271']

    for parse_group in parse_city_state(locations):
        print parse_group

Output:

$ python location_regexes.py 
['Washington', 'Miami', 'New York', 'Kaslo/Nelson area', 'Plymouth', 'DF - outskirts-']
['DC', 'FL', 'NY', 'Canada', 'UK/England', 'Mexico']
[['38.206471', '-111.165271']]
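If all you actually need is splitting on several separators at once, re.split with a character class also works; this is a general sketch (my addition, separate from the answer above, shown with Python 3 print):

```python
import re

s = 'Washington, DC (USA); Miami/FL'
# split on comma, semicolon, slash, or parentheses, each with optional
# surrounding spaces, and drop the empty pieces re.split leaves behind
parts = [p for p in re.split(r'\s*[,;/()]\s*', s) if p]
print(parts)  # ['Washington', 'DC', 'USA', 'Miami', 'FL']
```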
qid & accept id: (9950474, 9955722) query: Real Hierarchical Builds with SCons? soup:

soup wrap:

I'm not sure why you would need to make a custom builder; if I understand you correctly, everything you need can be done with SCons and its built-in builders.

To do what you explain, you would indeed need 3 separate SConstruct files, to be able to do 3 separate builds. I would also add 3 SConscript files and make all of them as follows:

Edit: In this example, it's better to create the Environment() in the SConstruct scripts

project_root/SConstruct

# This SConstruct orchestrates building 3 subdirs

import os

subdirs = ['libfoo_subrepo', 'barapp_subrepo', 'test']
env = Environment()

for subdir in subdirs:
    SConscript(os.path.join(subdir, 'SConscript'), exports = ['env'])

libfoo_subrepo/SConstruct

# This SConstruct does nothing more than load the SConscript in this dir
# The Environment() is created in the SConstruct script
# This dir can be built standalone by executing scons here, or together
# by executing scons in the parent directory
env = Environment()
SConscript('SConscript', exports = ['env'])

libfoo_subrepo/SConscript

# This SConscript orchestrates building 2 subdirs
import os

Import('env')
subdirs = ['src', 'test']

for subdir in subdirs:
    SConscript(os.path.join(subdir, 'SConscript'), exports = ['env'])

barapp_subrepo/SConstruct

# This SConstruct does nothing more than load the SConscript in this dir
# The Environment() is created in the SConstruct script
# This dir can be built standalone by executing scons here, or together
# by executing scons in the parent directory
env = Environment()
SConscript('SConscript', exports = ['env'])

barapp_subrepo/SConscript

# This SConscript orchestrates building 2 subdirs
import os

Import('env')
subdirs = ['src', 'test']

for subdir in subdirs:
    SConscript(os.path.join(subdir, 'SConscript'), exports = ['env'])
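The leaf SConscript files referenced above are not shown in the answer, but they would hold the actual targets. A hypothetical libfoo_subrepo/src/SConscript might look like this (target and file names are illustrative):

```python
# libfoo_subrepo/src/SConscript -- hypothetical leaf build script
Import('env')

# build the library from all C++ files in this directory
libfoo = env.StaticLibrary('foo', Glob('*.cpp'))

# export the result so a sibling test/SConscript can link against it
Export('libfoo')
```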

I hope the comments in each file explain its purpose.

Hope this helps.

qid & accept id: (9969684, 9969689) query: How do I add space between two variables after a print in Python soup:

soup wrap:

A simple way would be:

print str(count) + '  ' + str(conv)

If you need more spaces, simply add them to the string:

print str(count) + '    ' + str(conv)

A fancier way, using the new syntax for string formatting:

print '{0}  {1}'.format(count, conv)

Or using the old syntax, limiting the number of decimals to two:

print '%d  %.2f' % (count, conv)
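For what it's worth, a quick sanity check that the two formatting styles above agree (my addition, written with the Python 3 print function; the values are made up):

```python
count = 5
conv = 2.75

# both styles should produce the same two-space-separated string
old_style = '%d  %.2f' % (count, conv)
new_style = '{0}  {1:.2f}'.format(count, conv)
print(new_style)  # 5  2.75
```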
qid & accept id: (10001301, 10001339) query: List Slicing python soup:

soup wrap:

You can do this concisely using a list comprehension or generator expression:

>>> myl = ['A','B','C','D','E','F']
>>> [''.join(myl[i:i+2]) for i in range(0, len(myl), 2)]
['AB', 'CD', 'EF']
>>> print '\n'.join(''.join(myl[i:i+2]) for i in range(0, len(myl), 2))
AB
CD
EF

You could replace ''.join(myl[i:i+2]) with myl[i] + myl[i+1] for this particular case, but using the ''.join() method is easier for when you want to do groups of three or more.

Or an alternative that comes from the documentation for zip():

>>> map(''.join, zip(*[iter(myl)]*2))
['AB', 'CD', 'EF']
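The slicing approach generalizes to chunks of any size n; here is a small sketch (the helper name is mine, and Python 3 print is used):

```python
def group(seq, n):
    # join each slice of n consecutive items into one string
    return [''.join(seq[i:i + n]) for i in range(0, len(seq), n)]

myl = ['A', 'B', 'C', 'D', 'E', 'F']
print(group(myl, 2))  # ['AB', 'CD', 'EF']
print(group(myl, 3))  # ['ABC', 'DEF']
```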
qid & accept id: (10014572, 10015086) query: Python - open pdf file to specific page/section soup:

soup wrap:

Here are two basic ideas

Case 1: you want to open the file in Python

from pyPdf import PdfFileReader, PageObject

pdf_toread = PdfFileReader(path_to_your_pdf)

# page numbers are 0-based, so getPage(1) is actually the second page
page_one = pdf_toread.getPage(1)

# This will dump the content (unicode string)
# According to the doc, the formatting is dependent on the
# structure of the document
print page_one.extractText()

As for the section, you can have a look to this answer

Case 2: you want to call acrobat to open your file at a specific page

From this Acrobat help document, you can pass this to a subprocess:

import subprocess
import os

path_to_pdf = os.path.abspath(r'C:\test_file.pdf')  # raw string, since \t would otherwise be a tab
# I am testing this on my Windows install machine
path_to_acrobat = os.path.abspath(r'C:\Program Files (x86)\Adobe\Reader 10.0\Reader\AcroRd32.exe') 

# this will open your document on page 12
process = subprocess.Popen([path_to_acrobat, '/A', 'page=12', path_to_pdf], shell=False, stdout=subprocess.PIPE)
process.wait()

Just a suggestion: if you want to open the file at a specific section, you could use the parameter search=wordList, where wordList is a list of words separated by spaces. The document will be opened and the search will be performed, the first result of it being highlighted. This way, as a wordList, you can try to put the name of the section.
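Combining that search idea with the subprocess approach might look like the sketch below; everything here (paths, section words) is illustrative, and you would pass cmd to subprocess.Popen as shown earlier:

```python
# build the Acrobat command line without launching it
# (paths and search words are illustrative)
path_to_acrobat = r'C:\Program Files (x86)\Adobe\Reader 10.0\Reader\AcroRd32.exe'
path_to_pdf = r'C:\test_file.pdf'
section_words = 'Chapter Results'  # words taken from the section heading
cmd = [path_to_acrobat, '/A', 'search=' + section_words, path_to_pdf]
print(cmd)
```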

qid & accept id: (10024640, 10041712) query: Automatically Insert file-modification-time after @date command soup:

soup wrap:

You could use an input filter that adds the file modification date to the @date command. The following perl file would do the trick:

use File::stat;
$fn = $ARGV[0];
$time = localtime stat($fn)->mtime;
open F,"<$fn";
while (<F>)
{
  s/\@date/\@date $time/;
  print $_;
}
close F;

If you save this as filemod.pl you can make doxygen use the filter for each input file by setting the following in the configuration file:

INPUT_FILTER = "perl filemod.pl"
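If you'd rather not depend on perl, the same filter can be sketched in Python (a hypothetical filemod.py; the function name is mine, not from the answer — doxygen passes the filename as the first argument and reads the filter's stdout):

```python
import os
import time

def add_mtime_to_date(fn):
    # return the file's text with each @date command followed by the
    # file's modification time
    mtime = time.ctime(os.path.getmtime(fn))
    with open(fn) as f:
        return f.read().replace('@date', '@date ' + mtime)

# doxygen would then use it via something like:
#   INPUT_FILTER = "python filemod.py"
# with the script writing add_mtime_to_date(sys.argv[1]) to stdout
```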
qid & accept id: (10040037, 10040089) query: Dictionary As Table In Django Template soup:

soup wrap:

You can make the template code a lot easier to read if you provide the data as a table in your dictionary. It would look more like this:

field = {
    'headers': [u'Birthday:', u'Education', u'Job', u'Child Sex'],
    'rows': [[datetime.date(2012, 4, 6), u'A1', u'job1', u'M']
            ,[datetime.date(2012, 4, 27), u'A2', u'job2', u'F']]
}

You can now iterate over the headers as follows:


{% for header in field.headers %}
    {{ header }}
{% endfor %}

And each row can be displayed using:

{% for value in field.rows %}
    {{ value }}
{% endfor %}

Now, you can obtain the 'headers' value using field.keys():

[u'Birthday:', u'Education', u'Job:', u'Child Sex:']

You can get the 'values' using the following loop (where 2 is the number of rows):

rows = []
for i in xrange(2):
    row = []
    for k in field.keys():
        row.append(field[k][i])
    rows.append(row)

Or as a one-liner:

rows = [[field[k][i] for k in field.keys()] for i in xrange(2)]
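A quick check of that transposition (note that in older Pythons field.keys() order is arbitrary, so a fixed key list is safer; the data here is a made-up stand-in for the original column-oriented dict):

```python
# stand-in, column-oriented dict like the question's `field`,
# with the key order listed explicitly so the column order is deterministic
field = {
    'Birthday:': ['2012-04-06', '2012-04-27'],
    'Education': ['A1', 'A2'],
}
keys = ['Birthday:', 'Education']
rows = [[field[k][i] for k in keys] for i in range(2)]
print(rows)  # [['2012-04-06', 'A1'], ['2012-04-27', 'A2']]
```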
qid & accept id: (10048069, 10048168) query: What is the most pythonic way to pop a random element from a list? soup:

soup wrap:

What you seem to be up to doesn't look very Pythonic in the first place. You shouldn't remove stuff from the middle of a list, because lists are implemented as arrays in all Python implementations I know of, so this is an O(n) operation.

If you really need this functionality as part of an algorithm, you should check out a data structure like the blist that supports efficient deletion from the middle.

In pure Python, what you can do if you don't need access to the remaining elements is just shuffle the list first and then iterate over it:

import random

lst = [1,2,3]
random.shuffle(lst)
for x in lst:
  # ...

If you really need the remainder (which is a bit of a code smell, IMHO), at least you can pop() from the end of the list now (which is fast!):

while lst:
  x = lst.pop()
  # do something with the element      
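If you truly must pop a random element while keeping the rest, one O(1) idiom (my addition, not from the text above) avoids the O(n) middle deletion by swapping the chosen element with the last one before popping:

```python
import random

def pop_random(lst):
    # swap a random element to the end, then pop it off the end in O(1)
    i = random.randrange(len(lst))
    lst[i], lst[-1] = lst[-1], lst[i]
    return lst.pop()

lst = [1, 2, 3, 4]
x = pop_random(lst)
print(x, lst)
```

Note this reorders the remaining elements, which is fine when you were going to consume them in random order anyway.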

In general, you can often express your programs more elegantly if you use a more functional style, instead of mutating state (like you do with the list).

qid & accept id: (10099326, 10102741) query: how to do an embedded python module for remote sandbox execution? soup:

soup wrap:

Other modules can be imported to sandbox (you mean modules that are created dynamically at runtime) by

    sandbox.other_module = __import__('other_module')

or:

    exec 'import other_module' in sandbox.__dict__

If you call "sandbox" modules from other modules or other sandbox modules, and you want to reload some new code later, it is easier to import only the module, not names from it like "from sandbox import f", and to call "sandbox.f", not "f". Reloading is then easy (though naturally the reload command is not useful for it).
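To make both import variants concrete, here is a runnable sketch using a dynamically created module and stdlib modules standing in for other_module (Python 3 exec syntax):

```python
import types

# a stand-in "sandbox" module created at runtime
sandbox = types.ModuleType('sandbox')

# equivalent of: sandbox.other_module = __import__('other_module')
sandbox.json = __import__('json')

# equivalent of the Python 2 form: exec 'import math' in sandbox.__dict__
exec('import math', sandbox.__dict__)

print(sandbox.json.dumps([1, 2]))
print(sandbox.math.pi)
```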


Classes

>>> class A(object): pass
... 
>>> a = A()
>>> A.f = lambda self, x: 2 * x  # or a pickled function
>>> a.f(1)
2
>>> A.f = lambda self, x: 3 * x
>>> a.f(1)
3

It seems that reloading methods can be easy. I remember that reloading classes defined in a modified source code can be complicated, because the old class code can be held by some instance. The instance's class can/needs to be updated individually in the worst case:

    some_instance.__class__ = sandbox.SomeClass  # that means the same reloaded class

I used the latter with a Python service accessed via win32com automation, and reloading of class code was successful without losing instance data.
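The __class__ switch can be seen in a tiny self-contained sketch (class names are mine; NewVersion stands in for the class after a reload):

```python
class OldVersion(object):
    def tag(self):
        return 'old'

class NewVersion(object):  # stands in for the reloaded class
    def tag(self):
        return 'new'

obj = OldVersion()
obj.x = 42                 # instance data

# point the existing instance at the "reloaded" class;
# its __dict__ (the data) survives the switch
obj.__class__ = NewVersion
print(obj.tag(), obj.x)  # new 42
```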

qid & accept id: (10099710, 10100140) query: How to manually create a select field from a ModelForm in Django? soup:

The ModelChoiceField documentation explains how to do this.

To change the empty label:

empty_label

    By default the select widget used by ModelChoiceField
    will have an empty choice at the top of the list. You can change the text
    of this label (which is "---------" by default) with the empty_label
    attribute, or you can disable the empty label entirely by setting
    empty_label to None:

    # A custom empty label
    field1 = forms.ModelChoiceField(queryset=..., empty_label="(Nothing)")

    # No empty label
    field2 = forms.ModelChoiceField(queryset=..., empty_label=None)

As for your second query, it is also explained the in docs:

The __unicode__ method of the model will be called to generate string
representations of the objects for use in the field's choices;
to provide customized representations, subclass ModelChoiceField and override
label_from_instance. This method will receive a model object, and should return
a string suitable for representing it. For example:

class MyModelChoiceField(ModelChoiceField):
    def label_from_instance(self, obj):
        return "My Object #%i" % obj.id

Finally, to pass some custom ajax, use the attrs argument for the select widget (which is what is used in the ModelForm field).

In the end, you should have something like this:

creator = MyCustomField(queryset=...,
                        empty_label="Please select",
                        widget=forms.Select(attrs={'onchange':'some_ajax_function()'}))
qid & accept id: (10108070, 10108106) query: reinterpret signed long as unsigned in Python soup:

soup wrap:

How about

if x < 0:
    x += 2 ** 64

or, if you prefer bit twiddling,

x &= 2 ** 64 - 1
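Both tricks agree for values that fit in 64 bits; a quick check of the masking version (the helper name is mine):

```python
def to_unsigned64(x):
    # mask to the low 64 bits; negative ints come out as their
    # two's-complement unsigned value
    return x & (2 ** 64 - 1)

print(to_unsigned64(-1))  # 18446744073709551615
print(to_unsigned64(5))   # 5
```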
qid & accept id: (10108368, 10114328) query: Detecting geographic clusters soup:

\n
rect(-74.989,39.7667, -73.0419,41.5209, col=c("red"))
rect(-123.005,36.8144, -121.392,38.3672, col=c("green"))
rect(-78.2422,38.2474, -76.3,39.9282, col=c("blue"))

Addition on 2013-05-01 for Yacob

These 2 lines give you the overall goal...

map("county", plot=T )
rect(-122.644,36.7307, -121.46,37.98, col=c("red"))

If you want to narrow in on a portion of a map, you can use ylim and xlim

map("county", plot=T, ylim=c(36.7307,37.98), xlim=c(-122.644,-121.46))
# or for more coloring, but choose one or the other map("county") commands
map("county", plot=T, fill=T, col=palette(), ylim=c(36.7307,37.98), xlim=c(-122.644,-121.46))
rect(-122.644,36.7307, -121.46,37.98, col=c("red"))

You will want to use the 'world' map...

map("world", plot=T )

It has been a long time since I have used the python code posted below, so I will try my best to help you.

threshhold_dist is the size of the bounding box, ie: the geographical area
threshhold_locations is the number of lat/lng points needed within
    the bounding box in order for it to be considered a cluster.

Here is a complete example. The TSV file is located on pastebin.com. I have also included an image generated from R that contains the output of all of the rect() commands.

# pyclusters.py
# May-02-2013
# -John Taylor

# latlng.tsv is located at http://pastebin.com/cyvEdx3V
# use the "RAW Paste Data" to preserve the tab characters

import math
from collections import defaultdict

# See also: http://www.geomidpoint.com/example.html
# See also: http://www.movable-type.co.uk/scripts/latlong.html

to_rad = math.pi / 180.0  # convert lat or lng to radians
fname = "latlng.tsv"      # file format: LAT\tLONG
threshhold_dist=20        # adjust to your needs
threshhold_locations=20   # minimum # of locations needed in a cluster
earth_radius_km = 6371

def coord2cart(lat,lng):
    x = math.cos(lat) * math.cos(lng)
    y = math.cos(lat) * math.sin(lng)
    z = math.sin(lat)
    return (x,y,z)

def cart2coord(x,y,z):
    lon = math.atan2(y,x)
    hyp = math.sqrt(x*x + y*y)
    lat = math.atan2(z,hyp)
    return (lat,lon)

def dist(lat1,lng1,lat2,lng2):
    global to_rad, earth_radius_km

    dLat = (lat2-lat1) * to_rad
    dLon = (lng2-lng1) * to_rad
    lat1_rad = lat1 * to_rad
    lat2_rad = lat2 * to_rad

    a = math.sin(dLat/2) * math.sin(dLat/2) + math.sin(dLon/2) * math.sin(dLon/2) * math.cos(lat1_rad) * math.cos(lat2_rad)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))
    dist = earth_radius_km * c
    return dist

def bounding_box(src, neighbors):
    neighbors.append(src)
    # nw = NorthWest se=SouthEast
    nw_lat = -360
    nw_lng = 360
    se_lat = 360
    se_lng = -360

    for (y,x) in neighbors:
        if y > nw_lat: nw_lat = y
        if x > se_lng: se_lng = x

        if y < se_lat: se_lat = y
        if x < nw_lng: nw_lng = x

    # add some padding
    pad = 0.5
    nw_lat += pad
    nw_lng -= pad
    se_lat -= pad
    se_lng += pad

    #print("answer:")
    #print("nw lat,lng : %s %s" % (nw_lat,nw_lng))
    #print("se lat,lng : %s %s" % (se_lat,se_lng))

    # suitable for R's map() function
    return (se_lat,nw_lat,nw_lng,se_lng)

def sitesDist(site1,site2):
    # just a helper to shorten the list comprehension below
    return dist(site1[0],site1[1], site2[0], site2[1])

def load_site_data():
    global fname
    sites = defaultdict(tuple)

    data = open(fname,encoding="latin-1")
    data.readline() # skip header
    for line in data:
        line = line[:-1]
        slots = line.split("\t")
        lat = float(slots[0])
        lng = float(slots[1])
        lat_rad = lat * math.pi / 180.0
        lng_rad = lng * math.pi / 180.0
        sites[(lat,lng)] = (lat,lng) #(lat_rad,lng_rad)
    return sites

def main():
    color_list = ( "red", "blue", "green", "yellow", "orange", "brown", "pink", "purple" )
    color_idx = 0
    sites_dict = {}
    sites = load_site_data()
    for site in sites:
        # for each site, put it in a dictionary with its value being an array of neighbors
        sites_dict[site] = [x for x in sites if x != site and sitesDist(site,x) < threshhold_dist]

    print("")
    print('map("state", plot=T)') # or use: county instead of state
    print("")

    results = {}
    for site in sites:
        j = len(sites_dict[site])
        if j >= threshhold_locations:
            coord = bounding_box( site, sites_dict[site] )
            results[coord] = coord

    for bbox in results:
        yx="ylim=c(%s,%s), xlim=c(%s,%s)" % (results[bbox]) #(se_lat,nw_lat,nw_lng,se_lng)

        # important!
        # if you want an individual map for each cluster, uncomment this line
        #print('map("county", plot=T, fill=T, col=palette(), %s)' % yx)
        if len(color_list) == color_idx:
            color_idx = 0
        rect='rect(%s,%s, %s,%s, col=c("%s"))' % (results[bbox][2], results[bbox][0], results[bbox][3], results[bbox][1], color_list[color_idx])
        color_idx += 1
        print(rect)
    print("")

main()

pyclusters.py / R image result

soup wrap:

I was able to combine Joran's answer with Dan H's comment. This is an example output (the cluster map image):

The python code emits functions for R: map() and rect(). This USA example map was created with:

map('state', plot = TRUE, fill = FALSE, col = palette())

and then you can apply the rect()'s accordingly from within the R GUI interpreter (see below).

import math
from collections import defaultdict

to_rad = math.pi / 180.0   # convert lat or lng to radians
fname = "site.tsv"        # file format: LAT\tLONG
threshhold_dist=50         # adjust to your needs
threshhold_locations=15    # minimum # of locations needed in a cluster

def dist(lat1,lng1,lat2,lng2):
    global to_rad
    earth_radius_km = 6371

    dLat = (lat2-lat1) * to_rad
    dLon = (lng2-lng1) * to_rad
    lat1_rad = lat1 * to_rad
    lat2_rad = lat2 * to_rad

    a = math.sin(dLat/2) * math.sin(dLat/2) + math.sin(dLon/2) * math.sin(dLon/2) * math.cos(lat1_rad) * math.cos(lat2_rad)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a)); 
    dist = earth_radius_km * c
    return dist

def bounding_box(src, neighbors):
    neighbors.append(src)
    # nw = NorthWest se=SouthEast
    nw_lat = -360
    nw_lng = 360
    se_lat = 360
    se_lng = -360

    for (y,x) in neighbors:
        if y > nw_lat: nw_lat = y
        if x > se_lng: se_lng = x

        if y < se_lat: se_lat = y
        if x < nw_lng: nw_lng = x

    # add some padding
    pad = 0.5
    nw_lat += pad
    nw_lng -= pad
    se_lat -= pad
    se_lng += pad

    # suitable for R's map() function
    return (se_lat,nw_lat,nw_lng,se_lng)

def sitesDist(site1,site2): 
    # just a helper to shorten the list comprehension below 
    return dist(site1[0],site1[1], site2[0], site2[1])

def load_site_data():
    global fname
    sites = defaultdict(tuple)

    data = open(fname,encoding="latin-1")
    data.readline() # skip header
    for line in data:
        line = line[:-1]
        slots = line.split("\t")
        lat = float(slots[0])
        lng = float(slots[1])
        lat_rad = lat * math.pi / 180.0
        lng_rad = lng * math.pi / 180.0
        sites[(lat,lng)] = (lat,lng) #(lat_rad,lng_rad)
    return sites

def main():
    sites_dict = {}
    sites = load_site_data()
    for site in sites: 
        #for each site put it in a dictionary with its value being an array of neighbors 
        sites_dict[site] = [x for x in sites if x != site and sitesDist(site,x) < threshhold_dist] 

    results = {}
    for site in sites: 
        j = len(sites_dict[site])
        if j >= threshhold_locations:
            coord = bounding_box( site, sites_dict[site] )
            results[coord] = coord

    for bbox in results:
        yx="ylim=c(%s,%s), xlim=c(%s,%s)" % (results[bbox]) #(se_lat,nw_lat,nw_lng,se_lng)
        print('map("county", plot=T, fill=T, col=palette(), %s)' % yx)
        rect='rect(%s,%s, %s,%s, col=c("red"))' % (results[bbox][2], results[bbox][0], results[bbox][3], results[bbox][1])
        print(rect)
        print("")

main()
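For a quick sanity check of dist(), here is a standalone sketch; the airport pair and the roughly 2886 km figure are the classic haversine test case, not data from site.tsv:

```python
import math

to_rad = math.pi / 180.0  # degrees to radians

def dist(lat1, lng1, lat2, lng2):
    # same haversine great-circle formula as above, result in km
    earth_radius_km = 6371
    dLat = (lat2 - lat1) * to_rad
    dLon = (lng2 - lng1) * to_rad
    a = (math.sin(dLat / 2) ** 2
         + math.sin(dLon / 2) ** 2 * math.cos(lat1 * to_rad) * math.cos(lat2 * to_rad))
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return earth_radius_km * c

# Nashville (BNA) to Los Angeles (LAX): roughly 2886 km
d = dist(36.12, -86.67, 33.94, -118.40)
print(round(d))
```

If this prints a number far from 2886, something is off in the formula or the degree-to-radian conversion.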

Here is an example TSV file (site.tsv)

LAT     LONG
36.3312 -94.1334
36.6828 -121.791
37.2307 -121.96
37.3857 -122.026
37.3857 -122.026
37.3857 -122.026
37.3895 -97.644
37.3992 -122.139
37.3992 -122.139
37.402  -122.078
37.402  -122.078
37.402  -122.078
37.402  -122.078
37.402  -122.078
37.48   -122.144
37.48   -122.144
37.55   126.967

With my data set, here is the output of my Python script, shown on the USA map. I changed the colors for clarity.

rect(-74.989,39.7667, -73.0419,41.5209, col=c("red"))
rect(-123.005,36.8144, -121.392,38.3672, col=c("green"))
rect(-78.2422,38.2474, -76.3,39.9282, col=c("blue"))

Addition on 2013-05-01 for Yacob


These 2 lines give you the overall goal...

map("county", plot=T )
rect(-122.644,36.7307, -121.46,37.98, col=c("red"))

If you want to narrow in on a portion of a map, you can use ylim and xlim

map("county", plot=T, ylim=c(36.7307,37.98), xlim=c(-122.644,-121.46))
# or, for more coloring; choose one or the other of the map("county") commands
map("county", plot=T, fill=T, col=palette(), ylim=c(36.7307,37.98), xlim=c(-122.644,-121.46))
rect(-122.644,36.7307, -121.46,37.98, col=c("red"))

You will want to use the 'world' map...

map("world", plot=T )

It has been a long time since I have used the Python code I have posted below, so I will try my best to help you.

threshhold_dist is the size of the bounding box, i.e. the geographical area
threshhold_locations is the number of lat/lng points needed within
    the bounding box in order for it to be considered a cluster.

Here is a complete example. The TSV file is located on pastebin.com. I have also included an image generated from R that contains the output of all of the rect() commands.

# pyclusters.py
# May-02-2013
# -John Taylor

# latlng.tsv is located at http://pastebin.com/cyvEdx3V
# use the "RAW Paste Data" to preserve the tab characters

import math
from collections import defaultdict

# See also: http://www.geomidpoint.com/example.html
# See also: http://www.movable-type.co.uk/scripts/latlong.html

to_rad = math.pi / 180.0  # convert lat or lng to radians
fname = "latlng.tsv"      # file format: LAT\tLONG
threshhold_dist=20        # adjust to your needs
threshhold_locations=20   # minimum # of locations needed in a cluster
earth_radius_km = 6371

def coord2cart(lat,lng):
    x = math.cos(lat) * math.cos(lng)
    y = math.cos(lat) * math.sin(lng)
    z = math.sin(lat)
    return (x,y,z)

def cart2coord(x,y,z):
    lng = math.atan2(y,x)
    hyp = math.sqrt(x*x + y*y)
    lat = math.atan2(z,hyp)
    return (lat,lng)

def dist(lat1,lng1,lat2,lng2):
    global to_rad, earth_radius_km

    dLat = (lat2-lat1) * to_rad
    dLon = (lng2-lng1) * to_rad
    lat1_rad = lat1 * to_rad
    lat2_rad = lat2 * to_rad

    a = math.sin(dLat/2) * math.sin(dLat/2) + math.sin(dLon/2) * math.sin(dLon/2) * math.cos(lat1_rad) * math.cos(lat2_rad)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a)); 
    dist = earth_radius_km * c
    return dist

def bounding_box(src, neighbors):
    neighbors.append(src)
    # nw = NorthWest se=SouthEast
    nw_lat = -360
    nw_lng = 360
    se_lat = 360
    se_lng = -360

    for (y,x) in neighbors:
        if y > nw_lat: nw_lat = y
        if x > se_lng: se_lng = x

        if y < se_lat: se_lat = y
        if x < nw_lng: nw_lng = x

    # add some padding
    pad = 0.5
    nw_lat += pad
    nw_lng -= pad
    se_lat -= pad
    se_lng += pad

    #print("answer:")
    #print("nw lat,lng : %s %s" % (nw_lat,nw_lng))
    #print("se lat,lng : %s %s" % (se_lat,se_lng))

    # suitable for R's map() function
    return (se_lat,nw_lat,nw_lng,se_lng)

def sitesDist(site1,site2): 
    # just a helper to shorten the list comprehension below 
    return dist(site1[0],site1[1], site2[0], site2[1])

def load_site_data():
    global fname
    sites = defaultdict(tuple)

    data = open(fname,encoding="latin-1")
    data.readline() # skip header
    for line in data:
        line = line[:-1]
        slots = line.split("\t")
        lat = float(slots[0])
        lng = float(slots[1])
        lat_rad = lat * math.pi / 180.0
        lng_rad = lng * math.pi / 180.0
        sites[(lat,lng)] = (lat,lng) #(lat_rad,lng_rad)
    return sites

def main():
    color_list = ( "red", "blue", "green", "yellow", "orange", "brown", "pink", "purple" )
    color_idx = 0
    sites_dict = {}
    sites = load_site_data()
    for site in sites: 
        # for each site, put it in a dictionary with its value being an array of neighbors 
        sites_dict[site] = [x for x in sites if x != site and sitesDist(site,x) < threshhold_dist] 

    print("")
    print('map("state", plot=T)') # or use: county instead of state
    print("")


    results = {}
    for site in sites: 
        j = len(sites_dict[site])
        if j >= threshhold_locations:
            coord = bounding_box( site, sites_dict[site] )
            results[coord] = coord

    for bbox in results:
        yx="ylim=c(%s,%s), xlim=c(%s,%s)" % (results[bbox]) #(se_lat,nw_lat,nw_lng,se_lng)

        # important!
        # if you want an individual map for each cluster, uncomment this line
        #print('map("county", plot=T, fill=T, col=palette(), %s)' % yx)
        if len(color_list) == color_idx:
            color_idx = 0
        rect='rect(%s,%s, %s,%s, col=c("%s"))' % (results[bbox][2], results[bbox][0], results[bbox][3], results[bbox][1], color_list[color_idx])
        color_idx += 1
        print(rect)
    print("")


main()

pyclusters.py / R image result

qid & accept id: (10112614, 10112665) query: How do I create a multiline Python string with inline variables? soup:

soup wrap:

The common way is the format() function:

>>> s = "This is an {example} with {vars}".format(vars="variables", example="example")
>>> s
'This is an example with variables'

You can also pass a dictionary with variables:

>>> d = { 'vars': "variables", 'example': "example" }
>>> s = "This is an {example} with {vars}"
>>> s.format(**d)
'This is an example with variables'

The closest thing to what you asked (in terms of syntax) are template strings. For example:

>>> from string import Template
>>> t = Template("This is an $example with $vars")
>>> t.substitute({ 'example': "example", 'vars': "variables"})
'This is an example with variables'

I should add though that the format() function is more common because it's readily available and it does not require an import line.
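All of the above also work with triple-quoted strings, which is the usual way to build a multiline template; for example, with format() (the names here are illustrative):

```python
# A triple-quoted template keeps the line breaks; format() fills in the variables.
template = """Hello {name},
you have {count} new messages."""

message = template.format(name="Alice", count=3)
print(message)
```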

qid & accept id: (10126668, 12700121) query: Can I override a C++ virtual function within Python with Cython? soup:

soup wrap:

Excellent !

Not complete, but sufficient. I've been able to do the trick for my own purposes by combining this post with the sources linked above. It hasn't been easy, since I'm a beginner at Cython, but I can confirm that this is the only way I could find on the web.

Thanks a lot to you guys.

I am sorry that I don't have much time to go into textual details, but here are my files (they might help to give an additional point of view on how to put all of this together):

setup.py :

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

setup(
    cmdclass = {'build_ext': build_ext},
    ext_modules = [
    Extension("elps", 
              sources=["elps.pyx", "src/ITestClass.cpp"],
              libraries=["elp"],
              language="c++",
              )
    ]
)

TestClass :

#ifndef TESTCLASS_H_
#define TESTCLASS_H_


namespace elps {

class TestClass {

public:
    TestClass(){};
    virtual ~TestClass(){};

    int getA() { return this->a; };
    virtual int override_me() { return 2; };
    int calculate(int a) { return a * this->override_me(); }

private:
    int a;

};

} /* namespace elps */
#endif /* TESTCLASS_H_ */

ITestClass.h :

#ifndef ITESTCLASS_H_
#define ITESTCLASS_H_

// Created by Cython when providing 'public api' keywords
#include "../elps_api.h"

#include "../../inc/TestClass.h"

namespace elps {

class ITestClass : public TestClass {
public:
    PyObject *m_obj;

    ITestClass(PyObject *obj);
    virtual ~ITestClass();
    virtual int override_me();
};

} /* namespace elps */
#endif /* ITESTCLASS_H_ */

ITestClass.cpp :

#include "ITestClass.h"

namespace elps {

ITestClass::ITestClass(PyObject *obj): m_obj(obj) {
    // Provided by "elps_api.h"
    if (import_elps()) {
    } else {
        Py_XINCREF(this->m_obj);
    }
}

ITestClass::~ITestClass() {
    Py_XDECREF(this->m_obj);
}

int ITestClass::override_me()
{
    if (this->m_obj) {
        int error;
        // Call a virtual overload, if it exists
        int result = cy_call_func(this->m_obj, (char*)"override_me", &error);
        if (error)
            // Call parent method
            result = TestClass::override_me();
        return result;
    }
    // Throw error ?
    return 0;
}

} /* namespace elps */

EDIT2: A note about PURE virtual methods (this appears to be a quite recurrent concern). As shown in the above code, in this particular design, "TestClass::override_me()" CANNOT be pure, since it has to be callable in case the method is not overridden in the extending Python class (i.e. when one doesn't fall into the "error"/"override not found" branch of the "ITestClass::override_me()" body).
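The dispatch behaviour this design gives you can be sketched in pure Python (the names mirror the example; this is an illustration of the intended semantics, not the actual extension code):

```python
# The base override_me() is virtual but NOT pure, so it supplies the
# fallback result when no Python-side override exists.
class TestClass(object):
    def override_me(self):
        return 2

    def calculate(self, a):
        return a * self.override_me()

class B(TestClass):
    def override_me(self):
        return 5

print(TestClass().calculate(1))  # falls back to the base implementation
print(B().calculate(1))          # uses the Python-side override
```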

Extension : elps.pyx :

cimport cpython.ref as cpy_ref

cdef extern from "src/ITestClass.h" namespace "elps" :
    cdef cppclass ITestClass:
        ITestClass(cpy_ref.PyObject *obj)
        int getA()
        int override_me()
        int calculate(int a)

cdef class PyTestClass:
    cdef ITestClass* thisptr

    def __cinit__(self):
       ##print "in TestClass: allocating thisptr"
       self.thisptr = new ITestClass(self)
    def __dealloc__(self):
       if self.thisptr:
           ##print "in TestClass: deallocating thisptr"
           del self.thisptr

    def getA(self):
       return self.thisptr.getA()

#    def override_me(self):
#        return self.thisptr.override_me()

    cpdef int calculate(self, int a):
        return self.thisptr.calculate(a) ;


cdef public api int cy_call_func(object self, char* method, int *error):
    try:
        func = getattr(self, method);
    except AttributeError:
        error[0] = 1
    else:
        error[0] = 0
        return func()

Finally, the python calls :

from elps import PyTestClass as TC;

a = TC(); 
print a.calculate(1);

class B(TC):
#   pass
    def override_me(self):
        return 5

b = B()
print b.calculate(1)

Hopefully this makes the previously linked work more directly relevant to the point we're discussing here...

EDIT: On the other hand, the above code could be optimized by using hasattr instead of a try/except block:

cdef public api int cy_call_func_int_fast(object self, char* method, bint *error):
    if (hasattr(self, method)):
        error[0] = 0
        return getattr(self, method)();
    else:
        error[0] = 1

The above code, of course, makes a difference only in the case where we don't override the 'override_me' method.

qid & accept id: (10127973, 10128317) query: Extracting text from webpage, processing with Perl/Python, then rebuilding the page with links added soup:

soup wrap:

I do know that Python has a module for opening webpages, called urllib:

import urllib
url = 'https://www.google.com/'
page = urllib.urlopen(url)
print page.read()
# page.read() returns the URL's HTML source, so this prints the source code

you could also save a new html file with python like this:

source = page.read()       # note: a urllib response can only be read once
f = open('url.html', 'w')  # avoid shadowing the built-in name 'file'
f.write(source)
f.close()

In between, you could modify the HTML source. Keep in mind that the saved pages will look silly if you don't also save the assets (CSS, images, scripts) the pages are using. Hope this helps.
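Putting it together, here is a sketch of the fetch/modify/save round trip. The keyword-to-link rewriting step is purely illustrative (your real processing would go there), and the source is hard-coded so the example runs without network access:

```python
import re

# Pretend this came from urllib.urlopen(url).read()
source = "<html><body><p>Learn Python today. Python is fun.</p></body></html>"

# Wrap each occurrence of a keyword in a link (illustrative processing step)
linked = re.sub(r'\bPython\b',
                '<a href="https://www.python.org/">Python</a>',
                source)

# Save the rebuilt page
with open('url.html', 'w') as f:
    f.write(linked)

print(linked)
```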

qid & accept id: (10154289, 10154518) query: Use BeautifulSoup to extract text before the first child tag soup:

soup wrap:

I'm fairly sure the following should do what you want

parsed.find('a').previousSibling # or something like that

That returns a NavigableString instance, which is pretty much the same thing as a unicode instance, but you may call unicode() on it to get a plain unicode object.

I'll see if I can test this out and let you know.

EDIT: I just confirmed that it works:

>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup('<p>Category: <a>a link</a></p>')
>>> soup.find('a')
<a>a link</a>
>>> soup.find('a').previousSibling
u'Category: '
>>>
qid & accept id: (10156909, 10156959) query: How do I get the number of posts on each day with annotation in Django? soup:

soup wrap:

You are almost there. You need two additional clauses:

day_counts = Post.objects.filter(author=someuser).values('posted_day').annotate(
                                       dailycount=Count('posted_day')).order_by()

The values('posted_day') enables the grouping, and the empty order_by() clears any default ordering so that it doesn't interfere with the grouping.

The clearest documentation of this seems to be in the Django Aggregation docs section on the Order of annotate() and values() clauses.

values returns a list of dicts like:

[{'posted_day': 'the-first-day', 'dailycount': 2}, . . . ,
 {'posted_day': 'the-last-day', 'dailycount': 3}]

so if you wanted the last day the user posted, it would be the last item in the list:

last_day_dict = day_counts[-1]
date = last_day_dict['posted_day']
count = last_day_dict['dailycount']

You can then compare date to today() to see if they match; if they don't, the user didn't post today, and if they do, the user posted count times.
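For intuition, the grouping that queryset performs is equivalent to this plain-Python sketch (with hypothetical dates standing in for Post.posted_day values):

```python
from collections import Counter
from datetime import date

# Hypothetical posting dates for one user
posted_days = [date(2012, 4, 13), date(2012, 4, 13), date(2012, 4, 14)]

# Roughly what values('posted_day').annotate(dailycount=Count('posted_day')) yields
day_counts = [{'posted_day': day, 'dailycount': n}
              for day, n in sorted(Counter(posted_days).items())]

last = day_counts[-1]
posted_today = last['posted_day'] == date.today()
print(day_counts)
```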

qid & accept id: (10211546, 10213106) query: Mock only a subset of all calls to a method soup:

soup wrap:

How about this:

fubar = Fubar()
myMethod = fubar.myMethod # note instance fubar, resulting in a bound method.
# An attribute set on the instance is not re-bound, so the lambda takes no self;
# pop() consumes from the END of the list (symmetric here, so order is unaffected).
fubar.myMethod = lambda calls=[myMethod, (lambda: 'MyMock'), myMethod]: calls.pop()()

This assumes that myMethod takes no arguments. You can either pass arguments in the body of the lambda, or partially-apply myMethod, as necessary.
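If myMethod does take arguments, functools.partial is a tidy way to pre-apply them. A self-contained sketch, where greet is a hypothetical stand-in for the real myMethod:

```python
from functools import partial

def greet(greeting, name):
    # hypothetical stand-in for the real myMethod
    return "%s, %s!" % (greeting, name)

real_call = partial(greet, "Hello", "world")  # pre-apply the arguments

# Same pattern as the answer: mock only the middle of three calls.
# list.pop() consumes from the END, but this list is symmetric, so order is unaffected.
calls = [real_call, (lambda: 'MyMock'), real_call]
patched = lambda: calls.pop()()

results = [patched(), patched(), patched()]
print(results)
```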

If you want to select programmatically the calls to mock, you could use a generator instead of a list, and next rather than pop:

fubar.myMethod = lambda calls=iter([myMethod, (lambda: 'MyMock'), myMethod]): next(calls)()
qid & accept id: (10255972, 10256078) query: Parsing text in BS4 soup:

soup wrap:

I would use get_text() instead of find_all()

price_str = price.get_text() # $17.95

Then you can use lstrip to get rid of the dollar sign

price_str = price_str.lstrip('$') # 17.95

And you're done!
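One caveat worth noting: lstrip('$') strips a set of characters, not a prefix, so it would remove repeated leading dollar signs too; for a single leading '$' it's fine, and the result converts cleanly to a number. A quick sketch:

```python
price_str = "$17.95"

stripped = price_str.lstrip('$')  # strips ALL leading '$' characters, not just one
price = float(stripped)           # convert to a number for further arithmetic

print(stripped)
print(price)
```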

qid & accept id: (10263217, 10268315) query: Drawing window border in Python xlib soup:

soup wrap:

Looks like this was complete PEBKAC. I've found an answer. Basically, I was doing this:

def set_active_border(self, window):
    border_color = self.colormap.alloc_named_color(\
        "#ff00ff").pixel
    window.configure(border_width = 2)
    window.change_attributes(None,border_pixel=border_color,
         border_width = 2)
    self.dpy.sync()

Apparently this was confusing X enough that it was doing nothing. The solution that I've stumbled upon was to remove the border_width portion from the window.change_attributes() call, like so:

def set_active_border(self, window):
    border_color = self.colormap.alloc_named_color(\
        "#ff00ff").pixel
    window.configure(border_width = 2)
    window.change_attributes(None,border_pixel=border_color)
    self.dpy.sync()

I hope this helps someone later on down the road!

qid & accept id: (10282693, 10283932) query: Double helix generating algorithm soup:

soup wrap:

The key to this question is to recognize that you can represent each strand of the helix as a combination of sine waves - one for the periodic portion, and one for the "depth" into the page. Once you've parameterized the problem this way, you can control every aspect of your helix. The example below uses * and # to distinguish the two strands and illustrate the point. If you choose wavelength values that are not commensurate with integer values you'll get less than optimal results - but now you can play with the inputs to find what you consider the most aesthetically pleasing representation.

from numpy import *

amp = 10
length = 100
wavelength = 20

omega = (2*pi)/wavelength
phi   = wavelength*(0.5)
X = arange(1,length)
Y1 = round_(amp*(sin(omega*X) + 1)).astype(int)     # integer row offsets, usable as indices
Y2 = round_(amp*(sin(omega*X+phi) + 1)).astype(int)

offset = phi/2
Z1 = sin(omega*X + offset)
Z2 = sin(omega*X + phi + offset)

T1 = " ######### "
T2 = " ********* "
clen = len(T1)

H = zeros((length,amp*2+clen),dtype='str')
H[:,:] = " "

for n,(y1,y2,z1,z2) in enumerate(zip(Y1,Y2,Z1,Z2)):
    H[n,y1:y1+clen] = list(T1)
    H[n,y2:y2+clen] = list(T2)

    # Overwrite if first helix is on top
    if z1>z2: H[n,y1:y1+clen] = list(T1)

for line in H:
    print "".join(line)

These values give:

   *********  #########        
  *********      #########     
 *********         #########   
 *********           ######### 
   *********         ######### 
     *********       ######### 
       *********   #########   
          ****** #########     
              #########        
           ######### ****      
        #########  *********   
     #########      *********  
   #########         ********* 
 #########           ********* 
 #########         *********   
 #########       *********     
   #########   *********       
     ###### *********          
        *********              
      ********* ####           
   *********  #########        
  *********      #########     
 *********         #########   
 *********           ######### 
   *********         ######### 
     *********       ######### 
       *********   #########   
          ****** #########     
              #########        
qid & accept id: (10303797, 23716239) query: Print floating point values without leading zero soup:

You may use the following MyFloat class instead of the builtin float class.

def _remove_leading_zero(value, string):
    if 1 > value > -1:
        string = string.replace('0', '', 1)
    return string


class MyFloat(float):
    def __str__(self):
        string = super().__str__()
        return _remove_leading_zero(self, string)

    def __format__(self, format_string):
        string = super().__format__(format_string)
        return _remove_leading_zero(self, string)

Using this class, you'll have to use the str.format function instead of the modulus operator (%) for formatting. Following are some examples:

>>> print(MyFloat(.4444))
.4444

>>> print(MyFloat(-.4444))
-.4444

>>> print('some text {:.3f} some more text'.format(MyFloat(.4444)))
some text .444 some more text

>>> print('some text {:+.3f} some more text'.format(MyFloat(.4444)))
some text +.444 some more text

If you also want to make the modulus operator (%) of the str class behave the same way, then you'll have to override the __mod__ method of the str class by subclassing it. But it won't be as easy as overriding the __format__ method of the float class, as the formatted float number could be present at any position in the resultant string.

\n

[Note: All the above code is written in Python3. You'll also have to override __unicode__ in Python2 and also have to change the super calls.]

\n

P.S.: You may also override __repr__ method similar to __str__, if you also want to change the official string representation of MyFloat.

\n
\n
\n
\n

Edit: Actually you can add new syntax to format sting using __format__ method. So, if you want to keep both behaviours, i.e. show leading zero when needed and don't show leading zero when not needed. You may create the MyFloat class as follows:

\n
class MyFloat(float):\n    def __format__(self, format_string):\n        if format_string.endswith('z'):  # 'fz' is format sting for floats without leading the zero\n            format_string = format_string[:-1]\n            remove_leading_zero = True\n        else:\n            remove_leading_zero = False\n\n        string = super(MyFloat, self).__format__(format_string)\n        return _remove_leading_zero(self, string) if remove_leading_zero else string\n        # `_remove_leading_zero` function is same as in the first example\n
\n

And use this class as follows:

\n
>>> print('some text {:.3f} some more text',format(MyFloat(.4444)))\nsome text 0.444 some more text\n>>> print('some text {:.3fz} some more text',format(MyFloat(.4444)))\nsome text .444 some more text\n\n\n>>> print('some text {:+.3f} some more text',format(MyFloat(.4444)))\nsome text +0.444 some more text\n>>> print('some text {:+.3fz} some more text',format(MyFloat(.4444)))\nsome text +.444 some more text\n\n\n>>> print('some text {:.3f} some more text',format(MyFloat(-.4444)))\nsome text -0.444 some more text\n>>> print('some text {:.3fz} some more text',format(MyFloat(-.4444)))\nsome text -.444 some more text\n
\n

Note that using 'fz' instead of 'f' removes the leading zero.

\n

Also, the above code works in both Python2 and Python3.

\n soup wrap:

You may use the following MyFloat class instead of the builtin float class.

def _remove_leading_zero(value, string):
    if 1 > value > -1:
        string = string.replace('0', '', 1)
    return string


class MyFloat(float):
    def __str__(self):
        string = super().__str__()
        return _remove_leading_zero(self, string)

    def __format__(self, format_string):
        string = super().__format__(format_string)
        return _remove_leading_zero(self, string)

With this class you'll have to use the str.format method instead of the modulus operator (%) for formatting. Here are some examples:

>>> print(MyFloat(.4444))
.4444

>>> print(MyFloat(-.4444))
-.4444

>>> print('some text {:.3f} some more text'.format(MyFloat(.4444)))
some text .444 some more text

>>> print('some text {:+.3f} some more text'.format(MyFloat(.4444)))
some text +.444 some more text

If you also want the str class's modulus operator (%) to behave the same way, you'll have to override str's __mod__ method by subclassing it. That won't be as easy as overriding float's __format__ method, because the formatted float number could be present at any position in the resulting string.
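To see why %-formatting doesn't pick up the override: the %f conversion formats the underlying float value directly, bypassing both __str__ and __format__, while %s goes through str(). A minimal self-contained sketch (re-declaring a trimmed-down MyFloat so the snippet runs on its own):

```python
class MyFloat(float):
    def __str__(self):
        string = super().__str__()
        # drop the leading zero only for values strictly between -1 and 1
        return string.replace('0', '', 1) if 1 > self > -1 else string

print('%.3f' % MyFloat(.4444))  # 0.444 -- %f bypasses the override
print('%s' % MyFloat(.4444))    # .4444 -- %s calls str(), so the override applies
```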

[Note: All the above code is written for Python 3. For Python 2 you'll also have to override __unicode__ and change the super calls.]

P.S.: You may also override the __repr__ method in the same way as __str__, if you also want to change the official string representation of MyFloat.
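A __repr__ override follows the same pattern as __str__; a small sketch (factoring the strip-the-zero logic into a helper, names here are illustrative):

```python
class MyFloat(float):
    def _strip_zero(self, string):
        # drop the leading zero only for values strictly between -1 and 1
        return string.replace('0', '', 1) if 1 > self > -1 else string

    def __str__(self):
        return self._strip_zero(super().__str__())

    def __repr__(self):
        return self._strip_zero(super().__repr__())

print(str(MyFloat(.5)))   # .5
print(repr(MyFloat(.5)))  # .5
```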




Edit: You can actually add new syntax to the format string via the __format__ method. So if you want to keep both behaviours, i.e. show the leading zero when needed and drop it when not, you may create the MyFloat class as follows:

class MyFloat(float):
    def __format__(self, format_string):
        if format_string.endswith('z'):  # 'fz' is the format string for floats without the leading zero
            format_string = format_string[:-1]
            remove_leading_zero = True
        else:
            remove_leading_zero = False

        string = super(MyFloat, self).__format__(format_string)
        return _remove_leading_zero(self, string) if remove_leading_zero else string
        # `_remove_leading_zero` function is same as in the first example

And use this class as follows:

>>> print('some text {:.3f} some more text'.format(MyFloat(.4444)))
some text 0.444 some more text
>>> print('some text {:.3fz} some more text'.format(MyFloat(.4444)))
some text .444 some more text


>>> print('some text {:+.3f} some more text'.format(MyFloat(.4444)))
some text +0.444 some more text
>>> print('some text {:+.3fz} some more text'.format(MyFloat(.4444)))
some text +.444 some more text


>>> print('some text {:.3f} some more text'.format(MyFloat(-.4444)))
some text -0.444 some more text
>>> print('some text {:.3fz} some more text'.format(MyFloat(-.4444)))
some text -.444 some more text

Note that using 'fz' instead of 'f' removes the leading zero.

Also, the above code works in both Python2 and Python3.

qid & accept id: (10342939, 10342948) query: Power set and Cartesian Product of a set python soup:

For the Cartesian product, check out itertools.product.

\n

For the powerset, the itertools docs also give us a recipe:

\n
def powerset(iterable):\n    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"\n    s = list(iterable)\n    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))\n
\n

For example:

\n
>>> test = {1, 2, 3}\n>>> list(powerset(test))\n[(), (1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]\n>>> list(product(test, test))\n[(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]\n
\n soup wrap:

For the Cartesian product, check out itertools.product.

For the powerset, the itertools docs also give us a recipe:

from itertools import chain, combinations

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

For example:

>>> from itertools import product
>>> test = {1, 2, 3}
>>> list(powerset(test))
[(), (1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]
>>> list(product(test, test))
[(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]
qid & accept id: (10429919, 10439432) query: QTableView item selection based on a QStandardItem data attribute soup:

As you said, right now you have your QTableView.selectionChanged() feeding the selections back to your matplot. The most efficient approach would be to have your matplot emit a signal for its selection, with the relevant items.

\n

A table view already stores its selections in a QItemSelectionModel, so as far as I can see it would be redundant and unnecessary to store your own isSelected attribute on the items. Your matplot view should know the items it is using and should be able to notify the table view of its selection changes.

\n

Your matplot view can have a signal that you emit, such as selectionChanged(items), and can continue having no knowledge of the table view.

\n

Your table view, as it already knows about the matplot view, can connect to its selectionChanged(items) to the matplot and listen for selection changes. Even if your table is also emitting a signal and has no knowledge of the matplot, you can make the connection in whatever parent class does know of them both.

\n

This is why I think the attribute isn't needed: The only way to make use of that attribute is to scan the entire model, checking each item. Thats not really efficient. The selection should happen in reaction to the signal being emitted.

\n
myMatPlotView.selectionchanged.connect(myTableView.matplotSelected)\n
\n

And in your matPlotSelected() slot, you can use the selection model to set the items selection:

\n

tableView

\n
def matPlotSelected(self, qStandardItems):\n\n    selModel = self.selectionModel()\n    model = self.model()\n\n    for item in qStandardItems:\n        idx = model.indexFromItem(item)\n        selModel.select(idx, selModel.Select)\n
\n

Update

\n

In the comments you provided a code snippet, which really helped to isolate what you want to achieve.

\n

Your example

\n
def __init__(self):\n    super(myDialog, self).__init__()\n    self.t = QtGui.QTreeView()\n    self.m = QtGui.QStandardItemModel()\n    self.t.setModel(self.m)\n    layout = QtGui.QVBoxLayout()\n    layout.addWidget(self.t)\n    self.setLayout(layout)\n    self.l = [\n        ['one', False], ['two', True], \n        ['three', False], ['four', True], \n        ['five', False]]\n    self.populate()\n\ndef populate(self):\n    self.m.clear()\n    root = self.m.invisibleRootItem()\n    for item in self.l:\n        e = QtGui.QStandardItem()\n        e.setText(item[0])\n        root.appendRow(e)\n
\n

If this is your actual situation, then what I suggested above fits in like this:

\n
def populate(self):\n    self.m.clear()\n    root = self.m.invisibleRootItem()\n    selModel = self.t.selectionModel()\n    for item in self.l:\n        e = QtGui.QStandardItem()\n        e.setText(item[0])\n        root.appendRow(e)\n\n        if item[1]:\n            idx = self.m.indexFromItem(e)\n            selModel.select(idx, selModel.Select)\n
\n soup wrap:

As you said, right now you have your QTableView.selectionChanged() feeding the selections back to your matplot. The most efficient approach would be to have your matplot emit a signal for its selection, with the relevant items.

A table view already stores its selections in a QItemSelectionModel, so as far as I can see it would be redundant and unnecessary to store your own isSelected attribute on the items. Your matplot view should know the items it is using and should be able to notify the table view of its selection changes.

Your matplot view can have a signal that you emit, such as selectionChanged(items), and can continue having no knowledge of the table view.

Your table view, since it already knows about the matplot view, can connect to the matplot's selectionChanged(items) signal and listen for selection changes. Even if your table is also emitting a signal and has no knowledge of the matplot, you can make the connection in whatever parent class does know about them both.

This is why I think the attribute isn't needed: the only way to make use of that attribute is to scan the entire model, checking each item. That's not really efficient. The selection should happen in reaction to the signal being emitted.

myMatPlotView.selectionChanged.connect(myTableView.matplotSelected)

And in your matPlotSelected() slot, you can use the selection model to set the items selection:

In the table view:

def matPlotSelected(self, qStandardItems):

    selModel = self.selectionModel()
    model = self.model()

    for item in qStandardItems:
        idx = model.indexFromItem(item)
        selModel.select(idx, selModel.Select)

Update

In the comments you provided a code snippet, which really helped to isolate what you want to achieve.

Your example

def __init__(self):
    super(myDialog, self).__init__()
    self.t = QtGui.QTreeView()
    self.m = QtGui.QStandardItemModel()
    self.t.setModel(self.m)
    layout = QtGui.QVBoxLayout()
    layout.addWidget(self.t)
    self.setLayout(layout)
    self.l = [
        ['one', False], ['two', True], 
        ['three', False], ['four', True], 
        ['five', False]]
    self.populate()

def populate(self):
    self.m.clear()
    root = self.m.invisibleRootItem()
    for item in self.l:
        e = QtGui.QStandardItem()
        e.setText(item[0])
        root.appendRow(e)

If this is your actual situation, then what I suggested above fits in like this:

def populate(self):
    self.m.clear()
    root = self.m.invisibleRootItem()
    selModel = self.t.selectionModel()
    for item in self.l:
        e = QtGui.QStandardItem()
        e.setText(item[0])
        root.appendRow(e)

        if item[1]:
            idx = self.m.indexFromItem(e)
            selModel.select(idx, selModel.Select)
qid & accept id: (10437805, 10437928) query: ScraperWiki/Python: filtering out records when property is false soup:

Do you just want this? I tried on the free ScraperWiki test page and seems to do what you want. If you're looking for something more complicated, let me know.

\n
import scraperwiki\nimport simplejson\nimport urllib2\n\nQUERY = 'meetup'\nRESULTS_PER_PAGE = '100'\nNUM_PAGES = 10\n\nfor page in range(1, NUM_PAGES+1):\n    base_url = 'http://search.twitter.com/search.json?q=%s&rpp=%s&page=%s' \\n         % (urllib2.quote(QUERY), RESULTS_PER_PAGE, page)\n    try:\n        results_json = simplejson.loads(scraperwiki.scrape(base_url))\n        for result in results_json['results']:\n            #print result\n            data = {}\n            data['id'] = result['id']\n            data['text'] = result['text']\n            data['location'] = scraperwiki.geo.extract_gb_postcode(result['text'])\n            data['from_user'] = result['from_user']\n            data['created_at'] = result['created_at']\n            if data['location']:\n                print data['location'], data['from_user']\n                scraperwiki.sqlite.save(["id"], data)\n    except:\n        print 'Oh dear, failed to scrape %s' % base_url\n        break\n
\n

Outputs:

\n
P93JX VSDC\nFV36RL Bootstrappers\nCi76fP Eli_Regalado\nUN56fn JasonPalmer1971\niQ3H6zR GNOTP\nQr04eB fcnewtech\nsE79dW melindaveee\nud08GT MariaPanlilio\nc9B8EE akibantech\nay26th Thepinkleash\n
\n

I've refined it a bit so it's a bit picker than the scraperwiki check for extracting gb postcodes, which lets though quite a few false positives. Basically I took the accepted answer from here, and added some negative lookbehind/lookahead to filter out a few more. It looks like the scraper wiki check does the regex without the negative lookbehind/lookahead. Hope that helps a bit.

\n
import scraperwiki\nimport simplejson\nimport urllib2\nimport re\n\nQUERY = 'sw4'\nRESULTS_PER_PAGE = '100'\nNUM_PAGES = 10\n\npostcode_match = re.compile('(?
\n soup wrap:

Do you just want this? I tried it on the free ScraperWiki test page and it seems to do what you want. If you're looking for something more complicated, let me know.

import scraperwiki
import simplejson
import urllib2

QUERY = 'meetup'
RESULTS_PER_PAGE = '100'
NUM_PAGES = 10

for page in range(1, NUM_PAGES+1):
    base_url = 'http://search.twitter.com/search.json?q=%s&rpp=%s&page=%s' \
         % (urllib2.quote(QUERY), RESULTS_PER_PAGE, page)
    try:
        results_json = simplejson.loads(scraperwiki.scrape(base_url))
        for result in results_json['results']:
            #print result
            data = {}
            data['id'] = result['id']
            data['text'] = result['text']
            data['location'] = scraperwiki.geo.extract_gb_postcode(result['text'])
            data['from_user'] = result['from_user']
            data['created_at'] = result['created_at']
            if data['location']:
                print data['location'], data['from_user']
                scraperwiki.sqlite.save(["id"], data)
    except Exception:
        print 'Oh dear, failed to scrape %s' % base_url
        break

Outputs:

P93JX VSDC
FV36RL Bootstrappers
Ci76fP Eli_Regalado
UN56fn JasonPalmer1971
iQ3H6zR GNOTP
Qr04eB fcnewtech
sE79dW melindaveee
ud08GT MariaPanlilio
c9B8EE akibantech
ay26th Thepinkleash

I've refined it a bit so it's a bit pickier than the ScraperWiki check for extracting GB postcodes, which lets through quite a few false positives. Basically I took the accepted answer from here, and added some negative lookbehind/lookahead to filter out a few more. It looks like the ScraperWiki check does the regex without the negative lookbehind/lookahead. Hope that helps a bit.

import scraperwiki
import simplejson
import urllib2
import re

QUERY = 'sw4'
RESULTS_PER_PAGE = '100'
NUM_PAGES = 10

postcode_match = re.compile('(?
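Since the snippet above is cut off, here is a hedged sketch of the lookaround idea it describes. The pattern below is a deliberately simplified postcode shape, not the full regex from the linked answer; the negative lookbehind/lookahead reject candidates embedded in a longer alphanumeric run (the false positives described above):

```python
import re

# Simplified, illustrative GB-postcode shape only. The lookarounds refuse a
# match when an uppercase letter or digit sits immediately before or after it.
postcode_match = re.compile(
    r'(?<![0-9A-Z])'                           # negative lookbehind
    r'[A-Z]{1,2}[0-9][0-9A-Z]?\s*[0-9][A-Z]{2}'
    r'(?![0-9A-Z])'                            # negative lookahead
)

print(postcode_match.search('meet us at SW4 7AA tonight').group())  # SW4 7AA
print(postcode_match.search('token iQ3H6zR here'))                  # None
```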
qid & accept id: (10460286, 10460314) query: Concat every 4 strings from a list? soup:
>>> data = ['192', '168', '0', '1', '80', '192', '168', '0', '2', '8080']\n>>> ['{}.{}.{}.{}:{}'.format(*x) for x in zip(*[iter(data)]*5)]\n['192.168.0.1:80', '192.168.0.2:8080']\n
\n

Using starmap

\n
>>> from itertools import starmap\n>>> list(starmap('{}.{}.{}.{}:{}'.format,zip(*[iter(data)]*5)))\n['192.168.0.1:80', '192.168.0.2:8080']\n
\n soup wrap:
>>> data = ['192', '168', '0', '1', '80', '192', '168', '0', '2', '8080']
>>> ['{}.{}.{}.{}:{}'.format(*x) for x in zip(*[iter(data)]*5)]
['192.168.0.1:80', '192.168.0.2:8080']

Using starmap

>>> from itertools import starmap
>>> list(starmap('{}.{}.{}.{}:{}'.format,zip(*[iter(data)]*5)))
['192.168.0.1:80', '192.168.0.2:8080']
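The grouping is done by zip(*[iter(data)]*5): the same iterator object is passed to zip five times, so each output tuple pulls five consecutive items. The idiom in isolation:

```python
data = list(range(10))

it = iter(data)  # a single iterator shared by all five argument positions
print(list(zip(it, it, it, it, it)))   # [(0, 1, 2, 3, 4), (5, 6, 7, 8, 9)]

# same thing written compactly; a trailing partial group is silently dropped:
print(list(zip(*[iter(range(7))]*5)))  # [(0, 1, 2, 3, 4)]
```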
qid & accept id: (10472907, 10473054) query: How to convert dictionary into string soup:

To convert from the dict to the string in the format you want:

\n
''.join('{}{}'.format(key, val) for key, val in adict.items())\n
\n

if you want them alphabetically ordered by key:

\n
''.join('{}{}'.format(key, val) for key, val in sorted(adict.items()))\n
\n soup wrap:

To convert from the dict to the string in the format you want:

''.join('{}{}'.format(key, val) for key, val in adict.items())

if you want them alphabetically ordered by key:

''.join('{}{}'.format(key, val) for key, val in sorted(adict.items()))
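For example, with a small hypothetical dict (the keys and values are just illustrative):

```python
adict = {'c': 3, 'a': 1, 'b': 2}

# insertion order (dicts preserve it in Python 3.7+):
print(''.join('{}{}'.format(key, val) for key, val in adict.items()))
# -> c3a1b2

# alphabetically ordered by key:
print(''.join('{}{}'.format(key, val) for key, val in sorted(adict.items())))
# -> a1b2c3
```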
qid & accept id: (10500834, 10500919) query: Able to use any case in input to generate the same dict values in output soup:

You should use capitalize() and lower()

\n
while response[0] != 'quit': \n    response = raw_input("Please enter who you're looking for, or type 'exit' to quit the program: ").split() \n    try:\n        print "%s's %s is %s" % (response[0].capitalize(), response[1].lower(), people[response[0].capitalize()][response[1].lower()])  \n    except KeyError: \n        print wrong,\n
\n

You should change the 'bob' key to 'Bob', if you go this route...

\n

Alternatively, you can save a few more CPU cycles if you reuse results, as mentioned by rubik below.

\n
while response[0] != 'quit': \n    response = raw_input("Please enter who you're looking for, or type 'exit' to quit the program: ").split() \n    try:\n        fn, thing = response[0].capitalize(), response[1].lower()\n        print "%s's %s is %s" % (fn, thing, people[fn][thing])  \n    except KeyError: \n        print wrong,\n
\n soup wrap:

You should use capitalize() and lower()

while response[0] != 'quit': 
    response = raw_input("Please enter who you're looking for, or type 'exit' to quit the program: ").split() 
    try:
        print "%s's %s is %s" % (response[0].capitalize(), response[1].lower(), people[response[0].capitalize()][response[1].lower()])  
    except KeyError: 
        print wrong,

You should change the 'bob' key to 'Bob', if you go this route...

Alternatively, you can save a few more CPU cycles if you reuse results, as mentioned by rubik below.

while response[0] != 'quit': 
    response = raw_input("Please enter who you're looking for, or type 'exit' to quit the program: ").split() 
    try:
        fn, thing = response[0].capitalize(), response[1].lower()
        print "%s's %s is %s" % (fn, thing, people[fn][thing])  
    except KeyError: 
        print wrong,
qid & accept id: (10507011, 10507271) query: django serialize foreign key objects soup:

One potential way around this is to construct your own dictionary object based on the returns of a queryset. You'd do something like this:

\n
queryset = Model.objects.all()\nlist = [] #create list\nfor row in queryset: #populate list\n    list.append({'title':row.title, 'body': row.body, 'name': row.user.username})\nrecipe_list_json = json.dumps(list) #dump list as JSON\nreturn HttpResponse(recipe_list_json, 'application/javascript')\n
\n

You need to import json for this to work.

\n
import json\n
\n soup wrap:

One potential way around this is to construct your own dictionary object based on the returns of a queryset. You'd do something like this:

queryset = Model.objects.all()
results = []  # build a list of dicts (avoids shadowing the builtin name `list`)
for row in queryset:  # populate the list
    results.append({'title': row.title, 'body': row.body, 'name': row.user.username})
recipe_list_json = json.dumps(results)  # dump the list as JSON
return HttpResponse(recipe_list_json, 'application/javascript')

You need to import json for this to work.

import json
qid & accept id: (10526579, 10527953) query: use scikit-learn to classify into multiple categories soup:

What you want is called multi-label classification. Scikits-learn can do that. See here: http://scikit-learn.org/dev/modules/multiclass.html.

\n

I'm not sure what's going wrong in your example, my version of sklearn apparently doesn't have WordNGramAnalyzer. Perhaps it's a question of using more training examples or trying a different classifier? Though note that the multi-label classifier expects the target to be a list of tuples/lists of labels.

\n

The following works for me:

\n
import numpy as np\nfrom sklearn.pipeline import Pipeline\nfrom sklearn.feature_extraction.text import CountVectorizer\nfrom sklearn.svm import LinearSVC\nfrom sklearn.feature_extraction.text import TfidfTransformer\nfrom sklearn.multiclass import OneVsRestClassifier\n\nX_train = np.array(["new york is a hell of a town",\n                    "new york was originally dutch",\n                    "the big apple is great",\n                    "new york is also called the big apple",\n                    "nyc is nice",\n                    "people abbreviate new york city as nyc",\n                    "the capital of great britain is london",\n                    "london is in the uk",\n                    "london is in england",\n                    "london is in great britain",\n                    "it rains a lot in london",\n                    "london hosts the british museum",\n                    "new york is great and so is london",\n                    "i like london better than new york"])\ny_train = [[0],[0],[0],[0],[0],[0],[1],[1],[1],[1],[1],[1],[0,1],[0,1]]\nX_test = np.array(['nice day in nyc',\n                   'welcome to london',\n                   'hello welcome to new york. enjoy it here and london too'])   \ntarget_names = ['New York', 'London']\n\nclassifier = Pipeline([\n    ('vectorizer', CountVectorizer(min_n=1,max_n=2)),\n    ('tfidf', TfidfTransformer()),\n    ('clf', OneVsRestClassifier(LinearSVC()))])\nclassifier.fit(X_train, y_train)\npredicted = classifier.predict(X_test)\nfor item, labels in zip(X_test, predicted):\n    print '%s => %s' % (item, ', '.join(target_names[x] for x in labels))\n
\n

For me, this produces the output:

\n
nice day in nyc => New York\nwelcome to london => London\nhello welcome to new york. enjoy it here and london too => New York, London\n
\n

Hope this helps.

\n soup wrap:

What you want is called multi-label classification. scikit-learn can do that. See here: http://scikit-learn.org/dev/modules/multiclass.html.

I'm not sure what's going wrong in your example; my version of sklearn apparently doesn't have WordNGramAnalyzer. Perhaps it's a question of using more training examples or trying a different classifier? Note, though, that the multi-label classifier expects the target to be a list of tuples/lists of labels.

The following works for me:

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier

X_train = np.array(["new york is a hell of a town",
                    "new york was originally dutch",
                    "the big apple is great",
                    "new york is also called the big apple",
                    "nyc is nice",
                    "people abbreviate new york city as nyc",
                    "the capital of great britain is london",
                    "london is in the uk",
                    "london is in england",
                    "london is in great britain",
                    "it rains a lot in london",
                    "london hosts the british museum",
                    "new york is great and so is london",
                    "i like london better than new york"])
y_train = [[0],[0],[0],[0],[0],[0],[1],[1],[1],[1],[1],[1],[0,1],[0,1]]
X_test = np.array(['nice day in nyc',
                   'welcome to london',
                   'hello welcome to new york. enjoy it here and london too'])   
target_names = ['New York', 'London']

classifier = Pipeline([
    ('vectorizer', CountVectorizer(min_n=1,max_n=2)),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC()))])
classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)
for item, labels in zip(X_test, predicted):
    print '%s => %s' % (item, ', '.join(target_names[x] for x in labels))

For me, this produces the output:

nice day in nyc => New York
welcome to london => London
hello welcome to new york. enjoy it here and london too => New York, London

Hope this helps.

qid & accept id: (10562180, 10562673) query: How to do a basic query on yahoo search engine using Python without using any yahoo api? soup:

first, avoid urllib - use requests instead, it's a much saner interface.

\n

Then, all links in the returned page have the class yschttl and an ID following the scheme link-1, link-2 and so on. That you can use with beautiful soup:

\n
import requests\nfrom bs4 import BeautifulSoup\nurl = "http://search.yahoo.com/search?p=%s"\nquery = "python"\nr = requests.get(url % query) \nsoup = BeautifulSoup(r.text)\nsoup.find_all(attrs={"class": "yschttl"})\n\nfor link in soup.find_all(attrs={"class": "yschttl"}):\n    print "%s (%s)" %(link.text, link.get('href'))\n
\n

Gives us

\n
\n
Python Programming Language – Official Website (http://www.python.org/)\nPython - Image Results (http://images.search.yahoo.com/search/images?_adv_prop=image&va=python)\nPython (programming language) - Wikipedia, the free encyclopedia (http://en.wikipedia.org/wiki/Python_(programming_language))\n
\n
\n

and more.

\n soup wrap:

First, avoid urllib; use requests instead, it's a much saner interface.

Then, all links in the returned page have the class yschttl and an ID following the scheme link-1, link-2 and so on. You can use that with Beautiful Soup:

import requests
from bs4 import BeautifulSoup
url = "http://search.yahoo.com/search?p=%s"
query = "python"
r = requests.get(url % query) 
soup = BeautifulSoup(r.text)
soup.find_all(attrs={"class": "yschttl"})

for link in soup.find_all(attrs={"class": "yschttl"}):
    print "%s (%s)" %(link.text, link.get('href'))

Gives us

Python Programming Language – Official Website (http://www.python.org/)
Python - Image Results (http://images.search.yahoo.com/search/images?_adv_prop=image&va=python)
Python (programming language) - Wikipedia, the free encyclopedia (http://en.wikipedia.org/wiki/Python_(programming_language))

and more.

qid & accept id: (10586471, 10638039) query: How do I define custom function to be called from IPython's prompts? soup:

After reading a bit of the documentation (and peeking at the source code for leads) I found the solution for this problem.

\n

Simply now you should move all your custom functions to a module inside your .ipython directory. Since what I was doing was a simple function that returns the git branch and status for the current directory, I created a file called gitprompt.py and then I included the filename in the exec_file configuration option:

\n
c.InteractiveShellApp.exec_files = [b'gitprompt.py']\n
\n

All definitions in such files are placed into the user namespace. So now I can use it inside my prompt:

\n
# Input prompt.  '\#' will be transformed to the prompt number\nc.PromptManager.in_template = br'{color.Green}\# {color.LightBlue}~\u{color.Green}:\w{color.LightBlue} {git_branch_and_st} \$\n>>> '\n\n# Continuation prompt.\nc.PromptManager.in2_template = br'... '\n
\n

Notice that in order for the function to behave as such (i.e called each time the prompt is printed) you need to use the IPython.core.prompts.LazyEvaluation class. You may use it as a decorator for your function. The gitprompt.py has being placed in the public domain as the gist: https://gist.github.com/2719419

\n soup wrap:

After reading a bit of the documentation (and peeking at the source code for leads) I found the solution for this problem.

Put simply, you should move all your custom functions into a module inside your .ipython directory. Since what I was doing was a simple function that returns the git branch and status for the current directory, I created a file called gitprompt.py and then included the filename in the exec_files configuration option:

c.InteractiveShellApp.exec_files = [b'gitprompt.py']

All definitions in such files are placed into the user namespace. So now I can use it inside my prompt:

# Input prompt.  '\#' will be transformed to the prompt number
c.PromptManager.in_template = br'{color.Green}\# {color.LightBlue}~\u{color.Green}:\w{color.LightBlue} {git_branch_and_st} \$\n>>> '

# Continuation prompt.
c.PromptManager.in2_template = br'... '

Notice that in order for the function to behave as such (i.e. be called each time the prompt is printed) you need to use the IPython.core.prompts.LazyEvaluation class. You may use it as a decorator for your function. gitprompt.py has been placed in the public domain as this gist: https://gist.github.com/2719419

qid & accept id: (10599771, 10599944) query: How to loop through subfolders showing jpg in Tkinter? soup:

The easiest way that I can think of doing this :

\n

first, create a method display_next which will increment an index and display the image associated with that index in a list (assume the list is a list of filenames). Enclosing the list inquiry in a try/except clause will let you catch the IndexError that happens when you run out of images to display -- At this point you can reset your index to -1 or whatever you want to happen at that point.

\n

get the list of filenames in __init__ and initialize some index to -1 (e.g. self.index=-1).

\n

create a tk.Button in __init__ like this:

\n
self.Button = Tkinter.Button(self,text="Next",command=self.display_next)\n
\n

soup wrap:

The easiest way that I can think of to do this:

First, create a method display_next that increments an index and displays the image associated with that index in a list (assume the list is a list of filenames). Enclosing the list lookup in a try/except clause lets you catch the IndexError that is raised when you run out of images to display; at that point you can reset your index to -1, or do whatever else you want to happen.

get the list of filenames in __init__ and initialize some index to -1 (e.g. self.index=-1).

create a tk.Button in __init__ like this:

self.Button = Tkinter.Button(self,text="Next",command=self.display_next)

Another side note, you can use a widget's config method to update a widget on the fly (instead of recreating it all the time). In other words, move all the widget creation into __init__ and then in display_next just update the widget using config. Also, it's probably better to inherit from Tkinter.Frame...

class SimpleAppTk(Tkinter.Frame):
    def __init__(self,*args,**kwargs):
        Tkinter.Frame.__init__(self,*args,**kwargs)

        self.filelist=[]  #get your files here
        #it probably would look like:
        #for d in os.listdir(parentDir):
        #    self.filelist.extend(glob.glob(os.path.join(parentDir,d,'*.jpg')))
        self.index=-1
        self.setup()
        self.display_next()

    def setup(self):
        self.Label=Tkinter.Label(self)
        self.Label.grid(row=0,column=0)
        self.Button=Tkinter.Button(self,text="Next",command=self.display_next)
        self.Button.grid(row=0,column=1)

    def display_next(self):
        self.index+=1
        try:
            f=self.filelist[self.index]
        except IndexError:
            self.index=-1  #go back to the beginning of the list.
            self.display_next()
            return

        #create PhotoImage here
        photoimage=...
        self.Label.config(image=photoimage)
        self.Label.image=photoimage

if __name__ == "__main__":
   root=Tkinter.Tk()
   my_app=SimpleAppTk(root)
   my_app.grid(row=0,column=0)
   root.mainloop()

EDIT

I've given an example of how to actually grid the Frame. In your previous example, you had self.grid in your initialization code. This really did nothing. The only reason you had results was because you were inheriting from Tkinter.Tk which gets gridded automatically. Typically it's best practice to grid after you create the object because if you come back later and decide you want to put that widget someplace else in a different gui, it's trivial to do so. I've also changed the name of the class to use CamelCase in agreement with PEP 8 ... But you can change it back if you want.

qid & accept id: (10602071, 10603296) query: Following users like twitter in Django, how would you do it? soup:

soup wrap:

First, you should understand how to store additional information about users. It requires another model that has a relation to one user, the "profile" model.

Then you can use an M2M field. Assuming you use django-annoying, you could define your user profile model like this:

from django.db import models

from annoying.fields import AutoOneToOneField

class UserProfile(models.Model):
    user = AutoOneToOneField('auth.user')
    follows = models.ManyToManyField('UserProfile', related_name='followed_by')

    def __unicode__(self):
        return self.user.username

And use it as such:

In [1]: tim, c = User.objects.get_or_create(username='tim')

In [2]: chris, c = User.objects.get_or_create(username='chris')

In [3]: tim.userprofile.follows.add(chris.userprofile) # tim follows chris

In [4]: tim.userprofile.follows.all() # list of userprofiles of users that tim follows
Out[4]: [<UserProfile: chris>]

In [5]: chris.userprofile.followed_by.all() # list of userprofiles of users that follow chris
Out[5]: [<UserProfile: tim>]

Also, note that you could check out or reuse apps like django-subscription, django-actstream, django-social (probably harder to use)...

You might want to take a look at the django packages for notifications and activities as they all require some follow/subscription database design.

qid & accept id: (10610592, 10610780) query: Specifying types and patterns using argparse choices soup:

soup wrap:

You could use the type argument to add_argument(...) instead. For example:

import os
import argparse

def intOrUnderscore(s):
    if s != '_':
        return int(s)
    cases = (n for n in os.listdir(".") if n.startswith("file."))
    return max(int(c[c.rindex(".")+1:]) for c in cases)

parser = argparse.ArgumentParser()
parser.add_argument('case', type=intOrUnderscore)

args = parser.parse_args()
print args.case

When I run this I get:

$ ls
file.1  file.2  file.3  s.py
$ python s.py 2
2
$ python s.py _
3

Alternatively, you could build the choices list in code:

import os
import argparse

cases = [n[n.rindex(".")+1:] for n in os.listdir(".") if n.startswith("file.")]
cases.append("_")
parser = argparse.ArgumentParser()
parser.add_argument('case', choices = cases)

args = parser.parse_args()
print args.case
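One refinement worth knowing (not part of the original answer): argparse treats ValueError and TypeError raised by a type callable as a parse failure, and you can raise argparse.ArgumentTypeError for a custom error message. A sketch, with the filesystem lookup replaced by a hard-coded stand-in value:

```python
import argparse

def int_or_underscore(s):
    # '_' means "pick the highest existing case"; the real lookup scans the
    # directory, so a stand-in value is hard-coded here for illustration.
    if s == '_':
        return 3
    try:
        return int(s)
    except ValueError:
        raise argparse.ArgumentTypeError("%r is not an integer or '_'" % s)

parser = argparse.ArgumentParser()
parser.add_argument('case', type=int_or_underscore)

print(parser.parse_args(['2']).case)  # 2
print(parser.parse_args(['_']).case)  # 3
```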
qid & accept id: (10636203, 10636720) query: Simple loop for all elements of an etree object? soup:

soup wrap:

The problem you are facing is that you are not visiting all nodes in the file. You are only visiting the children of the elem element, but you are not visiting the children of these elements. To illustrate this, running the following (I have edited your XML to be valid):

import xml.etree.ElementTree as etree

xml_string = """
    <outer><variable id="getthis"/>
    <if>
        <variable id="alsoGetThis"/>
    </if></outer>
    """

e = etree.fromstring(xml_string)

for node in e:
    print node

results in

<Element 'variable' at 0x...>
<Element 'if' at 0x...>
So you are not visiting the variable child of the if node. You will need to recursively visit each node in your XML file, i.e. your function collect_vars will need to call itself. I'll post some code in a bit to illustrate this.

Edit: As promised, some code to get all id attributes from your element tree. Rather than using an accumulator as Niek de Klein has I have used a generator. This has a number of advantages. For example, this returns the ids one at a time, so you can stop processing at any point, if, for example, a certain id is encountered, which saves reading the entire XML file.

def get_attrs(element, tag, attr):
    """Return attribute `attr` of `tag` child elements of `element`."""

    # If an element has any children (nested elements), loop through them:
    if len(element):
        for node in element:
            # Recursively call this function, yielding each result:
            for attribute in get_attrs(node, tag, attr):
                yield attribute

    # Otherwise, check if element is of type `tag` with attribute `attr`; if so,
    # yield the value of that attribute.
    if element.tag == tag:
        if attr in element.attrib:
            yield element.attrib[attr]

ids = [id for id in get_attrs(e, 'variable', 'id')]

print ids

This yields the result

 ['getthis', 'alsoGetThis']
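As a side note (not from the original answer), ElementTree can also walk the whole subtree for you via Element.iter(), which avoids the explicit recursion. The sample document below is assumed, following the tag and attribute names used above:

```python
import xml.etree.ElementTree as etree

# Assumed sample document, matching the structure discussed above.
xml_string = """
    <outer><variable id="getthis"/>
    <if>
        <variable id="alsoGetThis"/>
    </if></outer>
    """

e = etree.fromstring(xml_string)

# iter('variable') yields every 'variable' element at any depth, in document order.
ids = [el.attrib['id'] for el in e.iter('variable') if 'id' in el.attrib]
print(ids)  # ['getthis', 'alsoGetThis']
```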
qid & accept id: (10645986, 10646263) query: Custom sort python soup:

soup wrap:

Your first link more or less solves the problem. You just need to have the lambda function only look at the first item in your list:

alphabet = "zyxwvutsrqponmlkjihgfedcba"

new_list = sorted(inputList, key=lambda word: [alphabet.index(c) for c in word[0]])

One modification I might suggest, if you're sorting a reasonably large list, is to change the alphabet structure into a dict first, so that index lookup is faster:

alphabet_dict = {c: i for i, c in enumerate(alphabet)}
new_list = sorted(inputList, key=lambda word: [alphabet_dict[c] for c in word[0]])
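To make the behaviour concrete, here is a quick check with made-up data (the input items are assumed to be sequences whose first item is the word to sort by):

```python
alphabet = "zyxwvutsrqponmlkjihgfedcba"

# Each element's first item is the sort key; 'z' sorts first under the
# reversed alphabet, 'a' last.
inputList = [("apple", 1), ("zebra", 2), ("mango", 3)]
new_list = sorted(inputList, key=lambda word: [alphabet.index(c) for c in word[0]])
print(new_list)  # [('zebra', 2), ('mango', 3), ('apple', 1)]
```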
qid & accept id: (10647449, 10648116) query: serving i18n js using babel, django, & jinja2 soup:

soup wrap:

Well, it looks like I can just do:

{{gettext("message")}} 

(without defining gettext)

in the JS and babel will extract & jinja2 will replace it ok.

Watch out for quotes, though. You can't do:

'{{gettext("message")}}'

because extract_javascript will not read it. But, you can just put the quotes inside, as long as you render them safely:

{{gettext("'message'")|safe}}

So have your translators make sure to leave quotations wherever they find them in the original.

qid & accept id: (10683659, 10683981) query: Find the number of ways a sequence can be rearranged soup:

soup wrap:

Brute-force using itertools:

import itertools
def arrangements(arr):
    p = itertools.permutations(arr)
    return set(item for item in p if all(x!=y for x,y in zip(item,arr)))

Result:

>>> arrangements([0,0,0,1,1,1])
{(1, 1, 1, 0, 0, 0)}
>>> arrangements([0,0,0,1,1,1,1])
set()
>>> arrangements([1,2,2,14])
{(2, 14, 1, 2), (2, 1, 14, 2)}
>>> arrangements([1,1,2,2,14])
{(2, 14, 1, 1, 2), (2, 2, 1, 14, 1), (14, 2, 1, 1, 2), (2, 2, 14, 1, 1)}
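Note that this enumerates all n! permutations, so it is only practical for short sequences. A quick self-check of the function above:

```python
import itertools

def arrangements(arr):
    # Keep only permutations where no element stays in its original position.
    p = itertools.permutations(arr)
    return set(item for item in p if all(x != y for x, y in zip(item, arr)))

assert arrangements([1, 2, 2, 14]) == {(2, 14, 1, 2), (2, 1, 14, 2)}
assert arrangements([0, 0, 0, 1, 1, 1, 1]) == set()
```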
qid & accept id: (10741346, 10741692) query: numpy: most efficient frequency counts for unique values in an array soup:

soup wrap:

Take a look at np.bincount:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.bincount.html

import numpy as np
x = np.array([1,1,1,2,2,2,5,25,1,1])
y = np.bincount(x)
ii = np.nonzero(y)[0]

And then:

zip(ii,y[ii]) 
# [(1, 5), (2, 3), (5, 1), (25, 1)]

or:

np.vstack((ii,y[ii])).T
# array([[ 1,  5],
#        [ 2,  3],
#        [ 5,  1],
#        [25,  1]])

or however you want to combine the counts and the unique values.
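Since that answer was written, NumPy (1.9+) added a more direct option, np.unique with return_counts=True, which gets the unique values and their frequencies in one call; a sketch:

```python
import numpy as np

x = np.array([1, 1, 1, 2, 2, 2, 5, 25, 1, 1])

# unique values and how often each occurs, in a single call
vals, counts = np.unique(x, return_counts=True)
print(np.vstack((vals, counts)).T)
# [[ 1  5]
#  [ 2  3]
#  [ 5  1]
#  [25  1]]
```

Unlike np.bincount, this also handles negative numbers, large values, and non-integer arrays without allocating a counts array sized by the maximum value.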

qid & accept id: (10805846, 10806164) query: How to make a Python function sleep some time while the rest of the game continues? soup:

soup wrap:

The right thing to do here is set a timer event using this in the setup code (after the line HEAD = 0)

SHRINKSNAKE = pygame.USEREVENT+0

this goes in the runGame function after direction = RIGHT

pygame.time.set_timer(SHRINKSNAKE, 4*1000)

and this goes in the event-handling loop in runGame, before the line elif event.type == KEYDOWN: (the elifs should line up)

elif event.type == SHRINKSNAKE:
  if len(wormCoords) > 2:
    del wormCoords[-1]

For more details check the documentation on pygame.time.set_timer

qid & accept id: (10829302, 10833417) query: Writing to separate columns instead of comma seperated for csv files in scrapy soup:
soup wrap:

Update -- Code re-factored in order to:

  1. use a generator function as suggested by @madjar and
  2. fit more closely to the code snippet provided by the OP.

The Target Output

I am trying an alternative using texttable. It produces output identical to that in the question. This output may be written to a csv file (the records will need massaging for the appropriate csv dialect; I could not find a way to still use csv.writer and keep the padded spaces in each field).

                  Title,                      Release Date,             Director            
And Now For Something Completely Different,       1971,              Ian MacNaughton        
Monty Python And The Holy Grail,                  1975,       Terry Gilliam and Terry Jones 
Monty Python's Life Of Brian,                     1979,                Terry Jones    

The Code

Here is a sketch of the code you would need to produce the result above:

from texttable import Texttable

# ----------------------------------------------------------------
# Imagine data to be generated by Scrapy, for each record:
# a dictionary of three items. The first set of functions
# generate the data for use in the texttable function

def process_item(item):
    # This massages each record in preparation for writing to csv
    item['Title'] = item['Title'].encode('utf-8') + ','
    item['Release Date'] = item['Release Date'].encode('utf-8') + ','
    item['Director'] = item['Director'].encode('utf-8')
    return item

def initialise_dataset():
    data = [{'Title' : 'Title',
         'Release Date' : 'Release Date',
         'Director' : 'Director'
         }, # first item holds the table header
            {'Title' : 'And Now For Something Completely Different',
         'Release Date' : '1971',
         'Director' : 'Ian MacNaughton'
         },
        {'Title' : 'Monty Python And The Holy Grail',
         'Release Date' : '1975',
         'Director' : 'Terry Gilliam and Terry Jones'
         },
        {'Title' : "Monty Python's Life Of Brian",
         'Release Date' : '1979',
         'Director' : 'Terry Jones'
         }
        ]

    data = [ process_item(item) for item in data ]
    return data

def records(data):
    for item in data:
        yield [item['Title'], item['Release Date'], item['Director'] ]

# this ends the data simulation part
# --------------------------------------------------------

def create_table(data):
    # Create the table
    table = Texttable(max_width=0)
    table.set_deco(Texttable.HEADER)
    table.set_cols_align(["l", "c", "c"])
    table.add_rows( records(data) )

    # split, remove the underlining below the header
    # and pull together again. Many ways of cleaning this...
    tt = table.draw().split('\n')
    del tt[1] # remove the line under the header
    tt = '\n'.join(tt)
    return tt

if __name__ == '__main__':
    data = initialise_dataset()
    table = create_table(data)
    print table
qid & accept id: (10843549, 10843634) query: Solving 5 Linear Equations in Python soup:
soup wrap:
import numpy
import scipy.linalg

m = numpy.matrix([
    [1, 1, 1, 1, 1],
    [16, 8, 4, 2, 1],
    [81, 27, 9, 3, 1],
    [256, 64, 16, 4, 1],
    [625, 125, 25, 5, 1]
])

res = numpy.matrix([[1],[2],[3],[4],[8]])

print scipy.linalg.solve(m, res)

returns

[[ 0.125]
 [-1.25 ]
 [ 4.375]
 [-5.25 ]
 [ 3.   ]]

(your solution coefficients for a,b,c,d,e)
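The same result can be had with plain arrays and numpy.linalg.solve, avoiding the now-discouraged numpy.matrix class; a sketch of the equivalent call:

```python
import numpy as np

# Coefficient matrix and right-hand side from the answer above.
m = np.array([
    [1, 1, 1, 1, 1],
    [16, 8, 4, 2, 1],
    [81, 27, 9, 3, 1],
    [256, 64, 16, 4, 1],
    [625, 125, 25, 5, 1],
], dtype=float)
res = np.array([1, 2, 3, 4, 8], dtype=float)

coeffs = np.linalg.solve(m, res)
print(coeffs)  # approximately [0.125, -1.25, 4.375, -5.25, 3.0]
```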

qid & accept id: (10870736, 10870745) query: Python: Keep track of current column in text file soup:

soup wrap:

You could try something like this

for i,col in enumerate(fields[5:], 5):
    ....

enumerate() will generate an index value for you, by default it starts with 0 unless a starting value is specified as 2nd parameter to enumerate() as shown above with 5.

Variable i will start with the value 5 and allow you to track the current column you are working on and col (as before) the value of the field in that column.
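A quick demonstration with made-up data (the tab-separated field names are assumed):

```python
line = "id\tname\tchr\tstart\tend\tval5\tval6\tval7"
fields = line.rstrip().split("\t")

# i is the real column index; col is the field value at that column.
for i, col in enumerate(fields[5:], 5):
    print(i, col)
# 5 val5
# 6 val6
# 7 val7
```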

Alternatively, just for convenience and easier modification, you could use a variable:

start_col = 5
for i,col in enumerate(fields[start_col:], start_col):
    ....

--- UPDATE in reply to comments below:

I am still not quite sure I understand your comment, but if the loop you posted is inside a bigger loop you could to keep track of your current columns like this:

cur_column = 5
for line in Input:
    line = line.rstrip() 
    fields = line.split("\t")   
    for col in fields[cur_column:]:
       ...
       ...

cur_column += 1 # done processing current column, increment value to next column

Posting some simple input/output examples would help if your code is too big to post. Hard to really know how to help without more information. I hope this is helpful.

qid & accept id: (10881852, 10881925) query: Parse multi-line string up until first line with certain character soup:

soup wrap:

change

s2 = s1[:s.rfind('\n')]  #This picks up the newline after "everything"

to

s2 = s1[:s1.rfind('\n')]  

and it will work. There might be a better way to do this though...

qid & accept id: (10889564, 10889606) query: RegEx for matching multiple substrings using one group? soup:
soup wrap:
pat = re.compile(r' A(\d+)')
lst = re.findall(pat, "= A1 A2 A3 A4")

This returns a list, and in your example you showed a tuple. I presume a list will work for you, but of course you can always do:

t = tuple(lst)

The answer I just gave doesn't actually check for the = in the input string. If you need to do that, you can always use two patterns and two steps:

pat0 = re.compile(r'=(?: A\d+)+')
pat1 = re.compile(r' A(\d+)')

m = pat0.search("= A1 A2 A3 A4")
if not m:
    print("input string not what was expected")
else:
    s = m.group(0)
    lst = re.findall(pat1, s)

EDIT: Code that handles your func() example:

s_code = "func(cmd, param1, param2, param3, param4)"
pat_recognize_args = re.compile(r'func\(cmd([^)]*)\)')
pat_parse_args = re.compile(r'[, ]+([^, ]+)')

m = pat_recognize_args.search(s_code)
if m:
    s = m.group(1)
    lst = re.findall(pat_parse_args, s)

When I ran the above code, lst was set to: ['param1', 'param2', 'param3', 'param4']

pat_recognize_args looks for the literal string func with a literal ( (which is backslash-escaped in the pattern so re won't try to use it to start a match group), then the literal string cmd, and then a match group that matches anything up to a literal ) character; then the match group is closed with a ) and a literal ) is there to match the actual ) that finishes the function call. After this pattern matches, the match object will have group 1 set to just the interesting arguments from the function call.

So next we set s = m.group(1) and then have re.findall() pull out the arguments for us.
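For reference, the one-pattern version at the top can be checked directly:

```python
import re

# Each " A<digits>" match contributes its captured digits to the list.
lst = re.findall(r' A(\d+)', "= A1 A2 A3 A4")
print(lst)  # ['1', '2', '3', '4']
```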

qid & accept id: (10920180, 10921408) query: Is there a pythonic way to support keyword arguments for a memoize decorator in Python? soup:

soup wrap:

I'd suggest something like the following:

import inspect

class key_memoized(object):
    def __init__(self, func):
       self.func = func
       self.cache = {}

    def __call__(self, *args, **kwargs):
        key = self.key(args, kwargs)
        if key not in self.cache:
            self.cache[key] = self.func(*args, **kwargs)
        return self.cache[key]

    def normalize_args(self, args, kwargs):
        spec = inspect.getargs(self.func.__code__).args
        return dict(kwargs.items() + zip(spec, args))

    def key(self, args, kwargs):
        a = self.normalize_args(args, kwargs)
        return tuple(sorted(a.items()))

Example:

@key_memoized
def foo(bar, baz, spam):
    print 'calling foo: bar=%r baz=%r spam=%r' % (bar, baz, spam)
    return bar + baz + spam

print foo(1, 2, 3)
print foo(1, 2, spam=3)         #memoized
print foo(spam=3, baz=2, bar=1) #memoized

Note that you can also extend key_memoized and override its key() method to provide more specific memoization strategies, e.g. to ignore some of the arguments:

class memoize_by_bar(key_memoized):
    def key(self, args, kwargs):
        return self.normalize_args(args, kwargs)['bar']

@memoize_by_bar
def foo(bar, baz, spam):
    print 'calling foo: bar=%r baz=%r spam=%r' % (bar, baz, spam)
    return bar

print foo('x', 'ignore1', 'ignore2')
print foo('x', 'ignore3', 'ignore4')
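On Python 3, the same idea is usually written with inspect.signature, which normalizes positional and keyword spellings of a call (and fills in defaults) for you. This is a sketch of that approach, not the original answer's code:

```python
import functools
import inspect

def key_memoized(func):
    """Cache results keyed on the *normalized* arguments, so that
    positional and keyword spellings of the same call share one entry."""
    sig = inspect.signature(func)
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()  # fill in defaults so every call normalizes fully
        key = tuple(sorted(bound.arguments.items()))
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    wrapper.cache = cache
    return wrapper

calls = []

@key_memoized
def foo(bar, baz, spam):
    calls.append((bar, baz, spam))
    return bar + baz + spam

print(foo(1, 2, 3))               # 6
print(foo(1, 2, spam=3))          # 6, memoized
print(foo(spam=3, baz=2, bar=1))  # 6, memoized
print(len(calls))                 # 1 -- the body only ran once
```

As in the original, the cache key requires all argument values to be hashable and sortable against each other.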
qid & accept id: (10921316, 11191972) query: Plot multiple y-axis AND colorbar in matplotlib soup:

soup wrap:

@OZ123 Sorry that I took so long to respond. Matplotlib is extensively customizable, sometimes to the point where you get confused about what you are actually doing. Thanks for the help on creating separate axes.

However, I didn't think I needed that much control, and I ended up just using the pad keyword argument of

fig.colorbar()

and this provided what I needed.

The pseudo-code then becomes this:

#!/usr/bin/python

import matplotlib.pyplot as plt
from matplotlib import cm

fig = plt.figure()
ax1 = fig.add_subplot(111)
mappable = ax1.scatter(xgrid,
                       ygrid,
                       c=be,                   # set colorbar to blaze efficiency
                       cmap=cm.hot,
                       vmin=0.0,
                       vmax=1.0)

cbar = fig.colorbar(mappable, pad=0.15)
cbar.set_label('Blaze Efficiency')

ax2 = ax1.twinx()
ax2.set_ylabel('Wavelength')

plt.show()

Here is what it looks like now: (figure: the scatter plot with its colorbar padded to the right of the second y-axis)

qid & accept id: (10961378, 10961991) query: How to generate an html directory list using Python soup:

soup wrap:

You could separate the directory tree generation and its rendering as html.

To generate the tree you could use a simple recursive function:

import os

def make_tree(path):
    tree = dict(name=os.path.basename(path), children=[])
    try: lst = os.listdir(path)
    except OSError:
        pass #ignore errors
    else:
        for name in lst:
            fn = os.path.join(path, name)
            if os.path.isdir(fn):
                tree['children'].append(make_tree(fn))
            else:
                tree['children'].append(dict(name=name))
    return tree
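If you don't want a template engine at all, the same tree dict can be rendered with a small recursive helper. This is a sketch that is not part of the original answer; the sample tree stands in for make_tree's output:

```python
import html

def render_tree(tree):
    """Recursively render a make_tree-style dict as nested <ul>/<li> lists."""
    name = html.escape(tree['name'])
    children = tree.get('children')
    if not children:
        return '<li>%s</li>' % name
    inner = ''.join(render_tree(child) for child in children)
    return '<li>%s<ul>%s</ul></li>' % (name, inner)

# A tiny hand-built tree in the same shape make_tree produces.
tree = {'name': 'root', 'children': [
    {'name': 'a.txt'},
    {'name': 'sub', 'children': [{'name': 'b.txt'}]},
]}
print('<ul>%s</ul>' % render_tree(tree))
```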

To render it as html you could use jinja2's loop recursive feature:


<!DOCTYPE html>
<html>
  <head><title>Path: {{ tree.name }}</title></head>
  <body>
    <h1>{{ tree.name }}</h1>
    <ul>
    {%- for item in tree.children recursive %}
      <li>{{ item.name }}
      {%- if item.children -%}
        <ul>{{ loop(item.children) }}</ul>
      {%- endif %}</li>
    {%- endfor %}
    </ul>
  </body>
</html>

Put the HTML into the templates/dirtree.html file. To test it, run the following code and visit http://localhost:8888/:

import os
from flask import Flask, render_template

app = Flask(__name__)

@app.route('/')
def dirtree():
    path = os.path.expanduser(u'~')
    return render_template('dirtree.html', tree=make_tree(path))

if __name__=="__main__":
    app.run(host='localhost', port=8888, debug=True)
qid & accept id: (10963952, 10963969) query: Return the largest value of a given element of tuple keys in a dictionary soup:

soup wrap:

Either:

largest_key = max(my_dict, key=lambda x: x[1])

Or:

from operator import itemgetter
largest_key = max(my_dict, key=itemgetter(1))

According to DSM, iterating over a dict directly is faster than retrieving and iterating over keys() or viewkeys().

What I think Ms. Zverina is talking about is converting your data structure from a dict with tuple keys to something like this:

my_dict = {
    'a': {
            1: value_1,
            2: value_3
         },
    'b': {
            1: value_2,
            2: value_5
         },
    'c': {
            3: value_4
         }
}

That way, if you wanted to find the max of all values under 'a', you could simply do:

largest_key = max(my_dict['a'])

At no extra cost. (Your data is already divided into subsets, so you don't have to waste computation on building subsets each time you do a search).

EDIT

To restrict your search to a given subset, do something like this:

>>> subset = 'a'
>>> largest_key_within_subset = max((i for i in my_dict if i[0] == subset), key=itemgetter(1))

Where (i for i in my_dict if i[0] == subset) is a generator that returns only keys that are in the given subset.
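A quick runnable check of the tuple-key approach, with hypothetical sample data (the values are arbitrary placeholders):

```python
from operator import itemgetter

# Keys are (subset, number) tuples, as in the question.
my_dict = {('a', 1): 10, ('a', 2): 30, ('b', 1): 20, ('c', 3): 40}

# Key with the largest second element, across all subsets.
largest_key = max(my_dict, key=itemgetter(1))
print(largest_key)  # ('c', 3)

# Restrict the search to subset 'a' with a generator expression.
subset = 'a'
largest_in_a = max((k for k in my_dict if k[0] == subset), key=itemgetter(1))
print(largest_in_a)  # ('a', 2)
```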

qid & accept id: (10993692, 10993777) query: python: convert to HTML special characters soup:

soup wrap:

If you're only concerned about critical special characters like &, < and >:

>>> import cgi
>>> cgi.escape("<hello&goodbye>")
'&lt;hello&amp;goodbye&gt;'

For other non-ASCII characters:

>>> "Übeltäter".encode("ascii", "xmlcharrefreplace")
b'Übeltäter'

Of course, if necessary, you can combine the two:

>>> cgi.escape("<Übeltäter>").encode("ascii", "xmlcharrefreplace")
b'&lt;&#220;belt&#228;ter&gt;'
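Note that cgi.escape was deprecated in Python 3.2 and removed in 3.8; html.escape is the modern equivalent (passing quote=False here to match cgi.escape's default of leaving quotes alone):

```python
import html

print(html.escape("<hello&goodbye>", quote=False))
# &lt;hello&amp;goodbye&gt;

print("Übeltäter".encode("ascii", "xmlcharrefreplace"))
# b'&#220;belt&#228;ter'

print(html.escape("<Übeltäter>", quote=False).encode("ascii", "xmlcharrefreplace"))
# b'&lt;&#220;belt&#228;ter&gt;'
```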
qid & accept id: (11040604, 11041179) query: How to uniquefy a list of dicts based on percentage similarity of a value in the dicts soup:

soup wrap:

Using your function that determines uniqueness, you can do this:

import difflib

def similar(seq1, seq2):
    return difflib.SequenceMatcher(a=seq1.lower(), b=seq2.lower()).ratio() > 0.9

def unique(mylist, keys):
    temp = mylist[:]
    for d in mylist:
        temp.pop(0)
        [d2.pop(i) for i in keys if d.has_key(i)
         for d2 in temp if d2.has_key(i) and similar(d[i], d2[i])] 
    return mylist

Note that this will modify your dictionaries in place:

mylist = [{"greeting":"HELLO WORLD!"}, {"greeting":"Hello Mars"}, {"greeting":"Hello World!!!"}, {"greeting":"hello world"}]
unique(mylist, ['greeting'])

print mylist

Output:

[{'greeting': 'HELLO WORLD!'}, {'greeting': 'Hello Mars'}, {}, {}]
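On Python 3, dict.has_key is gone and a list comprehension used purely for its side effects is unidiomatic; an equivalent sketch of the same algorithm with plain loops:

```python
import difflib

def similar(seq1, seq2):
    return difflib.SequenceMatcher(a=seq1.lower(), b=seq2.lower()).ratio() > 0.9

def unique(mylist, keys):
    rest = mylist[:]  # shallow copy: shares the same dict objects
    for d in mylist:
        rest.pop(0)  # only compare d against the dicts that follow it
        for k in keys:
            if k not in d:
                continue
            for d2 in rest:
                if k in d2 and similar(d[k], d2[k]):
                    d2.pop(k)  # drop near-duplicate values in place
    return mylist

mylist = [{"greeting": "HELLO WORLD!"}, {"greeting": "Hello Mars"},
          {"greeting": "Hello World!!!"}, {"greeting": "hello world"}]
unique(mylist, ['greeting'])
print(mylist)
# [{'greeting': 'HELLO WORLD!'}, {'greeting': 'Hello Mars'}, {}, {}]
```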
qid & accept id: (11052920, 11053002) query: Creating a 2d Grid in Python soup:

soup wrap:

This implementation does the same as yours for "square" lists of lists:

def makeLRGrid(g):
    return [row[:] for row in g]

A list can be copied by slicing the whole list with [:], and you can use a list comprehension to do this for every row.

Edit: You seem to be actually aiming at transposing the list of lists. This can be done with zip():

def transpose(g):
    return zip(*g)

or, if you really need a list of lists

def transpose(g):
    return map(list, zip(*g))

See also the documentation of zip().
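On Python 3, zip() returns an iterator rather than a list, so the list-of-lists version is usually written with a comprehension (a sketch):

```python
def transpose(g):
    # zip(*g) pairs up the i-th elements of every row; wrap each
    # resulting tuple in list() to get a list of lists back.
    return [list(row) for row in zip(*g)]

grid = [[1, 2, 3],
        [4, 5, 6]]
print(transpose(grid))  # [[1, 4], [2, 5], [3, 6]]
```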

qid & accept id: (11065100, 11068153) query: background process in python with -e option on terminal soup:

soup wrap:

When testing this, I found the results to be highly dependent on the program being launched, and the issue has nothing to do with Python. I never noticed it before, but 'term -e program' only works for some programs; others exit with the behavior I was getting. Some programs don't keep the inherited pid/sid, while others do.

For example

>>> print os.getpid()
3556
>>> os.execl( '/usr/bin/gvim', 'gvim' )

a quick 'ps -e' shows

3556 pts/1    00:00:00 gvim 
3557 ?        00:00:00 gvim

When the launching terminal closes, all processes with the same sid close. So the 'gvim defunct' disappears but the other persists. Programs which do not obtain a new pid/sid will quit when the launching terminal closes. The solution was to just force a new sid on the process.

import os

if os.fork():
    # parent
    do_stuff()

else:
    # child
    os.setsid()
    os.execl('prog', 'prog')
qid & accept id: (11066400, 11066687) query: Remove punctuation from Unicode formatted strings soup:

soup wrap:

You could use the unicode.translate() method:

import unicodedata
import sys

tbl = dict.fromkeys(i for i in xrange(sys.maxunicode)
                      if unicodedata.category(unichr(i)).startswith('P'))
def remove_punctuation(text):
    return text.translate(tbl)

You could also use r'\p{P}' that is supported by regex module:

import regex as re

def remove_punctuation(text):
    return re.sub(ur"\p{P}+", "", text)
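The code above is Python 2 (xrange, unichr, unicode strings). A Python 3 sketch of the same translate-table approach, since str.translate accepts a mapping of code points to None directly:

```python
import sys
import unicodedata

# Map every code point whose Unicode category starts with 'P'
# (punctuation) to None, so str.translate deletes it.
TBL = dict.fromkeys(
    i for i in range(sys.maxunicode + 1)
    if unicodedata.category(chr(i)).startswith('P')
)

def remove_punctuation(text):
    return text.translate(TBL)

print(remove_punctuation("Hello, world!"))  # Hello world
```

Building the table scans the whole Unicode range, so it is worth doing once at module level, as here.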
qid & accept id: (11078122, 11079590) query: Finding matching submatrices inside a matrix soup:

soup wrap:

You can use correlate. You'll need to set your black values to -1 and your white values to 1 (or vice-versa) so that you know the value of the peak of the correlation, and that it only occurs with the correct letter.

The following code does what I think you want.

import numpy
from scipy import signal

# Set up the inputs
a = numpy.random.randn(100, 200)
a[a<0] = 0
a[a>0] = 255

b = numpy.random.randn(20, 20)
b[b<0] = 0
b[b>0] = 255

# put b somewhere in a
a[37:37+b.shape[0], 84:84+b.shape[1]] = b

# Now the actual solution...

# Set the black values to -1
a[a==0] = -1
b[b==0] = -1

# and the white values to 1
a[a==255] = 1
b[b==255] = 1

max_peak = numpy.prod(b.shape)

# c will contain max_peak where the overlap is perfect
c = signal.correlate(a, b, 'valid')

overlaps = numpy.where(c == max_peak)

print overlaps

This outputs (array([37]), array([84])), the locations of the offsets set in the code.

You will likely find that if your letter size multiplied by your big array size is bigger than roughly N*log(N), where N is the corresponding size of the big array you're searching (for each dimension), then you will probably get a speed-up by using an FFT-based algorithm like scipy.signal.fftconvolve (bearing in mind that you'll need to flip each axis of one of the datasets if you're using a convolution rather than a correlation - flipud and fliplr). The only modification would be to the assignment of c:

c = signal.fftconvolve(a, numpy.fliplr(numpy.flipud(b)), 'valid')

Comparing the timings on the sizes above:

In [5]: timeit c = signal.fftconvolve(a, numpy.fliplr(numpy.flipud(b)), 'valid')
100 loops, best of 3: 6.78 ms per loop

In [6]: timeit c = signal.correlate(a, b, 'valid')
10 loops, best of 3: 151 ms per loop
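If scipy isn't available, the same "find an exact submatrix" idea can be sketched with a brute-force numpy scan (much slower than correlation for large inputs, but dependency-free; the test data here is made up):

```python
import numpy as np

def find_submatrix(a, b):
    """Return (row, col) offsets where b occurs exactly inside a (brute force)."""
    H, W = a.shape
    h, w = b.shape
    hits = []
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            if np.array_equal(a[i:i + h, j:j + w], b):
                hits.append((i, j))
    return hits

a = np.zeros((10, 12), dtype=int)
b = np.arange(1, 7).reshape(2, 3)  # a small distinctive block
a[3:5, 4:7] = b                    # plant b at offset (3, 4)
print(find_submatrix(a, b))        # [(3, 4)]
```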
qid & accept id: (11102829, 11104077) query: Code a loop on a list of delimiters? soup:

soup wrap:

The problem isn't difficult if you use the alternation operator, |.

    (d1|d2|d3|d4|d25)(.*?)(?=d1|d2|d3|d4|d25)

This way,

  1. you will capture the starting delimiter in case you need it, in group 1;
  2. you will non-greedily capture "some stuff" in group 2;
  3. and by using a lookahead assertion, you won't "eat up" the next delimiter quite yet, so that you can continue matching the rest of your data with the same regex.

See a demo of this regex here: http://rubular.com/r/DJVegfD3Ul.

Note: Sadly I don't know Python, so I won't try to write any code. But it should be a trivial task to join all your delimiters into the form above. See caveat in comment below.

UPDATE

This is my first time writing Python, ever, so forgive my mistakes.

    # start with an array of delimiters
    delimiters = [d1, d2, d3]

    # start with a blank string
    regex_delim = ''

    # build the "delimiters regex" using alternation
    for delimiter in delimiters:
        regex_delim += re.escape(delimiter) + '|'

    # remove the extra '|' at the end
    regex_delim = regex_delim[:-1]

    # compile the regex
    regex_obj = re.compile('(' + regex_delim + ')(.*?)(?=' + regex_delim + ')')

    # and that should be it!
    for match in regex_obj.finditer(html_str):
        print match.group(2)

The re.escape(delimiter) is necessary in case your delimiters have special characters in them. For example, if your delimiter was *, re.escape(...) returns \*, so that your delimiter isn't translated as a regex quantifier.
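The same construction is usually written with str.join, which avoids the trailing '|' fix-up. A Python 3 sketch, where the concrete delimiters and html_str are hypothetical stand-ins for the d1..d25 of the answer:

```python
import re

# Hypothetical delimiters; the answer's d1..d25 are arbitrary strings.
delimiters = ['<b>', '</b>', '<i>']

# Escape each delimiter and join with the alternation operator.
regex_delim = '|'.join(re.escape(d) for d in delimiters)
regex_obj = re.compile('(%s)(.*?)(?=%s)' % (regex_delim, regex_delim))

html_str = '<b>bold</b>plain<i>italic'
for match in regex_obj.finditer(html_str):
    print(match.group(2))  # prints: bold, then plain
```

Note the trailing chunk ('italic') is not matched, because the lookahead requires another delimiter to follow; append a sentinel delimiter to the input if you need the tail as well.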

qid & accept id: (11140628, 11141206) query: Django - access foreign key data in an annotated query soup:

soup wrap:

If you want the user, you need to access it the other way around, by querying the User model and joining Relationship. Here's the relevant documentation.

It should be something like this:

from django.db.models import Count

users = User.objects.annotate(num_followers=Count('to_users')).order_by('-num_followers')

This will give you the users, and each of them will have an extra num_followers property.

model.py

from django.contrib.auth.models import User
from django.db import models

class Relationship(models.Model):
    from_user = models.ForeignKey(User, related_name='from_users')
    to_user = models.ForeignKey(User, related_name='to_users')

test

>>> from so.models import *
>>> from django.contrib.auth.models import User
>>> u1 = User()
>>> u1.username='user1'
>>> u1.save()
>>> u2 = User()
>>> u2.username='user2'
>>> u2.save()
>>> u3=User()
>>> u3.username='user3'
>>> u3.save()
>>> # so we have 3 users now
>>> r1 = Relationship()
>>> r1.from_user=u1
>>> r1.to_user=u2
>>> r1.save()
>>> r2=Relationship()
>>> r2.from_user=u1
>>> r2.to_user=u3
>>> r2.save()
>>> r3=Relationship()
>>> r3.from_user=u2
>>> r3.to_user=u3
>>> r3.save()
>>> rels = Relationship.objects.all()
>>> rels.count()
3
>>> # we have 3 relationships: user1 follows user2, user1 follows user3, user2 follows user3
>>> users = User.objects.annotate(num_followers=Count('to_users')).order_by('-num_followers')
>>> for user in users:
>>>     print user.username, user.num_followers
user3 2
user2 1
user1 0

EDIT2: fixed the typos, added the test.

qid & accept id: (11159668, 11160217) query: Insertions algorithm in sequence python soup:

soup wrap:

You can do this by sorting on location and applying in reverse order. Is order important in case of ties? Then sort only by location, not location and sequence, so they will insert in the correct order. For example, if inserting 999@1 then 888@1, if you sorted on both values you'd get 888@1,999@1.

12345
18889992345

But sorting only by location with a stable sort gives 999@1,888@1

12345
1999888345

Here's the code:

import random
import operator

# Easier to use a mutable list than an immutable string for insertion.
sequence = list('123456789123456789')
insertions = '999 888 777 666 555 444 333 222 111'.split()
locations = [random.randrange(len(sequence)) for i in xrange(len(insertions))]
modifications = zip(locations,insertions)
print modifications
# sort them by location.
# Since Python 2.2, sorts are guaranteed to be stable,
# so if you insert 999 into 1, then 222 into 1, this will keep them
# in the right order
modifications.sort(key=operator.itemgetter(0))
print modifications
# apply in reverse order
for i,seq in reversed(modifications):
    print 'insert {} into {}'.format(seq,i)
    # Here's where using a mutable list helps
    sequence[i:i] = list(seq)
    print ''.join(sequence)

Result:

[(11, '999'), (8, '888'), (7, '777'), (15, '666'), (12, '555'), (11, '444'), (0, '333'), (0, '222'), (15, '111')]
[(0, '333'), (0, '222'), (7, '777'), (8, '888'), (11, '999'), (11, '444'), (12, '555'), (15, '666'), (15, '111')]
insert 111 into 15
123456789123456111789
insert 666 into 15
123456789123456666111789
insert 555 into 12
123456789123555456666111789
insert 444 into 11
123456789124443555456666111789
insert 999 into 11
123456789129994443555456666111789
insert 888 into 8
123456788889129994443555456666111789
insert 777 into 7
123456777788889129994443555456666111789
insert 222 into 0
222123456777788889129994443555456666111789
insert 333 into 0
333222123456777788889129994443555456666111789
qid & accept id: (11179879, 11179950) query: Simple OOP: Python, saving object names via __init__ v2 soup:
soup wrap:
lis = []
class Object():
    def __init__(self, var):
        self.something = var 
        lis.append(self)  #here self is the reference to the instance being created and you can save it in a list to access it later
xxx = Object('123')
xx = Object('12')
x = Object('1')

for x in lis:
    print(x.something)

output:

123
12
1
qid & accept id: (11185516, 11186421) query: Pythonic solution for conditional arguments passing soup:

If you don't want to change anything in func then the sensible option would be passing a dict of arguments to the function:

\n
>>> def func(a=0,b=10):\n...  return a+b\n...\n>>> args = {'a':15,'b':15}\n>>> func(**args)\n30\n>>> args={'a':15}\n>>> func(**args)\n25\n>>> args={'b':6}\n>>> func(**args)\n6\n>>> args = {}\n>>> func(**args)\n10\n
\n

or just:

\n
>>>func(**{'a':7})\n17\n
\n soup wrap:

If you don't want to change anything in func, then the sensible option would be to pass a dict of arguments to the function:

>>> def func(a=0,b=10):
...  return a+b
...
>>> args = {'a':15,'b':15}
>>> func(**args)
30
>>> args={'a':15}
>>> func(**args)
25
>>> args={'b':6}
>>> func(**args)
6
>>> args = {}
>>> func(**args)
10

or just:

>>>func(**{'a':7})
17
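In practice the dict is usually built up conditionally and unpacked once, so that func's own defaults apply whenever an argument is absent. A Python 3 sketch (call_func is a hypothetical wrapper, not from the original answer):

```python
def func(a=0, b=10):
    return a + b

def call_func(a=None, b=None):
    # Only pass an argument through when we actually have a value for it;
    # otherwise func's own default is used.
    args = {}
    if a is not None:
        args['a'] = a
    if b is not None:
        args['b'] = b
    return func(**args)

print(call_func(a=15))  # 25
print(call_func(b=6))   # 6
print(call_func())      # 10
```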
qid & accept id: (11198718, 11198743) query: Writing to a file in a for loop soup:

soup wrap:

That is because you are opening, writing to, and closing the file 10 times inside your for loop:

myfile = open('xyz.txt', 'w')
myfile.writelines(var1)
myfile.close()

You should open and close your file outside the for loop:

myfile = open('xyz.txt', 'w')
for line in lines:
    var1, var2 = line.split(",");
    myfile.write("%s\n" % var1)

myfile.close()
text_file.close()

Also note that you should use write, not writelines, here.

writelines writes a list of lines to your file.

You should also check out the answers posted here that use the with statement; that is the elegant way to do file read/write operations in Python.
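For reference, a sketch of the same loop using a with statement, which closes the file for you even if an exception occurs (the `lines` list stands in for the lines read from the question's input file):

```python
lines = ["x1,y1", "x2,y2"]  # stand-in for the question's input lines

# The with block opens the file once and guarantees it is closed afterwards.
with open('xyz.txt', 'w') as myfile:
    for line in lines:
        var1, var2 = line.split(",")
        myfile.write("%s\n" % var1)

with open('xyz.txt') as myfile:
    print(myfile.read())  # x1 and x2, one per line
```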

qid & accept id: (11207302, 11207442) query: How to search a string with the url patterns in django? soup:

soup wrap:

You can simply try to resolve the address to a view:

from django.core.urlresolvers import resolve, Resolver404
from myapp.views import user_profile_view

try:
    my_view = resolve("/%s/" % user_name)
    if my_view == user_profile_view:
        # We match the user_profile_view, so that's OK.
        pass
    else:
        # oops, we have another view that is mapped on that URL;
        # you already have something mapped on this address
        pass
except Resolver404:
    # the app doesn't have such a path
    pass

EDIT:

you can also make the check in a different way:

def user_profile_view(request, user_name):
    # some code here

user_profile_view.name = "User Profile View"

and then the check above could be:

if getattr(my_view, "name", None) == "User Profile View":
    ...
qid & accept id: (11239815, 11239899) query: To sum column with condition soup:
soup wrap:
with open('data.txt') as f:
    next(f)  # skip the header line
    d = dict()
    for x in f:
        fields = x.split()
        key, value = fields[0], float(fields[2])
        if key not in d:
            d[key] = value
        else:
            d[key] += value

output:

{'11': 9.7756, '10': 9.791699999999999, '12': 9.7925}
qid & accept id: (11242667, 11242838) query: How to parse Apple's IAP receipt mal-formatted JSON? soup:

soup wrap:

That is indeed rather messed up. A quick fix would be to replace the offending separators with a regular expression:

line = re.compile(r'("[^"]*")\s*=\s*("[^"]*");')
result = line.sub(r'\1: \2,', result)

You'll also need to remove the last comma:

trailingcomma = re.compile(r',(\s*})')
result = trailingcomma.sub(r'\1', result)

With these operations the example loads as json:

>>> import json, re
>>> line = re.compile(r'("[^"]*")\s*=\s*("[^"]*");')
>>> result = '''\
... {
...     "original-purchase-date-pst" = "2012-06-28 02:46:02 America/Los_Angeles";
...     "original-transaction-id" = "1000000051960431";
...     "bvrs" = "1.0";
...     "transaction-id" = "1000000051960431";
...     "quantity" = "1";
...     "original-purchase-date-ms" = "1340876762450";
...     "product-id" = "com.x";
...     "item-id" = "523404215";
...     "bid" = "com.x";
...     "purchase-date-ms" = "1340876762450";
...     "purchase-date" = "2012-06-28 09:46:02 Etc/GMT";
...     "purchase-date-pst" = "2012-06-28 02:46:02 America/Los_Angeles";
...     "original-purchase-date" = "2012-06-28 09:46:02 Etc/GMT";
... }
... '''
>>> line = re.compile(r'("[^"]*")\s*=\s*("[^"]*");')
>>> trailingcomma = re.compile(r',(\s*})')
>>> corrected = trailingcomma.sub(r'\1', line.sub(r'\1: \2,', result))
>>> json.loads(corrected)
{u'product-id': u'com.x', u'purchase-date-pst': u'2012-06-28 02:46:02 America/Los_Angeles', u'transaction-id': u'1000000051960431', u'original-purchase-date-pst': u'2012-06-28 02:46:02 America/Los_Angeles', u'bid': u'com.x', u'purchase-date-ms': u'1340876762450', u'original-transaction-id': u'1000000051960431', u'bvrs': u'1.0', u'original-purchase-date-ms': u'1340876762450', u'purchase-date': u'2012-06-28 09:46:02 Etc/GMT', u'original-purchase-date': u'2012-06-28 09:46:02 Etc/GMT', u'item-id': u'523404215', u'quantity': u'1'}

It should handle nested mappings as well. This does assume there are no escaped " quotes in the values themselves though. If there are you'll need a parser anyway.

qid & accept id: (11255432, 11281579) query: Python module for playing sound data with progress bar? soup:

soup wrap:

If you know the number of audio frames and the samplerate, you don't need audiolab to tell you the current location; you can compute it.

Sndfile.frames / Sndfile.samplerate gives you the duration of the file in seconds; you can then use this together with the elapsed time since the sound file started to compute the relative current location. To illustrate the principle:

import time

start_time = time.time()
duration_s = sndfile.frames / sndfile.samplerate

while 1:
    elapsed_time = time.time() - start_time
    current_location = elapsed_time / float(duration_s)
    if current_location >= 1:
        break
    time.sleep(.01)

To implement this in practice, you could use Python threading, to play the sound file asynchronously, and then compute the current location (as above) in the parent thread. To handle the case where playback fails, wrap your call to scikits.audiolab.play() in an exception handler, and then use threading.Event to pass an event to the parent thread if/when the play() call fails.

In the parent thread you would then need to check event.isSet() accordingly:

if current_location >= 1 or fail_event.isSet():
    break
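The arrangement described above could look roughly like this; fake_play is a hypothetical stand-in for scikits.audiolab.play(), used only to keep the sketch self-contained and runnable:

```python
import threading
import time

def play_in_thread(play, data, fail_event):
    # Run playback; signal the parent thread if it raises.
    try:
        play(data)
    except Exception:
        fail_event.set()

def fake_play(data):
    # Hypothetical stand-in for scikits.audiolab.play()
    time.sleep(0.05)

fail_event = threading.Event()
worker = threading.Thread(target=play_in_thread,
                          args=(fake_play, [0.0] * 100, fail_event))
worker.start()

# Parent thread: poll for progress, bail out early if playback failed.
while worker.is_alive():
    if fail_event.is_set():
        break
    time.sleep(0.01)

worker.join()
print(fail_event.is_set())  # False: playback completed without error
```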
qid & accept id: (11265670, 11266096) query: Difference between two time intervals in series soup:

soup wrap:

Just store duration, start and end times in the database. You can always generate time intervals later:

def time_range(start, end, duration):
    dt = start
    while dt < end: #note: `end` is not included in the range
        yield dt
        dt += duration

Example

from datetime import datetime, timedelta

# dummy data
duration = timedelta(minutes=10)
start = datetime.utcnow()
end = start + timedelta(hours=16)

# use list instead of tee(), islice() for simplicity 
lst = [dt.strftime('%H:%M') for dt in time_range(start, end, duration)] 
for interval in zip(lst, lst[1:]):
    print "%s-%s," % interval,
print
qid & accept id: (11269104, 11269187) query: Loop through dictionary with django soup:

soup wrap:

You should turn this:

d['dict1'] = [('value1', '1'), ('value2', '2')]
d['dict2'] = [('value1', '3'), ('value2', '4')]

into this:

result = [('value1', '1', '3'), ('value2', '2', '4')]

You can do this in your view. You are basically preparing your data to be displayed in the template.
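For example, assuming the two lists are parallel (the same names in the same order), zip can do the merge in the view; a small sketch with the sample data above:

```python
d = {}
d['dict1'] = [('value1', '1'), ('value2', '2')]
d['dict2'] = [('value1', '3'), ('value2', '4')]

# Pair up corresponding entries and keep the shared name once.
result = [(name, v1, v2)
          for (name, v1), (_, v2) in zip(d['dict1'], d['dict2'])]
print(result)  # [('value1', '1', '3'), ('value2', '2', '4')]
```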

You can then iterate over the values easily:

{% for name, v1, v2 in result %}
{{ v1 }}
{{ v2 }}
{% endfor %}
qid & accept id: (11313599, 11314571) query: How can I treat a section of a file as though it's a file itself? soup:

soup wrap:

I know you were searching for a library, but as soon as I read this question I thought I'd write my own. So here it is:

import os

class View:
    def __init__(self, f, offset, length):
        self.f = f
        self.f_offset = offset
        self.offset = 0
        self.length = length

    def seek(self, offset, whence=0):
        if whence == os.SEEK_SET:
            self.offset = offset
        elif whence == os.SEEK_CUR:
            self.offset += offset
        elif whence == os.SEEK_END:
            self.offset = self.length+offset
        else:
            # Other values of whence should raise an IOError
            return self.f.seek(offset, whence)
        return self.f.seek(self.offset+self.f_offset, os.SEEK_SET)

    def tell(self):
        return self.offset

    def read(self, size=-1):
        self.seek(self.offset)
        if size<0:
            size = self.length-self.offset
        size = max(0, min(size, self.length-self.offset))
        self.offset += size
        return self.f.read(size)

if __name__ == "__main__":
    f = open('test.txt', 'r')

    views = []
    offsets = [i*11 for i in range(10)]

    for o in offsets:
        f.seek(o+1)
        length = int(f.read(1))
        views.append(View(f, o+2, length))

    f.seek(0)

    completes = {}
    for v in views:
        completes[v.f_offset] = v.read()
        v.seek(0)

    import collections
    strs = collections.defaultdict(str)
    for i in range(3):
        for v in views:
            strs[v.f_offset] += v.read(3)
    strs = dict(strs) # We want it to raise KeyErrors after that.

    for offset, s in completes.iteritems():
        print offset, strs[offset], completes[offset]
        assert strs[offset] == completes[offset], "Something went wrong!"

And I wrote another script to generate the "test.txt" file:

import string, random

f = open('test.txt', 'w')

for i in range(10):
    rand_list = list(string.ascii_letters)
    random.shuffle(rand_list)
    rand_str = "".join(rand_list[:9])
    f.write(".%d%s" % (len(rand_str), rand_str))

It worked for me. The files I tested on are not binary files like yours, and they're not as big as yours, but this might be useful, I hope. If not, then thank you, that was a good challenge :D

Also, I was wondering, if these are actually multiple files, why not use some kind of an archive file format, and use their libraries to read them?

Hope it helps.

qid & accept id: (11314980, 11316764) query: How to recursively call a macro in jinja2? soup:

soup wrap:

You can use macros, write a macro for class rendering, and then call it recursively:

{% macro render_class(class) -%}
class {{ class.name }}
{
{% for field in class.fields %}
    int {{ field }};
{% endfor %}
{% for subclass in class.subclasses %}
{{ render_class(subclass) }}
{% endfor %}
}
{%- endmacro %}

{% for class in classes %}
{{ render_class(class) }}
{% endfor %}

This works well, but doesn't deal with the proper indentation of subclasses, yielding code like this:

class Bar
{
    int meow;
    int bark;

class SubBar
{
    int joe;
    int pete;
}
}
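Jinja2's built-in indent filter offers one way to handle this; the fragment below is an untested sketch showing how the recursive call inside render_class could be adjusted so nested classes are indented:

```jinja
{% for subclass in class.subclasses %}
{{ render_class(subclass)|indent(4, True) }}
{% endfor %}
```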
qid & accept id: (11340765, 11342481) query: Default window colour Tkinter and hex colour codes soup:

soup wrap:

Not sure exactly what you're looking for, but will this work?

import Tkinter

mycolor = '#%02x%02x%02x' % (64, 204, 208)  # set your favourite rgb color
mycolor2 = '#40E0D0'  # or use hex if you prefer 
root = Tkinter.Tk()
root.configure(bg=mycolor)
Tkinter.Button(root, text="Press me!", bg=mycolor, fg='black',
               activebackground='black', activeforeground=mycolor2).pack()
root.mainloop()

If you just want to find the current value of the window, and set widgets to use it, cget might be what you want:

import Tkinter

root = Tkinter.Tk()
defaultbg = root.cget('bg')
Tkinter.Button(root,text="Press me!", bg=defaultbg).pack()
root.mainloop()

If you want to set the default background color for new widgets, you can use the tk_setPalette(self, *args, **kw) method:

root.tk_setPalette(background='#40E0D0', foreground='black',
               activeBackground='black', activeForeground=mycolor2)
Tkinter.Button(root, text="Press me!").pack()

Then your widgets would have this background color by default, without having to set it in the widget parameters. There's a lot of useful information provided by the inline help function: import Tkinter; help(Tkinter.Tk)

qid & accept id: (11345160, 11345241) query: accessing files in a folder using python soup:

soup wrap:

Just pass the folder name as a parameter to your Python script:

python myscript.py FolderName

In myscript.py:

import sys
print sys.argv[1]

sys.argv gives you all the parameters.

qid & accept id: (11382536, 11383509) query: Search for a variable in a file and get its value with python soup:

soup wrap:

This approach might be one way, assuming your file's contents are somewhat consistent:

Updated: I added the code necessary to parse the lists which previously wasn't provided.

The code takes all of the data in your file and assigns it to the variables as appropriate types (i.e., float and lists). The list parsing isn't particularly pretty, but it is functional.

import re
with open('data.txt') as inf:
    salary = 0
    for line in inf:
        line = line.split('=')
        line[0] = line[0].strip()
        if line[0] == 'employee':
            employee = re.sub(r'[]\[\' ]','', line[1].strip()).split(',')
        elif line[0] == 'salary':
            salary = float(line[1])
        elif line[0] == 'managers':
            managers = re.sub(r'[]\[\' ]','', line[1].strip()).split(',')

print employee
print salary
print managers

yields:

['Tom', 'Bob', 'Anny']
200.0
['Saly', 'Alice']
qid & accept id: (11388032, 11388156) query: How to figure out if a word in spelled in alphabetical order in Python soup:

soup wrap:

Believe it or not, all characters are already implicitly assigned a number: their ASCII character codes. You can access them by using the ord() function, or compare them directly:

>>> "a" > "b"
False

>>> "b" > "a"
True

Beware though, capital letters are coded 65 - 90, while lowercase letters are coded 97 - 122, so:

>>> "C" > "b"
False

You have to ensure that you are comparing all uppercase or all lowercase letters (for example, by calling word.lower() first).

Here's one possible function that uses the above information to check if a given string is in alphabetical order, just to get you started:

def isAlphabetical(word):
    for i in xrange(len(word) - 1):
        if word[i] > word[i+1]:
            return False
    return True
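A case-normalized variant of the same check (the function name here is mine; all and range keep it short and also work on Python 3):

```python
def is_alphabetical(word):
    # Normalize case so 'C' vs 'b' comparisons behave as expected.
    word = word.lower()
    return all(word[i] <= word[i + 1] for i in range(len(word) - 1))

print(is_alphabetical("Almost"))  # True:  a <= l <= m <= o <= s <= t
print(is_alphabetical("zebra"))   # False: z > e
```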
qid & accept id: (11390421, 11390858) query: Put all files with same name in a folder soup:

soup wrap:

First, create a dict (a defaultdict is even more convenient here) that gathers the files for each date (re works well here, though given the names of your files split would also have been enough):

>>> import os
>>> import re
>>> pat = r'(\d+)(?:_\d+)?_(\w+?)[\._].*'
>>> from collections import defaultdict
>>> dict_date = defaultdict(lambda : defaultdict(list))
>>> for fil in os.listdir(path):
    if os.path.isfile(os.path.join(path, fil)):
        date, animal = re.match(pat, fil).groups()
        dict_date[date][animal].append(fil)


>>> dict_date['20120807']
defaultdict(<type 'list'>, {'first': ['20120807_first_day_pic.jpg', '20120807_first_day_sheet.jpg', '20120807_first_day_sheet2.jpg']})

Then for each date, create a subfolder and copy the corresponding files there:

>>> from shutil import copyfile
>>> for date in dict_date:
        for animal in dict_date[date]:
            try:
                os.makedirs(os.path.join(path, date, animal))
            except os.error:
                pass
            for fil in dict_date[date][animal]:
                copyfile(os.path.join(path, fil), os.path.join(path, date, animal, fil))

EDIT: took into account OP's new requirements, and Khalid's remark.

qid & accept id: (11444222, 11444288) query: How to set the alpha value for each element of a numpy array soup:

soup wrap:

The docs (http://matplotlib.sourceforge.net/api/pyplot_api.html#matplotlib.pyplot.imshow) say that you can pass an MxNx4 array of RGBA values to imshow. So, assuming ca_map is MxNx3, you could do something like:

plt.imshow(np.dstack([ca_map, alpha]), ...)

Or if ca_map is MxN, then:

plt.imshow(np.dstack([ca_map, ca_map, ca_map, alpha]), ...)
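As a quick sanity check of the shapes involved (toy 2x2 data, values assumed to lie in the 0..1 range):

```python
import numpy as np

# A 2x2 grayscale "image" plus a per-pixel alpha channel, stacked into
# the MxNx4 RGBA layout that imshow accepts.
ca_map = np.array([[0.2, 0.4],
                   [0.6, 0.8]])
alpha = np.array([[1.0, 0.5],
                  [0.5, 1.0]])

rgba = np.dstack([ca_map, ca_map, ca_map, alpha])
print(rgba.shape)  # (2, 2, 4)
```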
qid & accept id: (11457839, 11458784) query: Populating a table in PyQt with file attributes soup:

soup wrap:

See the populate method. There are also some examples in the documentation.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
from PyQt4 import QtCore, QtGui


class MainWindow(QtGui.QWidget):

    def __init__(self, parent=None):

        self.fileheader_fields=(
            "filetype","fileversion","numframes",
            "framerate","resolution","numbeams",
            "samplerate","samplesperchannel","receivergain",
            "windowstart","winlengthsindex","reverse",
            "serialnumber","date","idstring","ID1","ID2",
            "ID3","ID4","framestart","frameend","timelapse",
            "recordInterval","radioseconds","frameinterval","userassigned"
        )
        # just for test
        self.fileheader = {field: 'value of ' + field 
                           for field in self.fileheader_fields}
        super(MainWindow, self).__init__(parent)
        self.table_widget = QtGui.QTableWidget()
        layout = QtGui.QVBoxLayout()
        layout.addWidget(self.table_widget)
        self.setLayout(layout)
        self.populate()

    def populate(self):
        self.table_widget.setRowCount(len(self.fileheader_fields))
        self.table_widget.setColumnCount(2)
        self.table_widget.setHorizontalHeaderLabels(['name', 'value'])
        for i, field in enumerate(self.fileheader_fields):
            name = QtGui.QTableWidgetItem(field)
            value = QtGui.QTableWidgetItem(self.fileheader[field])
            self.table_widget.setItem(i, 0, name)
            self.table_widget.setItem(i, 1, value)


if __name__ == "__main__":
    app = QtGui.QApplication(sys.argv)
    wnd = MainWindow()
    wnd.resize(640, 480)
    wnd.show()
    sys.exit(app.exec_())

UPD

Code for your concrete case:

from fileheader import FileHeader, Frame
from echogram import QEchogram
from PyQt4.QtGui import *
from PyQt4.QtCore import *
import os, sys


class MainWindow(QWidget):

    def __init__(self, filename, parent=None):
        super(MainWindow, self).__init__(parent)
        # here we are loading file
        # now self.fileheader contains attributes
        self.fileheader = FileHeader(filename)
        self.fileheader_table = QTableWidget()
        layout = QVBoxLayout()
        layout.addWidget(self.fileheader_table)
        self.setLayout(layout)
        self.populate()

    def populate(self):
        self.fileheader_table.setRowCount(len(self.fileheader.fileheader_fields))
        self.fileheader_table.setColumnCount(2)
        self.fileheader_table.setHorizontalHeaderLabels(['name','value'])
        for i,field in enumerate(self.fileheader.fileheader_fields):
            name=QTableWidgetItem(field)
            value=QTableWidgetItem(getattr(self.fileheader, field))
            self.fileheader_table.setItem(i,0,name)
            self.fileheader_table.setItem(i,1,value)

if __name__ == "__main__":
    app = QApplication(sys.argv)
    filename = str(QFileDialog.getOpenFileName(None, "open  file", "C:/vprice/DIDSON/DIDSON  Data", "*.ddf"))
    wnd = MainWindow(filename)
    wnd.resize(640, 480)
    wnd.show()
    #echoGram=QEchogram()
    #echoGram.initFromFile(filename)
    #fileName="test.png"
    #echoGram.processEchogram()
    #dataH=echoGram.data
    #print "Horizontal data", dataH
    sys.exit(app.exec_())
qid & accept id: (11488877, 11488902) query: Periodically execute function in thread in real time, every N seconds soup:

soup wrap:

The simple solution

import threading

def work (): 
  threading.Timer(0.25, work).start ()
  print "stackoverflow"

work ()

The above will make sure that work runs four times per second. The idea is that it "queues" a call to itself that will run 0.25 seconds into the future, without blocking while waiting for that to happen.

Because of this it can do its work (almost) entirely uninterrupted, and we get extremely close to executing the function exactly 4 times per second.
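
As a side note (not part of the original answer): because each Timer is scheduled relative to the moment the previous one fired, small delays can accumulate over long runs. A sketch of a drift-compensating variant, scheduling each run against an absolute start time (`do_every_precise` is a hypothetical name; `time.monotonic` requires Python 3):

```python
import threading
import time

def do_every_precise(interval, worker_func, iterations=0):
    # Hypothetical variant of the Timer idiom above: each run is scheduled
    # against an absolute deadline, so per-call delays do not accumulate.
    start = time.monotonic()
    count = [0]

    def run():
        worker_func()
        count[0] += 1
        if iterations == 0 or count[0] < iterations:
            next_deadline = start + (count[0] + 1) * interval
            threading.Timer(max(0.0, next_deadline - time.monotonic()), run).start()

    threading.Timer(interval, run).start()
```

As with the original do_every, iterations=0 means "run forever".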


More about threading.Timer can be found in the Python documentation.


[RECOMMENDED] The more advanced/dynamic solution

Even though the previous function works as expected, you can create a helper function to deal with future timed events.

Something like the below will be sufficient for this example; hopefully the code speaks for itself - it is not as advanced as it might appear.

See this as inspiration for implementing your own wrapper to fit your exact needs.

import threading

def do_every (interval, worker_func, iterations = 0):
  if iterations != 1:
    threading.Timer (
      interval,
      do_every, [interval, worker_func, 0 if iterations == 0 else iterations-1]
    ).start ()

  worker_func ()

def print_hw ():
  print "hello world"

def print_so ():
  print "stackoverflow"


# call print_so every second, 5 times total
do_every (1, print_so, 5)

# call print_hw two times per second, forever
do_every (0.5, print_hw)

qid & accept id: (11514277, 11514844) query: How can I use pyparsing to data from VC++ autoexp.dat? soup:

soup wrap:

Use Group to create a hierarchy in the parsed tokens. Then you can navigate lists of groups and access named fields within them. Something like this:

from pyparsing import *

LPAR,RPAR,COMMA,HASH,COLON = map(Suppress, '(),#:')
ident = Word(alphas+'_', alphanums+'_')
fnumber = Regex(r'[+-]?\d+\.\d*').setParseAction(lambda t:float(t[0]))
inumber = Regex(r'[+-]?\d+').setParseAction(lambda t:int(t[0]))
number = fnumber | inumber
ref_name = Combine("$" + delimitedList(ident, delim=oneOf(". ->"), combine=True))
named_ref = Group(ident("name") + COLON + ref_name("ref"))
unnamed_ref = Group(ref_name("ref"))

IF, ELSE = map(Keyword, "if else".split())

stmt = Forward()

decl = named_ref | unnamed_ref

def setType(typ):
    def parseAction(tokens):
        tokens['type'] = typ
    return parseAction
cond = operatorPrecedence(ref_name | number,
            [
            (oneOf("< == > <= >= !="), 2, opAssoc.LEFT),
            ])
if_else = Group(HASH + IF + LPAR + cond("condition") + RPAR + 
                    LPAR + stmt("then") + RPAR + 
                    Optional(HASH + ELSE + LPAR + stmt("else") + RPAR))

if_else.setParseAction(setType("IF_ELSE"))
decl.setParseAction(setType("DECL"))

stmt << Group(decl | if_else | (HASH + LPAR + delimitedList(stmt) + RPAR))

section = Group(ident("section_name") + LPAR + Group(ZeroOrMore(stmt))("section_body") + RPAR)

parser = OneOrMore(section)


source = """
preview 
( 
    #if ($e.d.stateFlags == 0) ( 
        $e.d 
    ) #else ( 
        #( $e.d->scheme, $e.d->host, $e.d->path ) 
    ) 
) 
children 
( 
    #( 
        scheme: $c.d->scheme, 
        host: $c.d->host, 
        path: $c.d->path, 
        username: $c.d->userName, 
        password: $c.d->password, 
        encodedOriginal: $c.d->encodedOriginal, 
        query: $c.d->query, 
        fragment: $c.d->fragment 
    ) 
)"""


def dump_stmt(tokens, depth=0):
    if 'type' in tokens:
        print tokens.type
        print tokens[0].dump(depth=depth)
    else:
        for stmt in tokens:
            dump_stmt(stmt, depth=depth+1)

for sec in parser.parseString(source):
    print sec.dump()
    print sec.section_name
    for statement in sec.section_body:
        dump_stmt(statement)
    print

Prints:

['preview', [[['if', ['$e.d.stateFlags', '==', 0], [['$e.d']], 'else', [[['$e.d->scheme']], [['$e.d->host']], [['$e.d->path']]]]]]]
- section_body: [[['if', ['$e.d.stateFlags', '==', 0], [['$e.d']], 'else', [[['$e.d->scheme']], [['$e.d->host']], [['$e.d->path']]]]]]
- section_name: preview
preview
IF_ELSE
['if', ['$e.d.stateFlags', '==', 0], [['$e.d']], 'else', [[['$e.d->scheme']], [['$e.d->host']], [['$e.d->path']]]]
- condition: ['$e.d.stateFlags', '==', 0]
- else: [[['$e.d->scheme']], [['$e.d->host']], [['$e.d->path']]]
- then: [['$e.d']]
  - type: DECL

['children', [[[['scheme', '$c.d->scheme']], [['host', '$c.d->host']], [['path', '$c.d->path']], [['username', '$c.d->userName']], [['password', '$c.d->password']], [['encodedOriginal', '$c.d->encodedOriginal']], [['query', '$c.d->query']], [['fragment', '$c.d->fragment']]]]]
- section_body: [[[['scheme', '$c.d->scheme']], [['host', '$c.d->host']], [['path', '$c.d->path']], [['username', '$c.d->userName']], [['password', '$c.d->password']], [['encodedOriginal', '$c.d->encodedOriginal']], [['query', '$c.d->query']], [['fragment', '$c.d->fragment']]]]
- section_name: children
children
DECL
['scheme', '$c.d->scheme']
  - name: scheme
  - ref: $c.d->scheme
DECL
['host', '$c.d->host']
  - name: host
  - ref: $c.d->host
DECL
['path', '$c.d->path']
  - name: path
  - ref: $c.d->path
DECL
['username', '$c.d->userName']
  - name: username
  - ref: $c.d->userName
DECL
['password', '$c.d->password']
  - name: password
  - ref: $c.d->password
DECL
['encodedOriginal', '$c.d->encodedOriginal']
  - name: encodedOriginal
  - ref: $c.d->encodedOriginal
DECL
['query', '$c.d->query']
  - name: query
  - ref: $c.d->query
DECL
['fragment', '$c.d->fragment']
  - name: fragment
  - ref: $c.d->fragment
qid & accept id: (11519787, 11519819) query: How to find the list in a list of lists whose sum of elements is the greatest? soup:

soup wrap:

max takes a key argument; with it you can tell max how to calculate the value for each item in an iterable. sum will do nicely here:

max(x, key=sum)

Demo:

>>> x = [[1,2,3], [4,5,6], [7,8,9], [2,2,0]]
>>> max(x, key=sum)
[7, 8, 9]
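
As a small aside (not in the original answer), if you also need the position of the winning sublist, the same key trick works on enumerate, where each item is a (position, sublist) pair:

```python
x = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [2, 2, 0]]

# each t is a (position, sublist) pair, so the key sums the sublist
idx, best = max(enumerate(x), key=lambda t: sum(t[1]))
# idx → 2, best → [7, 8, 9]
```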

If you need to use a different method of summing your items, you can specify your own function too; this is not limited to the Python built-in functions:

>>> def mymaxfunction(item):
...     return sum(map(int, item))
...
>>> max([['1', '2', '3'], ['7', '8', '9']], key=mymaxfunction)
['7', '8', '9']
qid & accept id: (11576779, 11588376) query: How to extract literal words from a consecutive string efficiently? soup:

soup wrap:

I'm not really sure a naive algorithm would serve your purpose well, as pointed out by eumiro, so I'll describe a slightly more complex one.

The idea

The best way to proceed is to model the distribution of the output. A good first approximation is to assume all words are independently distributed. Then you only need to know the relative frequency of all words. It is reasonable to assume that they follow Zipf's law; that is, the word with rank n in the list of words has probability roughly 1/(n log N), where N is the number of words in the dictionary.

Once you have fixed the model, you can use dynamic programming to infer the position of the spaces. The most likely sentence is the one that maximizes the product of the probability of each individual word, and it's easy to compute it with dynamic programming. Instead of directly using the probability we use a cost defined as the logarithm of the inverse of the probability to avoid overflows.

The code

import math

# Build a cost dictionary, assuming Zipf's law and cost = -math.log(probability).
words = open("words-by-frequency.txt").read().split()
wordcost = dict((k,math.log((i+1)*math.log(len(words)))) for i,k in enumerate(words))
maxword = max(len(x) for x in words)

def infer_spaces(s):
    """Uses dynamic programming to infer the location of spaces in a string
    without spaces."""

    # Find the best match for the i first characters, assuming cost has
    # been built for the i-1 first characters.
    # Returns a pair (match_cost, match_length).
    def best_match(i):
        candidates = enumerate(reversed(cost[max(0, i-maxword):i]))
        return min((c + wordcost.get(s[i-k-1:i], 9e999), k+1) for k,c in candidates)

    # Build the cost array.
    cost = [0]
    for i in range(1,len(s)+1):
        c,k = best_match(i)
        cost.append(c)

    # Backtrack to recover the minimal-cost string.
    out = []
    i = len(s)
    while i>0:
        c,k = best_match(i)
        assert c == cost[i]
        out.append(s[i-k:i])
        i -= k

    return " ".join(reversed(out))

which you can use with

s = 'thumbgreenappleactiveassignmentweeklymetaphor'
print(infer_spaces(s))

Examples

I am using this quick-and-dirty 125k-word dictionary I put together from a small subset of Wikipedia.

Before: thumbgreenappleactiveassignmentweeklymetaphor.
After: thumb green apple active assignment weekly metaphor.

Before: thereismassesoftextinformationofpeoplescommentswhichisparsedfromhtmlbuttherearenodelimitedcharactersinthemforexamplethumbgreenappleactiveassignmentweeklymetaphorapparentlytherearethumbgreenappleetcinthestringialsohavealargedictionarytoquerywhetherthewordisreasonablesowhatsthefastestwayofextractionthxalot.

After: there is masses of text information of peoples comments which is parsed from html but there are no delimited characters in them for example thumb green apple active assignment weekly metaphor apparently there are thumb green apple etc in the string i also have a large dictionary to query whether the word is reasonable so what s the fastest way of extraction thx a lot.

Before: itwasadarkandstormynighttherainfellintorrentsexceptatoccasionalintervalswhenitwascheckedbyaviolentgustofwindwhichsweptupthestreetsforitisinlondonthatoursceneliesrattlingalongthehousetopsandfiercelyagitatingthescantyflameofthelampsthatstruggledagainstthedarkness.

After: it was a dark and stormy night the rain fell in torrents except at occasional intervals when it was checked by a violent gust of wind which swept up the streets for it is in london that our scene lies rattling along the housetops and fiercely agitating the scanty flame of the lamps that struggled against the darkness.

As you can see it is essentially flawless. The most important part is to make sure your word list was trained to a corpus similar to what you will actually encounter, otherwise the results will be very bad.


Optimization

The implementation consumes a linear amount of time and memory, so it is reasonably efficient. If you need further speedups, you can build a suffix tree from the word list to reduce the size of the set of candidates.

If you need to process a very large consecutive string, it would be reasonable to split the string to avoid excessive memory usage. For example, you could process the text in blocks of 10000 characters plus a margin of 1000 characters on either side to avoid boundary effects. This will keep memory usage to a minimum and will almost certainly have no effect on the quality.

qid & accept id: (11611183, 11612409) query: Replace single quotes with double quotes in python, for use with insert into database soup:

soup wrap:

Based on katrielalex's suggestion, how about this:

>>> import re
>>> s = "INSERT INTO addresses VALUES ('1','1','CUCKOO'S NEST','CUCKOO'S NEST STREET');"
>>> def repl(m):
    if m.group(1) in ('(', ',') or m.group(2) in (',', ')'):
        return m.group(0)
    return m.group(1) + "''" + m.group(2)

>>> re.sub("(.)'(.)", repl, s)
"INSERT INTO addresses VALUES ('1','1','CUCKOO''S NEST','CUCKOO''S NEST STREET');"

and if you're into negative lookbehind assertions, this is the headache-inducing pure regex version:

re.sub("((?
qid & accept id: (11623769, 11636054) query: Retrieving the actual 3D coordinates of a point on a triangle that has been flattened to 2 dimensions soup:

soup wrap:

You are right that the problem lies in your depth values not being linear. Fortunately, the solution is simple, but a little expensive if calculated per pixel.

Using your barycentric coordinates, rather than interpolating the three Z components directly, you need to interpolate their inverses and then invert the result. This is called perspective correction.

Example for Z only:

def GetInterpolatedZ(triangle, u, v):
    z0 = 1.0 / triangle[0].z
    z1 = 1.0 / triangle[1].z
    z2 = 1.0 / triangle[2].z
    z = z0 + u * (z1-z0) + v * (z2-z0)
    return 1.0/z

With triangle a list of three vectors and u and v the barycentric coordinates for triangle[1] and triangle[2] respectively. You will need to remap your Zs before and after the divisions if they are offset.
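
To see why the inversion matters, here is a quick numeric check (a sketch, not from the original answer; `get_interpolated_z` mirrors the formula above on bare Z values): at the screen-space midpoint between a vertex at depth 1 and one at depth 3, naive linear interpolation gives 2.0, while perspective-correct interpolation gives 1.5, because the screen-space midpoint lies closer to the near vertex in world space.

```python
def get_interpolated_z(z_values, u, v):
    # same computation as GetInterpolatedZ above, on plain Z values
    z0, z1, z2 = (1.0 / z for z in z_values)
    return 1.0 / (z0 + u * (z1 - z0) + v * (z2 - z0))

zs = (1.0, 3.0, 3.0)
correct = get_interpolated_z(zs, 0.5, 0.0)  # perspective-correct: 1.5
naive = zs[0] + 0.5 * (zs[1] - zs[0])       # linear in Z: 2.0
```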

If you want to interpolate the actual X and Y coordinates, you do something similar. You will need to interpolate x/z and y/z and relinearize the result by multiplying by z.

def GetInterpolatedZ(tri, u, v):
    t0 = Vec3(tri[0].x/tri[0].z, tri[0].y/tri[0].z, 1.0/tri[0].z)
    t1 = Vec3(tri[1].x/tri[1].z, tri[1].y/tri[1].z, 1.0/tri[1].z)
    t2 = Vec3(tri[2].x/tri[2].z, tri[2].y/tri[2].z, 1.0/tri[2].z)

    inter = t0 + u * (t1-t0) + v * (t2-t0)
    inter.z = 1.0 / inter.z
    inter.x *= inter.z
    inter.y *= inter.z
    return inter

Again, tri is a list of the three vectors and u, v are the barycentric coordinates for tri[1], tri[2]. Vec3 is a regular 3-component Euclidean vector type.

qid & accept id: (11624362, 11624445) query: Python: Iterating through a set so we don't compare the same objects multiple times? soup:

soup wrap:

If your goal is just to compare all the unique combinations of the set, you can make use of itertools.combinations:

from itertools import combinations

for i, j in combinations(self.objects, 2):
    if pygame.sprite.collide_rect(i, j):
        grid.collisions.append(Collision(i, j))

Example:

aSet = set([1,2,3,4])
list(combinations(aSet, 2))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]

combinations returns an iterator, which is pretty efficient compared to managing multiple indexes and temporary lists.

qid & accept id: (11649577, 11649931) query: How to invert a permutation array in numpy soup:

soup wrap:

The inverse of a permutation p of np.arange(n) is the array of indices s that sort p, i.e.

p[s] == np.arange(n)

must all be true. Such an s is exactly what np.argsort returns:

>>> p = np.array([3, 2, 0, 1])
>>> np.argsort(p)
array([2, 3, 1, 0])
>>> p[np.argsort(p)]
array([0, 1, 2, 3])
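
As a side note (not part of the original answer), the inverse can also be built without a sort, by scattering positions: place index i at slot p[i]. Both constructions satisfy the defining identity:

```python
import numpy as np

p = np.array([3, 2, 0, 1])

s = np.argsort(p)        # argsort-based inverse, as above

s2 = np.empty_like(p)    # O(n) alternative: scatter each position i to slot p[i]
s2[p] = np.arange(p.size)

# both satisfy the defining identity p[s] == np.arange(n)
```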
qid & accept id: (11668603, 11700137) query: Applying a function by bins on a vector in Numpy soup:

soup wrap:

With pandas groupby this would be

import pandas as pd

def with_pandas_groupby(func, x, b):
    grouped = pd.Series(x).groupby(b)
    return grouped.agg(func)

Using the example of the OP:

>>> x = [1,2,3,4,5,6]
>>> b = ["a","b","a","a","c","c"]
>>> with_pandas_groupby(np.prod, x, b)
a    12
b     2
c    30
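
The snippet above runs as-is under Python 3; here is a self-contained sketch (passing the string "prod" instead of np.prod, which newer pandas versions prefer for built-in aggregations):

```python
import pandas as pd

def with_pandas_groupby(func, x, b):
    # group the values of x by the parallel labels in b,
    # then aggregate each group with func
    return pd.Series(x).groupby(b).agg(func)

result = with_pandas_groupby("prod", [1, 2, 3, 4, 5, 6],
                             ["a", "b", "a", "a", "c", "c"])
```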

I was just interested in the speed, so I compared with_pandas_groupby with some of the functions given in senderle's answer.

  • apply_to_bins_groupby

     3 levels,      100 values: 175 us per loop
     3 levels,     1000 values: 1.16 ms per loop
     3 levels,  1000000 values: 1.21 s per loop
    
    10 levels,      100 values: 304 us per loop
    10 levels,     1000 values: 1.32 ms per loop
    10 levels,  1000000 values: 1.23 s per loop
    
    26 levels,      100 values: 554 us per loop
    26 levels,     1000 values: 1.59 ms per loop
    26 levels,  1000000 values: 1.27 s per loop
    
  • apply_to_bins3

     3 levels,      100 values: 136 us per loop
     3 levels,     1000 values: 259 us per loop
     3 levels,  1000000 values: 205 ms per loop
    
    10 levels,      100 values: 297 us per loop
    10 levels,     1000 values: 447 us per loop
    10 levels,  1000000 values: 262 ms per loop
    
    26 levels,      100 values: 617 us per loop
    26 levels,     1000 values: 795 us per loop
    26 levels,  1000000 values: 299 ms per loop
    
  • with_pandas_groupby

     3 levels,      100 values: 365 us per loop
     3 levels,     1000 values: 443 us per loop
     3 levels,  1000000 values: 89.4 ms per loop
    
    10 levels,      100 values: 369 us per loop
    10 levels,     1000 values: 453 us per loop
    10 levels,  1000000 values: 88.8 ms per loop
    
    26 levels,      100 values: 382 us per loop
    26 levels,     1000 values: 466 us per loop
    26 levels,  1000000 values: 89.9 ms per loop
    

So pandas is the fastest for large inputs. Furthermore, the number of levels (bins) has no big influence on computation time. (Note that each timing starts from numpy arrays, so the time to create the pandas.Series is included.)

I generated the data with:

def gen_data(nlevels, size):
    choices = 'abcdefghijklmnopqrstuvwxyz'
    # use the nlevels parameter rather than relying on a global of the same name
    levels = np.asarray([l for l in choices[:nlevels]])
    index = np.random.random_integers(0, levels.size - 1, size)
    b = levels[index]
    x = np.arange(1, size + 1)
    return x, b

And then run the benchmark in ipython like this:

In [174]: for nlevels in (3, 10, 26):
   .....:     for size in (100, 1000, 10e5):
   .....:         x, b = gen_data(nlevels, size)
   .....:         print '%2d levels, ' % nlevels, '%7d values:' % size,
   .....:         %timeit function_to_time(np.prod, x, b)
   .....:     print
qid & accept id: (11676649, 11676980) query: Change specific repeating element in .xml using Python soup:

soup wrap:

XPath is great for this kind of stuff. //TYPE[NUMBER='7721' and DATA] will find all the TYPE nodes that have at least one NUMBER child with text '7721' and at least one DATA child:

from lxml import etree

xmlstr = """
  
    
      
        
          
            7297
            
          
          
            7721
            A=1,B=2,C=3,
          
        
      
    
  
"""

html_element = etree.fromstring(xmlstr)

# find all the TYPE nodes that have NUMBER=7721 and DATA nodes
type_nodes = html_element.xpath("//TYPE[NUMBER='7721' and DATA]")

# the for loop is probably superfluous, but who knows, there might be more than one!
for t in type_nodes:
    d = t.find('DATA')
    # example: append spamandeggs to the end of the data text
    if d.text is None:
        d.text = 'spamandeggs'
    else:
        d.text += 'spamandeggs'
print etree.tostring(html_element)
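
The XML markup above lost its tags during extraction; as a stand-in, here is a minimal stdlib sketch of the same NUMBER/DATA walk. The `ROOT` wrapper and the overall document shape are assumptions, and `xml.etree.ElementTree`'s limited XPath replaces the lxml predicate:

```python
import xml.etree.ElementTree as ET

# minimal document shape assumed from the answer's XPath
xmlstr = """<ROOT>
  <TYPE><NUMBER>7297</NUMBER><DATA /></TYPE>
  <TYPE><NUMBER>7721</NUMBER><DATA>A=1,B=2,C=3,</DATA></TYPE>
</ROOT>"""

root = ET.fromstring(xmlstr)
# TYPE nodes whose NUMBER child text is exactly '7721'
for t in root.findall(".//TYPE[NUMBER='7721']"):
    d = t.find("DATA")
    if d is not None:
        d.text = (d.text or "") + "spamandeggs"
```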

Outputs:


  
    
      
        
          
            7297
            
          
          
            7721
            A=1,B=2,C=3,spamandeggs
          
        
      
    
  

qid & accept id: (11691679, 11692226) query: Update dictionary in xml from csv file in python soup:

soup wrap:

First, let's write a function that turns one of your strings (from csv or xml) into a dictionary:

def string_to_dict(string):
    # Split the string on commas
    list_of_entries = string.split(',')
    # Each of these entries needs to be split on '='
    # We'll use a list comprehension
    list_of_split_entries = map(lambda e: e.split('='), list_of_entries)
    # Now we have a list of (key, value) pairs.  We can pass this
    # to the dict() function to get a dictionary out of this, and 
    # that's what we want to return
    return dict(list_of_split_entries)

Now we want to get this dictionary for both the csv data and the xml data:

csv_dict = string_to_dict(csv_data)
# csv_dict = {'AAK': '1|2|8', 'AAC': '1|1|1'}
xml_dict = string_to_dict(d.text)
# xml_dict = {'ABC': '1|3|5', 'FFK': '33', 'AAC': '7|3|8', 'DAK': '5|1|3'}

Using the update function, we can add the values from csv_dict to xml_dict, overwriting where they're the same:

xml_dict.update(csv_dict)
# xml_dict = {'ABC': '1|3|5', 'FFK': '33', 'AAC': '1|1|1', 'AAK': '1|2|8', 'DAK': '5|1|3'}

Now we need to get xml_dict back into a string. The simple way to do this is:

# Let's get a list of key=value strings
list_of_items = ['%s=%s' % (k, v) for k, v in xml_dict.iteritems()]
# Now join those items together
new_xml_text = ','.join(list_of_items)
d.text = new_xml_text

If you want to keep them sorted, you can do it this way:

d.text = ','.join('%s=%s' % (k, xml_dict[k]) for k in sorted(xml_dict.keys()))
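
Put together in Python 3 form (items() instead of iteritems(); the sample strings are taken from the inline comments above):

```python
def string_to_dict(string):
    # "k1=v1,k2=v2" -> {"k1": "v1", "k2": "v2"}
    return dict(entry.split("=") for entry in string.split(","))

csv_dict = string_to_dict("AAK=1|2|8,AAC=1|1|1")
xml_dict = string_to_dict("ABC=1|3|5,FFK=33,AAC=7|3|8,DAK=5|1|3")
xml_dict.update(csv_dict)  # csv values win on shared keys
new_text = ",".join("%s=%s" % (k, xml_dict[k]) for k in sorted(xml_dict))
```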
qid & accept id: (11762422, 11773540) query: matplotlib wireframe plot / 3d plot howTo soup:

soup wrap:

Regarding the display of a series of vectors in 3D, I came up with the following 'almost working' solution:

def visualizeSignals(self, imin, imax):

    times = self.time[imin:imax]
    nrows = (int)((times[(len(times)-1)] - times[0])/self.mod) + 1

    fig = plt.figure('2d profiles')
    ax = fig.gca(projection='3d')
    for i in range(nrows-1):
        x = self.mat1[i][0] + self.mod * i
        y = np.array(self.mat1T[i])
        z = np.array(self.mat2[i])
        ax.plot(y, z, zs = x, zdir='z')

    plt.show()

As for a 2D surface or meshgrid plot, I got it working using meshgrid. Note that you can reproduce a meshgrid yourself once you know how one is built. For more info on meshgrid, I refer to this post.
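
For reference, a tiny sketch of what np.meshgrid actually builds (default 'xy' indexing):

```python
import numpy as np

x = np.array([0, 1, 2])
y = np.array([10, 20])
X, Y = np.meshgrid(x, y)
# X repeats x along each row, Y repeats y down each column:
# X == [[0, 1, 2], [0, 1, 2]]
# Y == [[10, 10, 10], [20, 20, 20]]
```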

Here is the code (you cannot use it as such, since it refers to class members, but you can build your own code based on the matplotlib 3d plot methods I am using):

def visualize(self, imin, imax, typ_ = "wireframe"):
    """
    3d plot signal between imin and imax
    . typ_: type of plot: "wireframe", "contour", "surf_contours"
    """

    times = self.retT[imin:imax]
    nrows = (int)((times[(len(times)-1)] - times[0])/self.mod) + 1

    self.modulate(imin, imax)

    fig = plt.figure('3d view')
    ax = fig.gca(projection='3d')

    x = []
    for i in range(nrows):
        x.append(self.matRetT[i][0] + self.mod * i)

    y = []
    for i in range(len(self.matRetT[0])):
        y.append(self.matRetT[0][i])
    y = y[:-1]


    X,Y = np.meshgrid(x,y)

    z = [tuple(self.matGC2D[i]) for i in range(len(self.matGC))] # matGC a matrix

    zzip = zip(*z)

    for i in range(len(z)):
        print len(z[i])

    if(typ_ == "wireframe"):
        ax.plot_wireframe(X,Y,zzip)
        plt.show()
    elif(typ_ == "contour"):
        cset = ax.contour(X, Y, zzip, zdir='z', offset=0)
        plt.show()
    elif(typ_ == "surf_contours"):
        surf = ax.plot_surface(X, Y, zzip, rstride=1, cstride=1, alpha=0.3)
        cset = ax.contour(X, Y, zzip, zdir='z', offset=-40)
        cset = ax.contour(X, Y, zzip, zdir='x', offset=-40)
        cset = ax.contour(X, Y, zzip, zdir='y', offset=-40)
        plt.show()
qid & accept id: (11805535, 11805565) query: Transform comma separated string into a list but ignore comma in quotes soup:

soup wrap:

Instead of a regular expression, you might be better off using the csv module since what you are dealing with is a CSV string:

from cStringIO import StringIO
from csv import reader

file_like_object = StringIO("1,,2,'3,4'")
csv_reader = reader(file_like_object, quotechar="'")
for row in csv_reader:
    print row

This results in the following output:

['1', '', '2', '3,4']
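
In Python 3, cStringIO is gone; io.StringIO plays the same role. A sketch:

```python
import csv
import io

def split_quoted(line, quotechar="'"):
    # parse a single CSV line, honouring the given quote character
    return next(csv.reader(io.StringIO(line), quotechar=quotechar))

row = split_quoted("1,,2,'3,4'")  # ['1', '', '2', '3,4']
```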
qid & accept id: (11816741, 11817003) query: Test subclass behaviour? soup:
soup wrap:
import unittest
class TestCaseA(unittest.TestCase):
    def setUp(self):
        self.thing = A()

    def test_does_x(self):
        self.assertTrue(self.thing.does_x())

Now define a subclass A2:

class A2(A):
    ...

I would subclass your test case and override only the setUp method, as the rest of the API of the subclass A2 should be consistent with A.

So your test subclass would be just:

class TestA2(TestCaseA):
    def setUp(self):
        self.thing = A2()
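
Filled out into a self-contained, runnable sketch (the classes A and A2 here are trivial stand-ins, since the real ones are not shown):

```python
import unittest

class A:
    def does_x(self):
        return True

class A2(A):
    pass

class TestCaseA(unittest.TestCase):
    def setUp(self):
        self.thing = A()

    def test_does_x(self):
        self.assertTrue(self.thing.does_x())

class TestA2(TestCaseA):
    # only the fixture changes; every test in TestCaseA also runs against A2
    def setUp(self):
        self.thing = A2()
```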
qid & accept id: (11827100, 11828266) query: Interleaving two numpy index arrays, one item from each array soup:

soup wrap:

Vectorised solution (pedagogical style, easily understandable)

We can vectorise this by augmenting the arrays with a discriminator index, such that a is tagged 0 and b is tagged 1:

a_t = np.vstack((a, np.zeros_like(a)))
b_t = np.vstack((b, np.ones_like(b)))

Now, let's combine and sort:

c = np.hstack((a_t, b_t))[:, np.argsort(np.hstack((a, b)))]
array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 13, 14, 15, 17, 19, 21, 23],
       [ 0,  0,  0,  0,  1,  1,  0,  0,  0,  0,  1,  1,  1,  0,  1,  1,  1]])

You can see that now the elements are in order but retaining their tags, so we can see which elements came from a and from b.

So, let's select the first element and each element where the tag changes from 0 (for a) to 1 (for b) and back again:

c[:, np.concatenate(([True], c[1, 1:] != c[1, :-1]))][0]
array([ 1,  5,  7, 13, 17, 19])

Efficient vectorised solution

You can do this slightly more efficiently by keeping the items and their tags in separate (but parallel) arrays:

ab = np.hstack((a, b))
s = np.argsort(ab)
t = np.hstack((np.zeros_like(a), np.ones_like(b)))[s]
ab[s][np.concatenate(([True], t[1:] != t[:-1]))]
array([ 1,  5,  7, 13, 17, 19])

This is slightly more efficient than the above solution; I get an average of 45 as opposed to 90 microseconds, although your conditions may vary.
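
To make this concrete, here is the efficient version as a runnable snippet; the arrays a and b are reconstructed from the tagged output shown above (an assumption, since the OP's arrays are not in this excerpt):

```python
import numpy as np

# reconstructed from the 0/1 tag row above (assumed example data)
a = np.array([1, 2, 3, 4, 7, 8, 9, 10, 17])
b = np.array([5, 6, 13, 14, 15, 19, 21, 23])

ab = np.hstack((a, b))
s = np.argsort(ab)                                     # sort order of the union
t = np.hstack((np.zeros_like(a), np.ones_like(b)))[s]  # tags, in sorted order
# keep the first element plus every element where the tag flips
result = ab[s][np.concatenate(([True], t[1:] != t[:-1]))]
```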

qid & accept id: (11830474, 11830535) query: Numpy union arrays in order soup:

soup wrap:

You can transpose and flatten the arrays:

d = numpy.array([a, b, c]).T.flatten()

An alternative way to combine the arrays is to use numpy.vstack():

d = numpy.vstack((a, b, c)).T.flatten()

(I don't know which one is faster, by the way.)

Edit: In response to the answer by Nicolas Barbey, here is how to make do with copying the data only once:

d = numpy.empty((len(a), 3), dtype=a.dtype)
d[:, 0], d[:, 1], d[:, 2] = a, b, c
d = d.ravel()

This code ensures that the data is laid out in a way that ravel() does not need to make a copy, and indeed it is quite a bit faster than the original code on my machine:

In [1]: a = numpy.arange(0, 30000, 3)
In [2]: b = numpy.arange(1, 30000, 3)
In [3]: c = numpy.arange(2, 30000, 3)
In [4]: def f(a, b, c):
   ...:     d = numpy.empty((len(a), 3), dtype=a.dtype)
   ...:     d[:, 0], d[:, 1], d[:, 2] = a, b, c
   ...:     return d.ravel()
   ...: 
In [5]: def g(a, b, c):
   ...:     return numpy.vstack((a, b, c)).T.ravel()
   ...: 
In [6]: %timeit f(a, b, c)
10000 loops, best of 3: 34.4 us per loop
In [7]: %timeit g(a, b, c)
10000 loops, best of 3: 177 us per loop
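
A small, self-contained check of the single-copy version (smaller arrays than in the benchmark, same layout trick):

```python
import numpy as np

a = np.arange(0, 30, 3)
b = np.arange(1, 30, 3)
c = np.arange(2, 30, 3)

# fill the columns of one contiguous (n, 3) array so that ravel()
# can return the interleaved result without another copy
d = np.empty((len(a), 3), dtype=a.dtype)
d[:, 0], d[:, 1], d[:, 2] = a, b, c
d = d.ravel()
```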
qid & accept id: (11832984, 11833030) query: removing first four and last four characters of strings in list, OR removing specific character patterns soup:
soup wrap:
def remove_cruft(s):
    return s[4:-4]

sites=['www.hattrick.com', 'www.google.com', 'www.wampum.net', 'www.newcom.com']
[remove_cruft(s) for s in sites]

result:

['hattrick', 'google', 'wampum', 'newcom']

If you know all of the strings you want to strip out, you can use replace to get rid of them. This is useful if you're not sure that all of your URLs will start with "www.", or if the TLD isn't three characters long.

def remove_bad_substrings(s):
    badSubstrings = ["www.", ".com", ".net", ".museum"]
    for badSubstring in badSubstrings:
        s = s.replace(badSubstring, "")
    return s

sites=['www.hattrick.com', 'www.google.com', 
'www.wampum.net', 'www.newcom.com', 'smithsonian.museum']
[remove_bad_substrings(s) for s in sites]

result:

['hattrick', 'google', 'wampum', 'newcom', 'smithsonian']
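
If the "www." prefix or the TLD length can vary, a variant (a sketch, not from the original answer) avoids the fixed [4:-4] slice:

```python
def strip_prefix_and_tld(s):
    # drop a leading "www." if present, then everything after the last dot
    if s.startswith("www."):
        s = s[4:]
    return s.rsplit(".", 1)[0]

names = [strip_prefix_and_tld(s)
         for s in ["www.hattrick.com", "www.wampum.net", "smithsonian.museum"]]
```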
qid & accept id: (11897977, 11898075) query: Search for brackets in a case insensitive using regular expressions soup:
soup wrap:
import re

s = "foo[bar]baz"
m = re.search("[\[\]]", s)
print m.group(0)
# => '['

t = "foo-bar]baz"
n = re.search("[\[\]]", t)
print n.group(0)
# => ']'

In fact, re.IGNORECASE is unnecessary, since brackets have no case.

Edit:

u = "foo\\-bar]baz"
o = re.search('[\[\]]', u) # Does this match the \ ?
print o.group(0)
# => ']'
# Behold!
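
The same pattern works in Python 3; a raw string keeps the backslashes out of trouble:

```python
import re

bracket = re.compile(r"[\[\]]")  # character class: either bracket

first = bracket.search("foo[bar]baz").group(0)   # '['
second = bracket.search("foo-bar]baz").group(0)  # ']'
```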
qid & accept id: (11902626, 11902705) query: Python Django how to rotate image and remove black color? soup:

soup wrap:

To resize automatically, you want the expand kwarg:

src_im = Image.open("test.gif")
im = src_im.rotate(30, expand=True)
im.save("result.gif")

As the PIL docs explain:

The expand argument, if true, indicates that the output image should be made large enough to hold the rotated image. If omitted or false, the output image has the same size as the input image.

On the black background, you need to specify the transparent colour when saving your GIF image:

transparency = im.info['transparency'] 
im.save('icon.gif', transparency=transparency)
qid & accept id: (11939631, 11941164) query: Is there an easy way to parse an HTML document and remove everything except a particular table? soup:

soup wrap:

You could use lxml library in Python:

#!/usr/bin/env python
import urllib2
from lxml import html # $ apt-get install python-lxml or $ pip install lxml

page = urllib2.urlopen('http://stackoverflow.com/q/11939631')
doc = html.parse(page).getroot()

div = doc.get_element_by_id('question')
for tr in div.find('table').iterchildren('tr'):
    for td in tr.iterchildren('td'):
        print(td.text_content()) # process td

If you are familiar with jQuery, you could use pyquery. It adds a jQuery-style interface on top of lxml:

#!/usr/bin/env python
from pyquery import PyQuery # $ apt-get install python-pyquery or
                            # $ pip install pyquery

# d is like the $ in jquery
d = PyQuery(url='http://stackoverflow.com/q/11939631', parser='html')
for tr in d("#question table > tr"):
    for td in tr.iterchildren('td'):
        print(td.text_content())

Though in this case pyquery doesn't add enough. Here's the same using only lxml:

#!/usr/bin/env python
import urllib2
from lxml import html

page = urllib2.urlopen('http://stackoverflow.com/q/11939631')
doc = html.parse(page).getroot()
for tr in doc.cssselect('#question table > tr'):
    for td in tr.iterchildren('td'):
        print(td.text_content()) # process td

Note: the last two examples enumerate rows in all tables (not just the first one) inside #question element.
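
The same row/cell walk can be tried offline on a small well-formed snippet using only the standard library (xml.etree.ElementTree standing in for lxml, which assumes XHTML-clean input):

```python
import xml.etree.ElementTree as ET

snippet = """<div id="question">
  <table>
    <tr><td>spam</td><td>eggs</td></tr>
    <tr><td>ham</td></tr>
  </table>
</div>"""

div = ET.fromstring(snippet)
# iterate rows of the first table inside the div, then their cells
cells = [td.text for tr in div.find("table").iter("tr")
                 for td in tr.iter("td")]
```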

qid & accept id: (11943980, 11944048) query: Python how to get sum of numbers in a list that has strings in it as well soup:

soup wrap:

Here is a fairly straight forward way using a dictionary comprehension:

sums = {k: sum(i for i in v if isinstance(i, int)) for k, v in d.items()}

Or on Python 2.6 and below:

sums = dict((k, sum(i for i in v if isinstance(i, int))) for k, v in d.items())

Example:

>>> {k: sum(i for i in v if isinstance(i, int)) for k, v in d.items()}
{'a': 6, 'c': 7, 'b': 7, 'e': 4, 'd': 7, 'g': 4, 'f': 4}
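
Since the dictionary d itself is not shown in this excerpt, here is a self-contained run on a made-up d of the same shape:

```python
# hypothetical data: lists mixing ints and strings
d = {"a": [1, 2, 3, "x"], "b": ["y", 3, 4]}

sums = {k: sum(i for i in v if isinstance(i, int)) for k, v in d.items()}
```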
qid & accept id: (11944826, 11944910) query: python count business weeks soup:

soup wrap:

Try using datetime.weekday or datetime.isoweekday to get the current day of the week, or use the more complete datetime.isocalendar to also get the current week of the year, and use those as offsets to calculate an aligned difference.

So you can have a function like this:

def week_difference(start, end):
    assert start <= end
    start_year, start_week, start_dayofweek = start.isocalendar()
    end_year, end_week, end_dayofweek = end.isocalendar()

    return ((end_year - start_year) * 52) - start_week + end_week

With usage like this:

import datetime as dt
# same week
In [1]: week_difference(dt.datetime(2012, 8, 1),  dt.datetime(2012, 8, 1))
Out[1]: 0

# your example (see note below) 
In [2]: week_difference(dt.datetime(2012, 8, 1),  dt.datetime(2012, 8, 13))
Out[2]: 2

# across years
In [3]: week_difference(dt.datetime(2011, 8, 1),  dt.datetime(2012, 8, 13))
Out[3]: 54

# year boundary: second last business week of 2011, to first business week of 2012
# which is the same business week as the last business week of 2011
In [4]: week_difference(dt.datetime(2011, 12, 20),  dt.datetime(2012, 1, 1))
Out[4]: 1

In [5]: week_difference(dt.datetime(2011, 12, 18),  dt.datetime(2012, 1, 1))
Out[5]: 2

You can add 1 to your week output depending on your chosen semantic of what a week difference should be.
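
The function runs unchanged under Python 3; the IPython session above condenses into a short sketch:

```python
import datetime as dt

def week_difference(start, end):
    # aligned difference in ISO calendar weeks, counting 52 weeks per year
    assert start <= end
    start_year, start_week, _ = start.isocalendar()
    end_year, end_week, _ = end.isocalendar()
    return (end_year - start_year) * 52 - start_week + end_week

same_week = week_difference(dt.datetime(2012, 8, 1), dt.datetime(2012, 8, 1))
two_weeks = week_difference(dt.datetime(2012, 8, 1), dt.datetime(2012, 8, 13))
```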

qid & accept id: (11968976, 11969014) query: List files in ONLY the current directory soup:

soup wrap:

Just use os.listdir and os.path.isfile instead of os.walk.

Example:

files = [f for f in os.listdir('.') if os.path.isfile(f)]
for f in files:
    # do something

But be careful when applying this to another directory, e.g.

files = [f for f in os.listdir(somedir) if os.path.isfile(f)]

which would not work, because f is not a full path but relative to the current directory.

Therefore, for filtering on another directory, do os.path.isfile(os.path.join(somedir, f))

(Thanks Causality for the hint)
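To put the join fix into one place, here is a small helper (the name is my own) that works for any directory:

```python
import os

def list_files(directory):
    # Only regular files directly inside `directory` (no recursion);
    # joining gives isfile() the full path it needs.
    return [f for f in os.listdir(directory)
            if os.path.isfile(os.path.join(directory, f))]
```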

qid & accept id: (11984684, 11986396) query: display only one logging line soup:

soup wrap:

On unix-like terminals, you can try prepending ANSI escape sequences to the text:

import time
import sys

print 'this is a text',
sys.stdout.flush()

time.sleep(1)
print '\x1b[80D'+'\x1b[K'+'Second text',
sys.stdout.flush()

The character '\x1b' is the escape character. The first sequence moves the cursor up to 80 columns to the left; the second clears the line.

You need the comma at the end of the print statement to prevent it from going to the second line. Then you need to flush the stdout stream otherwise the text won't appear.
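The same trick can be wrapped in a small helper that also works on Python 3 (the function name and the stream parameter are my own choice; this assumes an ANSI-capable terminal):

```python
import sys

def show_status(text, stream=sys.stdout):
    # '\r' returns to the start of the line; '\x1b[K' clears to end of line.
    stream.write('\r\x1b[K' + text)
    stream.flush()
```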

Edit: For combining this with logging, wrap it in a simple function:

def mylog(text):
    logging.info(text)
    print '\x1b[80D' + '\x1b[K'+ text,
    sys.stdout.flush()

EDIT 2: Integrating this into the standard logging:

import logging

logger = logging.getLogger(__name__)
# create console handler
ch = logging.StreamHandler()
# create formatter
formatter = logging.Formatter('\x1b[80D\x1b[1A\x1b[K%(message)s')
# add formatter to console handler
ch.setFormatter(formatter)
# add console handler to logger
logger.addHandler(ch)

Since the logging module seems to add newlines by itself, I've added an ANSI sequence (\x1b[1A) to go up one line.

Also see the logging howto for more information.

qid & accept id: (12012818, 12013023) query: Attaching a PDF to an email in Appengine (Python) soup:

soup wrap:

As per the documentation, the attachments field is a list of tuples in which the first element is the filename and the second the byte string representing the file. So you just need to read the pdf:

pdf_contents = open(os.path.join(os.path.dirname(__file__), 'yourpdf.pdf'), 'rb').read()

this assumes that your pdf and the python file are in the same folder. And then

attachments = [('yourpdf.pdf', pdf_contents)]
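Outside App Engine, the same filename-plus-bytes idea maps onto the standard library's email package; a sketch (the helper name and filename are just examples):

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart

def build_message(pdf_bytes, filename):
    # Wrap the raw PDF bytes as an application/pdf attachment.
    msg = MIMEMultipart()
    part = MIMEApplication(pdf_bytes, _subtype='pdf')
    part.add_header('Content-Disposition', 'attachment', filename=filename)
    msg.attach(part)
    return msg
```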
qid & accept id: (12014704, 12014898) query: Iterating over related objects in Django: loop over query set or use one-liner select_related (or prefetch_related) soup:

soup wrap:

The approach you are doing now will be heavily inefficient, because it will result in a 1+N number of queries. That is, 1 for the query of all your Newsletters, and then 1 more every single time you evaluate those n.article_set.all() results. So if you have 100 Newsletter objects in that first query, you will be doing 101 queries.

This is an excellent reason to use prefetch_related. It will only result in 2 queries: one to get the Newsletters, and one to batch-get the related Articles. Though you are still perfectly able to keep doing the zip to organize them, they will already be cached, so really you can just pass the query directly to the template and loop on that:

view

newsletters = Newsletter.objects.prefetch_related('article_set').all()\
                    .order_by('-year', '-number')

return render_to_response('newsletter/newsletter_list.html',
                          {'newsletter_list': newsletters})

template

{% block content %}
  {% for newsletter in newsletter_list %}
    {{ newsletter.label }}

    Volume {{ newsletter.volume }}, Number {{ newsletter.number }}

    {{ newsletter.article }}

    {% for a in newsletter.article_set.all %}
      {{ a.title }}
    {% endfor %}
  {% endfor %}
{% endblock %}
qid & accept id: (12059634, 12059703) query: Breaking up substrings in Python based on characters soup:

soup wrap:

You can extract all the text between pairs of " characters using regular expressions:

import re
inputString='type="NN" span="123..145" confidence="1.0" '
pat=re.compile('"([^"]*)"')
strings=[]
while True:
        mat=pat.search(inputString)
        if mat is None:
                break
        strings.append(mat.group(1))
        inputString=inputString[mat.end():]
print strings

or, easier:

import re
inputString='type="NN" span="123..145" confidence="1.0" '
strings=re.findall('"([^"]*)"', inputString)
print strings

Output for both versions:

['NN', '123..145', '1.0']
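If you also want the attribute names, a two-group pattern turns the same input into a dict in one step (the helper name is mine; this assumes no escaped quotes inside values):

```python
import re

def parse_attributes(s):
    # Each match captures (name, value) from name="value".
    return dict(re.findall(r'(\w+)="([^"]*)"', s))
```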
qid & accept id: (12081704, 12082914) query: Python regular expression to remove space and capitalize letters where the space was? soup:

soup wrap:

Here's an approach to the problem (that doesn't use any regular expressions, although there's one place where it could). We split up the problem into two functions: one function which splits a string into comma-separated pieces and handles each piece (parseTags), and one function which takes a string and processes it into a valid tag (sanitizeTag). The annotated code is as follows:

# This function takes a string with commas separating raw user input, and
# returns a list of valid tags made by sanitizing the strings between the
# commas.
def parseTags(str):
    # First, we split the string on commas.
    rawTags = str.split(',')

    # Then, we sanitize each of the tags.  If sanitizing gives us back None,
    # then the tag was invalid, so we leave those cases out of our final
    # list of tags.  We can use None as the predicate because sanitizeTag
    # will never return '', which is the only falsy string.
    return filter(None, map(sanitizeTag, rawTags))

# This function takes a single proto-tag---the string in between the commas
# that will be turned into a valid tag---and sanitizes it.  It either
# returns an alphanumeric string (if the argument can be made into a valid
# tag) or None (if the argument cannot be made into a valid tag; i.e., if
# the argument contains only whitespace and/or punctuation).
def sanitizeTag(str):
    # First, we turn non-alphanumeric characters into whitespace.  You could
    # also use a regular expression here; see below.
    str = ''.join(c if c.isalnum() else ' ' for c in str)

    # Next, we split the string on spaces, ignoring leading and trailing
    # whitespace.
    words = str.split()

    # There are now three possibilities: there are no words, there was one
    # word, or there were multiple words.
    numWords = len(words)
    if numWords == 0:
        # If there were no words, the string contained only spaces (and/or
        # punctuation).  This can't be made into a valid tag, so we return
        # None.
        return None
    elif numWords == 1:
        # If there was only one word, that word is the tag, no
        # post-processing required.
        return words[0]
    else:
        # Finally, if there were multiple words, we camel-case the string:
        # we lowercase the first word, capitalize the first letter of all
        # the other words and lowercase the rest, and finally stick all
        # these words together without spaces.
        return words[0].lower() + ''.join(w.capitalize() for w in words[1:])

And indeed, if we run this code, we get:

>>> parseTags("tHiS iS a tAg, \t\n!&#^ , secondcomment , no!punc$$, ifNOSPACESthenPRESERVEcaps")
['thisIsATag', 'secondcomment', 'noPunc', 'ifNOSPACESthenPRESERVEcaps']

There are two points in this code that it's worth clarifying. First is the use of str.split() in sanitizeTags. This will turn a b c into ['a','b','c'], whereas str.split(' ') would produce ['','a','b','c','']. This is almost certainly the behavior you want, but there's one corner case. Consider the string tAG$. The $ gets turned into a space, and is stripped out by the split; thus, this gets turned into tAG instead of tag. This is probably what you want, but if it isn't, you have to be careful. What I would do is change that line to words = re.split(r'\s+', str), which will split the string on whitespace but leave in the leading and trailing empty strings; however, I would also change parseTags to use rawTags = re.split(r'\s*,\s*', str). You must make both these changes; 'a , b , c'.split(',') becomes ['a ', ' b ', ' c'], which is not the behavior you want, whereas r'\s*,\s*' deletes the space around the commas too. If you ignore leading and trailing white space, the difference is immaterial; but if you don't, then you need to be careful.

Finally, there's the non-use of regular expressions, and instead the use of str = ''.join(c if c.isalnum() else ' ' for c in str). You can, if you want, replace this with a regular expression. (Edit: I removed some inaccuracies about Unicode and regular expressions here.) Ignoring Unicode, you could replace this line with

str = re.sub(r'[^A-Za-z0-9]', ' ', str)

This uses [^...] to match everything but the listed characters: ASCII letters and numbers. However, it's better to support Unicode, and it's easy, too. The simplest such approach is

str = re.sub(r'\W', ' ', str, flags=re.UNICODE)

Here, \W matches non-word characters; a word character is a letter, a number, or the underscore. With flags=re.UNICODE specified (not available before Python 2.7; you can instead use r'(?u)\W' for earlier versions and 2.7), letters and numbers are both any appropriate Unicode characters; without it, they're just ASCII. If you don't want the underscore, you can add |_ to the regex to match underscores as well, replacing them with spaces too:

str = re.sub(r'\W|_', ' ', str, flags=re.UNICODE)

This last one, I believe, matches the behavior of my non-regex-using code exactly.


Also, here's how I'd write the same code without those comments; this also allows me to eliminate some temporary variables. You might prefer the code with the variables present; it's just a matter of taste.

def parseTags(str):
    return filter(None, map(sanitizeTag, str.split(',')))

def sanitizeTag(str):
    words    = ''.join(c if c.isalnum() else ' ' for c in str).split()
    numWords = len(words)
    if numWords == 0:
        return None
    elif numWords == 1:
        return words[0]
    else:
        return words[0].lower() + ''.join(w.capitalize() for w in words[1:])

To handle the newly-desired behavior, there are two things we have to do. First, we need a way to fix the capitalization of the first word: lowercase the whole thing if the first letter's lowercase, and lowercase everything but the first letter if the first letter's upper case. That's easy: we can just check directly. Secondly, we want to treat punctuation as completely invisible: it shouldn't uppercase the following words. Again, that's easy—I even discuss how to handle something similar above. We just filter out all the non-alphanumeric, non-whitespace characters rather than turning them into spaces. Incorporating those changes gives us

def parseTags(str):
    return filter(None, map(sanitizeTag, str.split(',')))

def sanitizeTag(str):
    words    = filter(lambda c: c.isalnum() or c.isspace(), str).split()
    numWords = len(words)
    if numWords == 0:
        return None
    elif numWords == 1:
        return words[0]
    else:
        words0 = words[0].lower() if words[0][0].islower() else words[0].capitalize()
        return words0 + ''.join(w.capitalize() for w in words[1:])

Running this code gives us the following output

>>> parseTags("tHiS iS a tAg, AnD tHIs, \t\n!&#^ , se@%condcomment$ , No!pUnc$$, ifNOSPACESthenPRESERVEcaps")
['thisIsATag', 'AndThis', 'secondcomment', 'NopUnc', 'ifNOSPACESthenPRESERVEcaps']
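On Python 3 the final version breaks, because filter() there returns an iterator with no .split() method; a sketch of the same logic for Python 3 (the snake_case names are mine):

```python
def sanitize_tag(s):
    # filter() is an iterator in Python 3, so build the string with join instead.
    words = ''.join(c for c in s if c.isalnum() or c.isspace()).split()
    if not words:
        return None
    if len(words) == 1:
        return words[0]
    first = words[0].lower() if words[0][0].islower() else words[0].capitalize()
    return first + ''.join(w.capitalize() for w in words[1:])

def parse_tags(s):
    # Sanitize each comma-separated piece; drop the None (invalid) ones.
    return [t for t in (sanitize_tag(p) for p in s.split(',')) if t]
```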
qid & accept id: (12102342, 21275483) query: Specific font_face based on syntax in Sublime Text 2 soup:

soup wrap:

For syntax-specific settings for a targeted language, create a file with the name of that language in the Packages > User folder.

e.g. for PHP, create php.sublime-settings.

and add the following code to it:

{
    "font_face": "Source Code Pro"
}

For JavaScript, create a file named JavaScript.sublime-settings, and so on.

Also, using this technique, you can set different color schemes for different languages using the color_scheme attribute.

{
    "font_face": "Source Code Pro",
    "color_scheme": "Packages/Theme - Flatland/Flatland Monokai.tmTheme"
}

Alternatively, if the file with targeted language is open, you can go to Preferences > Settings - More > Syntax Specific - User, and add the font_face setting.

qid & accept id: (12151674, 12151687) query: How to get the number of elements returned from a function in Python soup:

soup wrap:

You can't -- a function can return a different number of values (stored in a single tuple) and different types of values in that tuple, depending on its input or other factors. Consider the (silly) function:

def foo(arg):
    if arg:
       return 1,2
    else:
       return "foo","bar","baz"

Now call it:

foo(1) # (1,2)
foo(0) # ("foo","bar","baz")

The only way to know what a function will return is to 1) read the source or 2) (If you're a trusting sort of person) read the documentation for the function :-).
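At runtime you can therefore only inspect the value you actually got back from a particular call; a small sketch (the helper name is mine):

```python
def foo(arg):
    if arg:
        return 1, 2
    else:
        return "foo", "bar", "baz"

def count_returned(result):
    # A multi-value return is just one tuple; anything else counts as one value.
    return len(result) if isinstance(result, tuple) else 1
```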

qid & accept id: (12175964, 12179724) query: Python method for reading keypress? soup:

soup wrap:

Figured it out by testing all the stuff by myself. Couldn't find any topics about it tho, so I'll just leave the solution here. This might not be the only or even the best solution, but it works for my purposes (within getch's limits) and is better than nothing.

Note: proper keyDown() which would recognize all the keys and actual key presses, is still valued.

Solution: using the ord() function to first turn the getch() result into an integer (I guess they're virtual key codes, but not too sure) works fine, and then comparing the result to the actual number representing the wanted key. Also, if I needed to, I could add an extra chr() around the number returned so that it would convert it to a character. However, I'm using mostly down arrow, esc, etc. so converting those to a character would be stupid. Here's the final code:

from msvcrt import getch
while True:
    key = ord(getch())
    if key == 27: #ESC
        break
    elif key == 13: #Enter
        select()
    elif key == 224: #Special keys (arrows, f keys, ins, del, etc.)
        key = ord(getch())
        if key == 80: #Down arrow
            moveDown()
        elif key == 72: #Up arrow
            moveUp()

Also if someone else needs to, you can easily find out the keycodes from google, or by using python and just pressing the key:

from msvcrt import getch
while True:
    print(ord(getch()))
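The dispatch logic can also be pulled out of the loop into a pure function that's easy to test without a keyboard (the names, and any table entries beyond the codes shown above, are my own):

```python
SPECIAL_PREFIX = 224  # getch() returns this before arrows, F-keys, Ins, Del, etc.
KEY_NAMES = {27: 'esc', 13: 'enter'}
SPECIAL_NAMES = {72: 'up', 80: 'down', 75: 'left', 77: 'right'}

def decode_key(first, second=None):
    # Map one getch() code (plus the follow-up code for special keys)
    # to a readable name; fall back to the character itself.
    if first == SPECIAL_PREFIX:
        return SPECIAL_NAMES.get(second, 'unknown')
    return KEY_NAMES.get(first, chr(first))
```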
qid & accept id: (12184015, 12185223) query: In Python, how can I naturally sort a list of alphanumeric strings such that alpha characters sort ahead of numeric characters? soup:
soup wrap:
import re

re_natural = re.compile('[0-9]+|[^0-9]+')

def natural_key(s):
    return [(1, int(c)) if c.isdigit() else (0, c.lower()) for c in re_natural.findall(s)] + [s]

for case in test_cases:
    print case[1]
    print sorted(case[0], key=natural_key)

['a', 'b', 'c']
['a', 'b', 'c']
['A', 'b', 'C']
['A', 'b', 'C']
['a', 'B', 'r', '0', '9']
['a', 'B', 'r', '0', '9']
['a1', 'a2', 'a100', '1a', '10a']
['a1', 'a2', 'a100', '1a', '10a']
['alp1', 'alp2', 'alp10', 'ALP11', 'alp100', 'GAM', '1', '2', '100']
['alp1', 'alp2', 'alp10', 'ALP11', 'alp100', 'GAM', '1', '2', '100']
['A', 'a', 'b', 'r', '0', '9']
['A', 'a', 'b', 'r', '0', '9']
['ABc', 'Abc', 'abc']
['ABc', 'Abc', 'abc']

Edit: I decided to revisit this question and see if it would be possible to handle the bonus case. It requires being more sophisticated in the tie-breaker portion of the key. To match the desired results, the alpha parts of the key must be considered before the numeric parts. I also added a marker between the natural section of the key and the tie-breaker so that short keys always come before long ones.

def natural_key2(s):
    parts = re_natural.findall(s)
    natural = [(1, int(c)) if c.isdigit() else (0, c.lower()) for c in parts]
    ties_alpha = [c for c in parts if not c.isdigit()]
    ties_numeric = [c for c in parts if c.isdigit()]
    return natural + [(-1,)] + ties_alpha + ties_numeric

This generates identical results for the test cases above, plus the desired output for the bonus case:

['A', 'a', 'A0', 'a0', '0', '00', '0A', '00A', '0a', '00a']
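Putting the bonus-case version together as one runnable unit (adding the import re the snippets assume):

```python
import re

re_natural = re.compile('[0-9]+|[^0-9]+')

def natural_key2(s):
    # Natural part first, then a short-key marker, then alpha and numeric ties.
    parts = re_natural.findall(s)
    natural = [(1, int(c)) if c.isdigit() else (0, c.lower()) for c in parts]
    ties_alpha = [c for c in parts if not c.isdigit()]
    ties_numeric = [c for c in parts if c.isdigit()]
    return natural + [(-1,)] + ties_alpha + ties_numeric

bonus = ['00a', '0a', '00A', '0A', '00', '0', 'a0', 'A0', 'a', 'A']
print(sorted(bonus, key=natural_key2))
```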
qid & accept id: (12227084, 12227529) query: A way to get the path to the user installed packages on Linux and OS X operating systems? (Usable for Python versions between 2.5 - 2.7) soup:

soup wrap:

If you need the specific functionality of the get_python_lib function, the source for that module is fairly straightforward and doesn't use any Python 2.7 specific syntax at all; you could simply backport it.

You'd basically need the following definitions and two functions:

import os
import sys
from distutils.errors import DistutilsPlatformError


PREFIX = os.path.normpath(sys.prefix)
EXEC_PREFIX = os.path.normpath(sys.exec_prefix)


def get_python_version():
    """Return a string containing the major and minor Python version,
    leaving off the patchlevel.  Sample return values could be '1.5'
    or '2.2'.
    """
    return sys.version[:3]

def get_python_lib(plat_specific=0, standard_lib=0, prefix=None):
    """Return the directory containing the Python library (standard or
    site additions).

    If 'plat_specific' is true, return the directory containing
    platform-specific modules, i.e. any module from a non-pure-Python
    module distribution; otherwise, return the platform-shared library
    directory.  If 'standard_lib' is true, return the directory
    containing standard Python library modules; otherwise, return the
    directory for site-specific modules.

    If 'prefix' is supplied, use it instead of sys.prefix or
    sys.exec_prefix -- i.e., ignore 'plat_specific'.
    """
    if prefix is None:
        prefix = plat_specific and EXEC_PREFIX or PREFIX

    if os.name == "posix":
        libpython = os.path.join(prefix,
                                 "lib", "python" + get_python_version())
        if standard_lib:
            return libpython
        else:
            return os.path.join(libpython, "site-packages")

    elif os.name == "nt":
        if standard_lib:
            return os.path.join(prefix, "Lib")
        else:
            if get_python_version() < "2.2":
                return prefix
            else:
                return os.path.join(prefix, "Lib", "site-packages")

    elif os.name == "os2":
        if standard_lib:
            return os.path.join(prefix, "Lib")
        else:
            return os.path.join(prefix, "Lib", "site-packages")

    else:
        raise DistutilsPlatformError(
            "I don't know where Python installs its library "
            "on platform '%s'" % os.name)

You can cut the long function down to just the branch you need for your platform, of course; for OS X that'd be:

def get_python_lib(plat_specific=0, standard_lib=0, prefix=None):
    if prefix is None:
        prefix = plat_specific and EXEC_PREFIX or PREFIX

    libpython = os.path.join(prefix,
                             "lib", "python" + get_python_version())
    if standard_lib:
        return libpython
    else:
        return os.path.join(libpython, "site-packages")

Note that Debian patches this function to return dist-packages in the default case, this doesn't apply to OS X.
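If you are on Python 2.7 or later (where the original constraint no longer applies), the stdlib sysconfig module reports the same directories without touching distutils internals; a minimal sketch:

```python
import sysconfig

# site-packages (Debian-patched Pythons report dist-packages here)
site_packages = sysconfig.get_path("purelib")
# standard library directory
stdlib_dir = sysconfig.get_path("stdlib")
```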

qid & accept id: (12231891, 12232048) query: referencing list object by data python soup:

No you can't. You can only reference something with a reference, not data. In your case:

\n
mylist = ['id', value]\n
\n

mylist[1] is a reference (a pointer) to a spot somewhere in memory, which contains the data value. You can't reference a data with itself (well, technically you can, just not in any way that'd make a whole lot of sense).

\n

You can however get a reference to the data with

\n
mylist[mylist.index(value)]\n
\n

In this case, mylist.index(value) gets the index of the first occurrence of value without "iterating over the list". However, you should know that even this way Python is "iterating over the list" under the hood (depending on the implementation of Python). It's simply how arrays work on a binary level; you must iterate at some point. (See http://docs.python.org/py3k/library/stdtypes.html#sequence-types-str-bytes-bytearray-list-tuple-range)

\n soup wrap:

No, you can't. You can only reference something with a reference, not with data. In your case:

mylist = ['id', value]

mylist[1] is a reference (a pointer) to a spot somewhere in memory, which contains the data value. You can't reference data with itself (well, technically you can, just not in any way that'd make a whole lot of sense).

You can however get a reference to the data with

mylist[mylist.index(value)]

In this case, mylist.index(value) gets the index of the first occurrence of value without "iterating over the list". However, you should know that even this way Python is "iterating over the list" under the hood (depending on the implementation of Python). It's simply how arrays work on a binary level; you must iterate at some point. (See http://docs.python.org/py3k/library/stdtypes.html#sequence-types-str-bytes-bytearray-list-tuple-range)
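A quick illustration of both points — index() finds the position by scanning, and what it leads you back to is the very same object, not a copy:

```python
value = object()               # any value; an arbitrary object here
mylist = ['id', value]

i = mylist.index(value)        # scans the list until it finds `value`
assert mylist[i] is mylist[1]  # same reference, same object
```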

qid & accept id: (12307099, 12307162) query: Modifying a subset of rows in a pandas dataframe soup:

Try this:

\n
df.ix[df.A==0, 'B'] = np.nan\n
\n

the df.A==0 expression creates a boolean series that indexes the rows, 'B' selects the column. You can also use this to transform a subset of a column, e.g.:

\n
df.ix[df.A==0, 'B'] = df.ix[df.A==0, 'B'] / 2\n
\n

I don't know enough about pandas internals to know exactly why that works, but the basic issue is that sometimes indexing into a DataFrame returns a copy of the result, and sometimes it returns a view on the original object. According to documentation here, this behavior depends on the underlying numpy behavior. I've found that accessing everything in one operation (rather than [one][two]) is more likely to work for setting.

\n
\n

Update

\n

ix is deprecated, use .loc for label based indexing

\n
df.loc[df.A==0, 'B'] = np.nan\n
\n soup wrap:

Try this:

df.ix[df.A==0, 'B'] = np.nan

the df.A==0 expression creates a boolean series that indexes the rows, 'B' selects the column. You can also use this to transform a subset of a column, e.g.:

df.ix[df.A==0, 'B'] = df.ix[df.A==0, 'B'] / 2

I don't know enough about pandas internals to know exactly why that works, but the basic issue is that sometimes indexing into a DataFrame returns a copy of the result, and sometimes it returns a view on the original object. According to documentation here, this behavior depends on the underlying numpy behavior. I've found that accessing everything in one operation (rather than [one][two]) is more likely to work for setting.


Update

.ix is deprecated; use .loc for label-based indexing

df.loc[df.A==0, 'B'] = np.nan
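A minimal self-contained run of the .loc form (the data here is made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 0, 2], 'B': [10.0, 20.0, 30.0, 40.0]})

# one indexing operation: guaranteed to write into df, not into a copy
df.loc[df.A == 0, 'B'] = np.nan
```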
qid & accept id: (12309693, 12313371) query: Sending non-string argument in a POST request to a Tornado server soup:

JSON will fit well for your purposes.

\n

Do something like this, on client side:

\n
var data = {'packed_arg':get_form_args(); } \n
\n

Function get_form_args() is abstraction. You can implement it any way. Javascript objects are JSONs by default. \nSo on client side you must create dictionary from form fields. \nThink this way:

\n
var data = {};\nvar names_to_pack = ['packed1', 'packed2']\n$(form).find('input, select').each(function (i, x) {\n    var name = $(x).attr('name')\n    if(names_to_pack.indexOf(name) != -1) { \n        if(!data.packed) {\n            data.packed = {};  \n        }\n        data['packed'][name] = $(x).val(); \n    } else { \n        data[name] = $(x).val(); \n    }\n});\n$.post('/', data); \n
\n

And then on server side:

\n
raw_packed = self.get_argument('packed_arg', None)\npacked = {}\nif raw_packed: \n    packed = tornado.escape.json_decode(raw_packed)\narg1 = packed.get('arg1')\narg2 = packed.get('arg2')\n
\n

Also you can access all POST args in self.request.arguments.

\n soup wrap:

JSON will fit well for your purposes.

Do something like this on the client side:

var data = {'packed_arg': get_form_args()};

The function get_form_args() is an abstraction; you can implement it any way you like. JavaScript object literals map directly to JSON. So on the client side you must build a dictionary from the form fields. Think of it this way:

var data = {};
var names_to_pack = ['packed1', 'packed2']
$(form).find('input, select').each(function (i, x) {
    var name = $(x).attr('name')
    if(names_to_pack.indexOf(name) != -1) { 
        if(!data.packed) {
            data.packed = {};  
        }
        data['packed'][name] = $(x).val(); 
    } else { 
        data[name] = $(x).val(); 
    }
});
$.post('/', data); 

And then on the server side:

raw_packed = self.get_argument('packed_arg', None)
packed = {}
if raw_packed: 
    packed = tornado.escape.json_decode(raw_packed)
arg1 = packed.get('arg1')
arg2 = packed.get('arg2')

Also you can access all POST args in self.request.arguments.
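The server-side decode step can be sketched without a running Tornado handler: tornado.escape.json_decode(s) is essentially json.loads(s), and raw_packed below stands in for what self.get_argument('packed_arg', None) would return.

```python
import json

# what self.get_argument('packed_arg', None) would hand you
raw_packed = '{"arg1": 1, "arg2": "two"}'

packed = {}
if raw_packed:
    packed = json.loads(raw_packed)  # same as tornado.escape.json_decode

arg1 = packed.get('arg1')
arg2 = packed.get('arg2')
```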

qid & accept id: (12329807, 12541292) query: Django app deployment on nGINX soup:

Once you have created an dJango application. Just follow these steps:

\n

STEP 1. Create a file say uwsgi.ini in your Django Project Directory. i.e besides manage.py

\n
[uwsgi]\n# set the http port\nhttp = :\n\n# change to django project directory\nchdir = \n\n# add /var/www to the pythonpath, in this way we can use the project.app format\npythonpath = /var/www\n\n# set the project settings name\nenv = DJANGO_SETTINGS_MODULE=.settings\n\n# load django\nmodule = django.core.handlers.wsgi:WSGIHandler()\n
\n

STEP 2. Under /etc/nginx/sites-available add .conf file

\n
server {\nlisten 84;\nserver_name example.com;\naccess_log /var/log/nginx/sample_project.access.log;\nerror_log /var/log/nginx/sample_project.error.log;\n\n# https://docs.djangoproject.com/en/dev/howto/static-files/#serving-static-files-in-production\nlocation /static/ { # STATIC_URL\n    alias /home/www/myhostname.com/static/; # STATIC_ROOT\n    expires 30d;\n                  }\n\n       }\n
\n

STEP 3. In nginx.conf, pass the request to your Django application

\n

Under the server { } block,

\n
location /yourapp {\n           include uwsgi_params;\n           uwsgi_pass :;\n                   }\n
\n

STEP 4. Run the uwsgi.ini

\n
> uwsgi --ini uwsgi.ini\n
\n

Now any request to your nGINX will pass the request to your Django App via uwsgi.. Enjoy :)

\n soup wrap:

Once you have created a Django application, just follow these steps:

STEP 1. Create a file, say uwsgi.ini, in your Django project directory, i.e. beside manage.py

[uwsgi]
# set the http port
http = :

# change to django project directory
chdir = 

# add /var/www to the pythonpath, in this way we can use the project.app format
pythonpath = /var/www

# set the project settings name
env = DJANGO_SETTINGS_MODULE=.settings

# load django
module = django.core.handlers.wsgi:WSGIHandler()
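The placeholder values in the template above (port, project path, settings module) were stripped when this answer was rendered; a filled-in sketch, with every value hypothetical, might look like:

```ini
[uwsgi]
# all values below are hypothetical -- substitute your own
http = :8000
chdir = /var/www/sample_project
pythonpath = /var/www
env = DJANGO_SETTINGS_MODULE=sample_project.settings
module = django.core.handlers.wsgi:WSGIHandler()
```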

STEP 2. Under /etc/nginx/sites-available, add a .conf file

server {
listen 84;
server_name example.com;
access_log /var/log/nginx/sample_project.access.log;
error_log /var/log/nginx/sample_project.error.log;

# https://docs.djangoproject.com/en/dev/howto/static-files/#serving-static-files-in-production
location /static/ { # STATIC_URL
    alias /home/www/myhostname.com/static/; # STATIC_ROOT
    expires 30d;
                  }

       }

STEP 3. In nginx.conf, pass the request to your Django application

Under the server { } block,

location /yourapp {
           include uwsgi_params;
           uwsgi_pass :;
                   }

STEP 4. Run the uwsgi.ini

> uwsgi --ini uwsgi.ini

Now nginx will pass any request on to your Django app via uWSGI. Enjoy :)

qid & accept id: (12379529, 12380734) query: Sqlalchemy "double layer" query soup:

For example:

\n
class Post(Base):\n    __tablename__ = 'post'\n\n    id = Column(Integer, primary_key=True)\n    text = Column(Unicode)\n\nclass Like(Base):\n    __tablename__ = 'like'\n\n    id = Column(Integer, primary_key=True)\n    post_id = Column(Integer, ForeignKey(Post.id), nullable=False)\n\nclass Alert(Base):\n    __tablename__ = 'alert'\n\n    id = Column(Integer, primary_key=True)\n    like_id = Column(Integer, ForeignKey(Like.id))\n
\n

Then in SQLAlchemy you can use the following query:

\n
DBSession.query(Alert.id).join(Like).join(Post).filter(Post.id==2).all()\n
\n soup wrap:

For example:

class Post(Base):
    __tablename__ = 'post'

    id = Column(Integer, primary_key=True)
    text = Column(Unicode)

class Like(Base):
    __tablename__ = 'like'

    id = Column(Integer, primary_key=True)
    post_id = Column(Integer, ForeignKey(Post.id), nullable=False)

class Alert(Base):
    __tablename__ = 'alert'

    id = Column(Integer, primary_key=True)
    like_id = Column(Integer, ForeignKey(Like.id))

Then in SQLAlchemy you can use the following query:

DBSession.query(Alert.id).join(Like).join(Post).filter(Post.id==2).all()
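To see the two joins in action, here is a hedged, self-contained sketch against an in-memory SQLite database (SQLAlchemy 1.4+ import locations are assumed; the sample rows are made up):

```python
from sqlalchemy import Column, ForeignKey, Integer, Unicode, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Post(Base):
    __tablename__ = 'post'
    id = Column(Integer, primary_key=True)
    text = Column(Unicode)

class Like(Base):
    __tablename__ = 'like'
    id = Column(Integer, primary_key=True)
    post_id = Column(Integer, ForeignKey(Post.id), nullable=False)

class Alert(Base):
    __tablename__ = 'alert'
    id = Column(Integer, primary_key=True)
    like_id = Column(Integer, ForeignKey(Like.id))

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
DBSession = sessionmaker(bind=engine)()

# one post, one like on it, one alert about that like
DBSession.add_all([Post(id=2, text='hello'),
                   Like(id=7, post_id=2),
                   Alert(id=3, like_id=7)])
DBSession.commit()

# both joins are inferred from the foreign keys declared above
rows = DBSession.query(Alert.id).join(Like).join(Post).filter(Post.id == 2).all()
```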
qid & accept id: (12389570, 12389630) query: get characters from string in python soup:

With regex pattern ^TestVar\s+(\d{8})\s+(\S+) you can get that as >>

\n
import re\np = re.compile('^TestVar\s+(\d{8})\s+(\S+)')\nm = p.match('TestVar 00000000  WWWWWW 222.222 222.222 222.222')\nif m:\n    print 'Match found: ', m.group(2) + '_' + m.group(1)\nelse:\n    print 'No match'\n
\n

Test this demo here.

\n
\n

To find all occurrences in multiline input string use:

\n
p = re.compile("^TestVar\s+(\d{8})\s+(\S+)", re.MULTILINE) \nm = p.findall(input) \n
\n
\n

To learn more about regex with Python, see http://docs.python.org/howto/regex.html

\n soup wrap:

With the regex pattern ^TestVar\s+(\d{8})\s+(\S+) you can get that as follows:

import re
p = re.compile('^TestVar\s+(\d{8})\s+(\S+)')
m = p.match('TestVar 00000000  WWWWWW 222.222 222.222 222.222')
if m:
    print 'Match found: ', m.group(2) + '_' + m.group(1)
else:
    print 'No match'

To find all occurrences in multiline input string use:

p = re.compile("^TestVar\s+(\d{8})\s+(\S+)", re.MULTILINE) 
m = p.findall(input) 
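A runnable sketch of the multiline findall (the input here is made up, since the `input` variable above is not defined in this answer):

```python
import re

# re.MULTILINE makes ^ match at the start of every line
pattern = re.compile(r"^TestVar\s+(\d{8})\s+(\S+)", re.MULTILINE)
text = ("TestVar 00000000  WWWWWW 222.222\n"
        "TestVar 11111111  XXXXXX 333.333\n")

# each match is a (digits, name) tuple, one per captured group pair
matches = pattern.findall(text)
```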

To learn more about regex with Python, see http://docs.python.org/howto/regex.html

qid & accept id: (12413317, 12413493) query: Combining lists and performing a check soup:

Why split in the first place when you could do:

\n
attendees = [(a.profile, a.verified, a.from_user)\n                 for a in Attendee.objects.filter(event=event)]\n
\n

and then:

\n
{% for attendee, verified, from_user in attendees_list %}\n
\n

You can then control what each says at the template level using {% if verified %} or {% if from_user %} blocks.

\n

Alternatively, you can just do:

\n
attendees = Attendee.objects.filter(event=event)\n
\n

and refer to attendee.profile, attendee.verified, and attendee.from_user directly in the template.

\n soup wrap:

Why split in the first place when you could do:

attendees = [(a.profile, a.verified, a.from_user)
                 for a in Attendee.objects.filter(event=event)]

and then:

{% for attendee, verified, from_user in attendees_list %}

You can then control what each says at the template level using {% if verified %} or {% if from_user %} blocks.

Alternatively, you can just do:

attendees = Attendee.objects.filter(event=event)

and refer to attendee.profile, attendee.verified, and attendee.from_user directly in the template.

qid & accept id: (12494277, 12495646) query: Broken XML file parsing and using XPATH soup:

I'm sure my solution is far too simple to cover all cases, but it should be able to cover simple cases when closing tags are missing:

\n
>>> def fix_xml(string):\n    """\n    Tries to insert missing closing XML tags\n    """\n    error = True\n    while error:\n        try:\n            # Put one tag per line\n            string = string.replace('>', '>\n').replace('\n\n', '\n')\n            root = etree.fromstring(string)\n            error = False\n        except etree.XMLSyntaxError as exc:\n            text = str(exc)\n            pattern = "Opening and ending tag mismatch: (\w+) line (\d+) and (\w+), line (\d+), column (\d+)"\n            m = re.match(pattern, text)\n            if m:\n                # Retrieve where error took place\n                missing, l1, closing, l2, c2 = m.groups()\n                l1, l2, c2 = int(l1), int(l2), int(c2)\n                lines = string.split('\n')\n                print 'Adding closing tag <{0}> at line {1}'.format(missing, l2)\n                missing_line = lines[l2 - 1]\n                # Modified line goes back to where it was\n                lines[l2 - 1] = missing_line.replace(''.format(closing), ''.format(missing, closing))\n                string = '\n'.join(lines)\n            else:\n                raise\n    print string\n
\n

This seems to add correctly missing tags B and C:

\n
>>> s = """\n  \n    \n  \n  """\n>>> fix_xml(s)\nAdding closing tag  at line 4\nAdding closing tag  at line 7\n\n  \n    \n  \n\n  \n\n\n
\n soup wrap:

I'm sure my solution is far too simple to cover all cases, but it should be able to cover simple cases when closing tags are missing:

>>> def fix_xml(string):
    """
    Tries to insert missing closing XML tags
    """
    error = True
    while error:
        try:
            # Put one tag per line
            string = string.replace('>', '>\n').replace('\n\n', '\n')
            root = etree.fromstring(string)
            error = False
        except etree.XMLSyntaxError as exc:
            text = str(exc)
            pattern = "Opening and ending tag mismatch: (\w+) line (\d+) and (\w+), line (\d+), column (\d+)"
            m = re.match(pattern, text)
            if m:
                # Retrieve where error took place
                missing, l1, closing, l2, c2 = m.groups()
                l1, l2, c2 = int(l1), int(l2), int(c2)
                lines = string.split('\n')
                print 'Adding closing tag <{0}> at line {1}'.format(missing, l2)
                missing_line = lines[l2 - 1]
                # Modified line goes back to where it was
                lines[l2 - 1] = missing_line.replace('</{0}>'.format(closing), '</{0}></{1}>'.format(missing, closing))
                string = '\n'.join(lines)
            else:
                raise
    print string

This seems to add correctly missing tags B and C:

>>> s = """
  
    
  
  """
>>> fix_xml(s)
Adding closing tag  at line 4
Adding closing tag  at line 7

  
    
  

  


qid & accept id: (12494930, 12507379) query: How to get the type of change in P4Python soup:

The result of p4.run_opened is an array that has a map for each opened file.\nThis map has the following keys:

\n
'haveRev'\n'rev'\n'clientFile'\n'client'\n'user'\n'action'\n'type'\n'depotFile'\n'change'\n
\n

In order to find out the type of change, iterate over the array and ask each item for the 'action'. In one of my current changelists, the first file is opened for 'edit':

\n
import P4\np4 = P4.P4()\np4.connect()\np4.run_opened()[0]['action']\np4.disconnect()\n
\n

will return: 'edit'

\n soup wrap:

The result of p4.run_opened is an array that has a map for each opened file. This map has the following keys:

'haveRev'
'rev'
'clientFile'
'client'
'user'
'action'
'type'
'depotFile'
'change'

In order to find out the type of change, iterate over the array and ask each item for the 'action'. In one of my current changelists, the first file is opened for 'edit':

import P4
p4 = P4.P4()
p4.connect()
p4.run_opened()[0]['action']
p4.disconnect()

will return: 'edit'
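As a hedged illustration of iterating over that result, the dicts below are made up but shaped like the maps p4.run_opened() returns; this groups depot files by their 'action':

```python
# sample data shaped like p4.run_opened() output (values are hypothetical)
opened = [
    {'depotFile': '//depot/a.c', 'action': 'edit', 'rev': '3'},
    {'depotFile': '//depot/b.c', 'action': 'add',  'rev': '1'},
    {'depotFile': '//depot/c.c', 'action': 'edit', 'rev': '8'},
]

# group depot paths by the type of change
by_action = {}
for item in opened:
    by_action.setdefault(item['action'], []).append(item['depotFile'])
```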

qid & accept id: (12496531, 12496595) query: Sort NumPy float array column by column soup:

numpy.lexsort will work here:

\n
A[np.lexsort(A.T)]\n
\n

You need to transpose A before passing it to lexsort because when passed a 2d array it expects to sort by rows (last row, second last row, etc).

\n

The alternative possibly slightly clearer way is to pass the columns explicitly:

\n
A[np.lexsort((A[:, 0], A[:, 1]))]\n
\n

You still need to remember that lexsort sorts by the last key first (there's probably some good reason for this; it's the same as performing a stable sort on successive keys).

\n soup wrap:

numpy.lexsort will work here:

A[np.lexsort(A.T)]

You need to transpose A before passing it to lexsort because when passed a 2d array it expects to sort by rows (last row, second last row, etc).

The alternative possibly slightly clearer way is to pass the columns explicitly:

A[np.lexsort((A[:, 0], A[:, 1]))]

You still need to remember that lexsort sorts by the last key first (there's probably some good reason for this; it's the same as performing a stable sort on successive keys).
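A small made-up run showing the key order — the last column is the primary key, the first column breaks ties:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# lexsort uses the LAST key as the primary one, so passing A.T makes
# the last column primary and earlier columns the tie-breakers
order = np.lexsort(A.T)
sorted_A = A[order]
```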

qid & accept id: (12501761, 12501850) query: Passing multple files with asterisk to python shell in Windows soup:

Windows' command interpreter does not expand wildcards as UNIX shells do before passing them to the executed program or script.

\n
python.exe -c "import sys; print sys.argv[1:]" *.txt\n
\n

Result:

\n
['*.txt']\n
\n

Solution: Use the glob module.

\n
from glob import glob\nfrom sys import argv\n\nfor filename in glob(argv[1]):\n    print filename\n
\n soup wrap:

Windows' command interpreter does not expand wildcards as UNIX shells do before passing them to the executed program or script.

python.exe -c "import sys; print sys.argv[1:]" *.txt

Result:

['*.txt']

Solution: Use the glob module.

from glob import glob
from sys import argv

for filename in glob(argv[1]):
    print filename
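The snippet above only expands argv[1]; a hedged generalization (the helper name is made up) expands every argument and passes non-matching ones through unchanged:

```python
from glob import glob

def expand_args(args):
    """Expand each pattern; keep an argument as-is if nothing matched."""
    out = []
    for a in args:
        # sorted() gives a stable order; glob returns [] for no match
        out.extend(sorted(glob(a)) or [a])
    return out
```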
qid & accept id: (12504976, 12505089) query: Get last "column" after .str.split() operation on column in pandas DataFrame soup:

You could use the tolist method as an intermediary:

\n
In [99]: import pandas as pd\n\nIn [100]: d1 = pd.DataFrame({'ticker' : ['spx 5/25/2001 p500', 'spx 5/25/2001 p600', 'spx 5/25/2001 p700']})\n\nIn [101]: d1.ticker.str.split().tolist()\nOut[101]: \n[['spx', '5/25/2001', 'p500'],\n ['spx', '5/25/2001', 'p600'],\n ['spx', '5/25/2001', 'p700']]\n
\n

From which you could make a new DataFrame:

\n
In [102]: d2 = pd.DataFrame(d1.ticker.str.split().tolist(), \n   .....:                   columns="symbol date price".split())\n\nIn [103]: d2\nOut[103]: \n  symbol       date price\n0    spx  5/25/2001  p500\n1    spx  5/25/2001  p600\n2    spx  5/25/2001  p700\n
\n

For good measure, you could fix the price:

\n
In [104]: d2["price"] = d2["price"].str.replace("p","").astype(float)\n\nIn [105]: d2\nOut[105]: \n  symbol       date  price\n0    spx  5/25/2001    500\n1    spx  5/25/2001    600\n2    spx  5/25/2001    700\n
\n

PS: but if you really just want the last column, apply would suffice:

\n
In [113]: temp2.apply(lambda x: x[2])\nOut[113]: \n0    p500\n1    p600\n2    p700\nName: ticker\n
\n soup wrap:

You could use the tolist method as an intermediary:

In [99]: import pandas as pd

In [100]: d1 = pd.DataFrame({'ticker' : ['spx 5/25/2001 p500', 'spx 5/25/2001 p600', 'spx 5/25/2001 p700']})

In [101]: d1.ticker.str.split().tolist()
Out[101]: 
[['spx', '5/25/2001', 'p500'],
 ['spx', '5/25/2001', 'p600'],
 ['spx', '5/25/2001', 'p700']]

From which you could make a new DataFrame:

In [102]: d2 = pd.DataFrame(d1.ticker.str.split().tolist(), 
   .....:                   columns="symbol date price".split())

In [103]: d2
Out[103]: 
  symbol       date price
0    spx  5/25/2001  p500
1    spx  5/25/2001  p600
2    spx  5/25/2001  p700

For good measure, you could fix the price:

In [104]: d2["price"] = d2["price"].str.replace("p","").astype(float)

In [105]: d2
Out[105]: 
  symbol       date  price
0    spx  5/25/2001    500
1    spx  5/25/2001    600
2    spx  5/25/2001    700

PS: but if you really just want the last column, apply would suffice (temp2 here is the split series, i.e. d1.ticker.str.split()):

In [113]: temp2.apply(lambda x: x[2])
Out[113]: 
0    p500
1    p600
2    p700
Name: ticker
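On later pandas versions, .str indexing into the split lists gives the last field directly, without tolist or apply; a sketch:

```python
import pandas as pd

d1 = pd.DataFrame({'ticker': ['spx 5/25/2001 p500',
                              'spx 5/25/2001 p600',
                              'spx 5/25/2001 p700']})

# .str[-1] indexes each list produced by split(), giving the last field
last = d1.ticker.str.split().str[-1]
```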
qid & accept id: (12573992, 12577047) query: Multiple Forms and Formsets in CreateView soup:

Class based views are still new, so I'll write this out. The process is simple:

\n

First, create the forms for your objects. One of the forms will be repeated. Nothing special to be done here.

\n
class SonInline(ModelForm):\n    model = Son\n\nclass FatherForm(ModelForm):\n    model = Father\n
\n

Then, create your formset:

\n
FatherInlineFormSet = inlineformset_factory(Father,\n    Son,\n    form=SonInline,\n    extra=1,\n    can_delete=False,\n    can_order=False\n)\n
\n

Now, to integrate it with your CreateView:

\n
class CreateFatherView(CreateView):\n    template_name = 'father_create.html'\n    model = Father\n    form_class = FatherForm # the parent object's form\n\n    # On successful form submission\n    def get_success_url(self):\n        return reverse('father-created')\n\n    # Validate forms\n    def form_valid(self, form):\n        ctx = self.get_context_data()\n        inlines = ctx['inlines']\n        if inlines.is_valid() and form.is_valid():\n            self.object = form.save() # saves Father and Children\n            return redirect(self.get_success_url())\n        else:\n            return self.render_to_response(self.get_context_data(form=form))\n\n    def form_invalid(self, form):\n        return self.render_to_response(self.get_context_data(form=form))\n\n    # We populate the context with the forms. Here I'm sending\n    # the inline forms in `inlines`\n    def get_context_data(self, **kwargs):\n        ctx = super(CreateFatherView, self).get_context_data(**kwargs)\n        if self.request.POST:\n            ctx['form'] = FatherForm(self.request.POST)\n            ctx['inlines'] = FatherInlineFormSet(self.request.POST)\n        else:\n            ctx['form'] = Father()\n            ctx['inlines'] = FatherInlineFormSet()\n        return ctx\n
\n

Finally, here is the template:

\n

The key part is the jquery django-dynamic-formset plugin that keeps adding new inline forms:

\n
\n{% csrf_token %}\n
\n {% for f in form %}\n
{{ f.label }}
{{ f }}\n {% if f.errors %}\n {% for v in f.errors %}\n
{{ v }}\n {% endfor %}\n {% endif %}\n
\n {% endfor %}\n
\n
\n

Sons:

\n\n
\n {% for f2 in inlines %}\n \n {% for i in f2 %}\n \n {% endfor %}\n \n {% endfor %}\n
\n {{ i }}{% if i.errors %}{{ i.errors }}{% endif %}\n
\n{{ inlines.management_form }}\n\n
\n\n
\n soup wrap:

Class based views are still new, so I'll write this out. The process is simple:

First, create the forms for your objects. One of the forms will be repeated. Nothing special to be done here.

class SonInline(ModelForm):
    class Meta:
        model = Son

class FatherForm(ModelForm):
    class Meta:
        model = Father

Then, create your formset:

FatherInlineFormSet = inlineformset_factory(Father,
    Son,
    form=SonInline,
    extra=1,
    can_delete=False,
    can_order=False
)

Now, to integrate it with your CreateView:

class CreateFatherView(CreateView):
    template_name = 'father_create.html'
    model = Father
    form_class = FatherForm # the parent object's form

    # On successful form submission
    def get_success_url(self):
        return reverse('father-created')

    # Validate forms
    def form_valid(self, form):
        ctx = self.get_context_data()
        inlines = ctx['inlines']
        if inlines.is_valid() and form.is_valid():
            self.object = form.save() # saves Father and Children
            return redirect(self.get_success_url())
        else:
            return self.render_to_response(self.get_context_data(form=form))

    def form_invalid(self, form):
        return self.render_to_response(self.get_context_data(form=form))

    # We populate the context with the forms. Here I'm sending
    # the inline forms in `inlines`
    def get_context_data(self, **kwargs):
        ctx = super(CreateFatherView, self).get_context_data(**kwargs)
        if self.request.POST:
            ctx['form'] = FatherForm(self.request.POST)
            ctx['inlines'] = FatherInlineFormSet(self.request.POST)
        else:
            ctx['form'] = FatherForm()
            ctx['inlines'] = FatherInlineFormSet()
        return ctx

Finally, here is the template:

The key part is the jquery django-dynamic-formset plugin that keeps adding new inline forms:

{% csrf_token %}
{% for f in form %}
{{ f.label }}
{{ f }} {% if f.errors %} {% for v in f.errors %}
{{ v }} {% endfor %} {% endif %}
{% endfor %}

Sons:

{% for f2 in inlines %} {% for i in f2 %} {% endfor %} {% endfor %}
{{ i }}{% if i.errors %}{{ i.errors }}{% endif %}
{{ inlines.management_form }}
qid & accept id: (12576313, 12577614) query: Convert excel or csv file to pandas multilevel dataframe soup:

Looks like your file has fixed width columns, for which read_fwf() can be used.

\n
In [145]: data = """\\nSampleID    OtherInfo    Measurements    Error    Notes                   \nsample1     stuff                                 more stuff              \n                         36              6\n                         26              7\n                         37              8\nsample2     newstuff                              lots of stuff           \n                         25              6\n                         27              7\n"""\n\nIn [146]: df = pandas.read_fwf(StringIO(data), widths=[12, 13, 14, 9, 15])\n
\n

Ok, now we have the data, just a little bit of extra work and you have a frame on which you can use set_index() to create a MultiLevel index.

\n
In [147]: df[['Measurements', 'Error']] = df[['Measurements', 'Error']].shift(-1)\n\nIn [148]: df[['SampleID', 'OtherInfo', 'Notes']] = df[['SampleID', 'OtherInfo', 'Notes']].fillna()\n\nIn [150]: df = df.dropna()\n\nIn [151]: df\nOut[151]:\n  SampleID OtherInfo  Measurements  Error          Notes\n0  sample1     stuff            36      6     more stuff\n1  sample1     stuff            26      7     more stuff\n2  sample1     stuff            37      8     more stuff\n4  sample2  newstuff            25      6  lots of stuff\n5  sample2  newstuff            27      7  lots of stuff\n
\n soup wrap:

Looks like your file has fixed width columns, for which read_fwf() can be used.

In [145]: data = """\
SampleID    OtherInfo    Measurements    Error    Notes                   
sample1     stuff                                 more stuff              
                         36              6
                         26              7
                         37              8
sample2     newstuff                              lots of stuff           
                         25              6
                         27              7
"""

In [146]: df = pandas.read_fwf(StringIO(data), widths=[12, 13, 14, 9, 15])

Ok, now we have the data, just a little bit of extra work and you have a frame on which you can use set_index() to create a MultiLevel index.

In [147]: df[['Measurements', 'Error']] = df[['Measurements', 'Error']].shift(-1)

In [148]: df[['SampleID', 'OtherInfo', 'Notes']] = df[['SampleID', 'OtherInfo', 'Notes']].fillna()

In [150]: df = df.dropna()

In [151]: df
Out[151]:
  SampleID OtherInfo  Measurements  Error          Notes
0  sample1     stuff            36      6     more stuff
1  sample1     stuff            26      7     more stuff
2  sample1     stuff            37      8     more stuff
4  sample2  newstuff            25      6  lots of stuff
5  sample2  newstuff            27      7  lots of stuff
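For a self-contained version of the same steps: the sample data is rebuilt with ljust so the fixed-width offsets are exact, and the transcript's bare fillna() call, which relied on old pandas defaults, is replaced by an explicit ffill() for current pandas.

```python
from io import StringIO

import pandas as pd

widths = [12, 13, 14, 9, 15]
header = ['SampleID', 'OtherInfo', 'Measurements', 'Error', 'Notes']

# rebuild the fixed-width sample so every column starts at an exact offset
lines = [
    ''.join(h.ljust(w) for h, w in zip(header, widths)),
    'sample1'.ljust(12) + 'stuff'.ljust(13) + ' ' * 23 + 'more stuff',
    ' ' * 25 + '36'.ljust(14) + '6',
    ' ' * 25 + '26'.ljust(14) + '7',
    ' ' * 25 + '37'.ljust(14) + '8',
    'sample2'.ljust(12) + 'newstuff'.ljust(13) + ' ' * 23 + 'lots of stuff',
    ' ' * 25 + '25'.ljust(14) + '6',
    ' ' * 25 + '27'.ljust(14) + '7',
]
data = '\n'.join(lines)

df = pd.read_fwf(StringIO(data), widths=widths)
# measurements sit one row below their sample line, so shift them up
df[['Measurements', 'Error']] = df[['Measurements', 'Error']].shift(-1)
# forward-fill the sample metadata down over its measurement rows
df[['SampleID', 'OtherInfo', 'Notes']] = df[['SampleID', 'OtherInfo', 'Notes']].ffill()
df = df.dropna()
```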
qid & accept id: (12667537, 12667576) query: Invoking top-level function by name in Python soup:

The easiest way is to use globals

\n
globals()[func_name]()\n
\n

You can also get the current module object by looking it up in sys.modules.

\n
getattr(sys.modules[__name__], func_name)()\n
\n soup wrap:

The easiest way is to use globals():

globals()[func_name]()

You can also get the current module object by looking it up in sys.modules.

getattr(sys.modules[__name__], func_name)()
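Both one-liners in a runnable sketch (greet and func_name are made-up names for illustration):

```python
import sys

def greet():
    return 'hello'

func_name = 'greet'

# via the module's global namespace
result_globals = globals()[func_name]()

# via the module object looked up in sys.modules
result_module = getattr(sys.modules[__name__], func_name)()
```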
qid & accept id: (12674794, 12677377) query: Using pywin32 DosDateTimeToTime to unpack DOS packed time soup:

It takes two parameters (16 bit integers) which are identical to the first two parameters of\nDosDateTimeToFileTime

\n

You can see that in the source code PyWinTypesmodule.cpp for pywin32:

\n
static PyObject *PyWin_DosDateTimeToTime(PyObject *self, PyObject *args)\n{ \n    WORD wFatDate, wFatTime;\n    if (!PyArg_ParseTuple(args, "hh", (WORD *)&wFatDate, (WORD *)&wFatTime))\n        return NULL;\n    FILETIME fd;\n    If (!DosDateTimeToFileTime(wFatDate, wFatTime, &fd))\n      return PyWin_SetAPIError("DosDateTimeToFileTime");\n}\n
\n

Those have to be of the format described in this MSDN link with the relevant parts copied below for convenience:

\n
wFatDate [in]\nThe MS-DOS date. The date is a packed value with the following format.\n    Bits    Description\n    0-4     Day of the month (1–31)\n    5-8     Month (1 = January, 2 = February, and so on)\n    9-15    Year offset from 1980 (add 1980 to get actual year)\n\nwFatTime [in]\nThe MS-DOS time. The time is a packed value with the following format.\n    Bits    Description\n    0-4     Second divided by 2\n    5-10    Minute (0–59)\n   11-15    Hour (0–23 on a 24-hour clock)\n
\n soup wrap:

It takes two parameters (16-bit integers), which are identical to the first two parameters of DosDateTimeToFileTime.

You can see that in the source code PyWinTypesmodule.cpp for pywin32:

static PyObject *PyWin_DosDateTimeToTime(PyObject *self, PyObject *args)
{ 
    WORD wFatDate, wFatTime;
    if (!PyArg_ParseTuple(args, "hh", (WORD *)&wFatDate, (WORD *)&wFatTime))
        return NULL;
    FILETIME fd;
    if (!DosDateTimeToFileTime(wFatDate, wFatTime, &fd))
      return PyWin_SetAPIError("DosDateTimeToFileTime");
}

Those have to be of the format described in this MSDN link with the relevant parts copied below for convenience:

wFatDate [in]
The MS-DOS date. The date is a packed value with the following format.
    Bits    Description
    0-4     Day of the month (1–31)
    5-8     Month (1 = January, 2 = February, and so on)
    9-15    Year offset from 1980 (add 1980 to get actual year)

wFatTime [in]
The MS-DOS time. The time is a packed value with the following format.
    Bits    Description
    0-4     Second divided by 2
    5-10    Minute (0–59)
   11-15    Hour (0–23 on a 24-hour clock)
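A pure-Python sketch of decoding those packed words by hand, following the bit layout quoted above (the sample input values are made up for illustration):

```python
def unpack_dos_datetime(w_fat_date, w_fat_time):
    """Decode packed MS-DOS date/time words per the bit layout above."""
    day    = w_fat_date & 0x1F                  # bits 0-4
    month  = (w_fat_date >> 5) & 0x0F           # bits 5-8
    year   = ((w_fat_date >> 9) & 0x7F) + 1980  # bits 9-15, offset from 1980
    second = (w_fat_time & 0x1F) * 2            # bits 0-4, stored as seconds/2
    minute = (w_fat_time >> 5) & 0x3F           # bits 5-10
    hour   = (w_fat_time >> 11) & 0x1F          # bits 11-15
    return (year, month, day, hour, minute, second)

# 2012-10-01 12:30:10 packed by the same rules
print(unpack_dos_datetime(16705, 25541))  # (2012, 10, 1, 12, 30, 10)
```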
qid & accept id: (12713743, 12713961) query: in python, how to manipulate namespace of an instance soup:

I am going to create an example here which I think parallels what you are trying to do.

\n

Say you have some class that we'll call Data that is defined in the module foo. The foo module imports bar and a method of foo.Data calls bar.get_data() to populate itself.

\n

You want to create a module test that will create an instance of foo.Data, but instead of using the actual module bar you want that instance to use a mocked version of this.

\n

You can set this up by importing foo from your test module, and then rebinding foo.bar to your mocked version of the module.

\n

Here is an example of how this might look:

\n
    \n
  • bar.py:

    \n
    def get_data():\n    return 'bar'\n
  • \n
  • foo.py:

    \n
    import bar\n\nclass Data(object):\n    def __init__(self):\n        self.val = bar.get_data()\n\nif __name__ == '__main__':\n    d = Data()\n    print d.val    # prints 'bar'\n
  • \n
  • test.py:

    \n
    import foo\n\nclass bar_mock(object):\n    @staticmethod\n    def get_data():\n        return 'test'\n\nif __name__ == '__main__':\n    foo.bar = bar_mock\n    d = foo.Data()\n    print d.val    # prints 'test'\n
  • \n
\n

Although this will get you by for a simple test case, you are probably better off looking into a mocking library to handle this for you.

\n soup wrap:

I am going to create an example here which I think parallels what you are trying to do.

Say you have some class that we'll call Data that is defined in the module foo. The foo module imports bar and a method of foo.Data calls bar.get_data() to populate itself.

You want to create a module test that will create an instance of foo.Data, but instead of using the actual module bar you want that instance to use a mocked version of this.

You can set this up by importing foo from your test module, and then rebinding foo.bar to your mocked version of the module.

Here is an example of how this might look:

  • bar.py:

    def get_data():
        return 'bar'
    
  • foo.py:

    import bar
    
    class Data(object):
        def __init__(self):
            self.val = bar.get_data()
    
    if __name__ == '__main__':
        d = Data()
        print d.val    # prints 'bar'
    
  • test.py:

    import foo
    
    class bar_mock(object):
        @staticmethod
        def get_data():
            return 'test'
    
    if __name__ == '__main__':
        foo.bar = bar_mock
        d = foo.Data()
        print d.val    # prints 'test'
    

Although this will get you by for a simple test case, you are probably better off looking into a mocking library to handle this for you.
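As a sketch of the mocking-library route, Python 3's standard unittest.mock can do the same foo.bar rebinding and undo it automatically; the foo and bar modules are recreated in memory here so the example is self-contained rather than relying on files on disk:

```python
import sys
import types
from unittest import mock

# Recreate the example modules in memory
bar = types.ModuleType("bar")
bar.get_data = lambda: "bar"
sys.modules["bar"] = bar

foo = types.ModuleType("foo")
foo.bar = bar
exec("""
class Data(object):
    def __init__(self):
        self.val = bar.get_data()
""", foo.__dict__)
sys.modules["foo"] = foo

# patch.object swaps foo.bar for a mock inside the block, then restores it
with mock.patch.object(foo, "bar") as bar_mock:
    bar_mock.get_data.return_value = "test"
    print(foo.Data().val)  # test

print(foo.Data().val)      # bar
```

The restore-on-exit behaviour is the main advantage over rebinding foo.bar by hand: a failing test can't leak the mock into later tests.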

qid & accept id: (12744136, 12775403) query: Flask-WTF: how pass structered object to form soup:

Well found this in the documentation and i will use it for now:

\n

in the view:

\n
channel_obj = db.TVChannel().get_id(channel_id) #load a channel's infos into an object\nchannel     = ChannelForm(request.form, obj=channel_obj) #load channel form\nchannel.CITY1adapt.process_data(channel_obj.streams['City1']['adapt'])\n#and others links\n
\n

And in the form:

\n
class ChannelForm(Form):    \n    _id         = HiddenField()\n    name        = TextField(_('channel name'))    \n    CITY1adapt  = TextField(_('adapt link')) \n    CITY2adapt  = TextField(_('adapt link'))\n    #and so on\n\n    submit      = SubmitField(_('Save'))\n
\n

Now i'm working on when i "save" them.

\n soup wrap:

Well, I found this in the documentation and will use it for now:

in the view:

channel_obj = db.TVChannel().get_id(channel_id) #load a channel's infos into an object
channel     = ChannelForm(request.form, obj=channel_obj) #load channel form
channel.CITY1adapt.process_data(channel_obj.streams['City1']['adapt'])
#and others links

And in the form:

class ChannelForm(Form):    
    _id         = HiddenField()
    name        = TextField(_('channel name'))    
    CITY1adapt  = TextField(_('adapt link')) 
    CITY2adapt  = TextField(_('adapt link'))
    #and so on

    submit      = SubmitField(_('Save'))

Now I'm working on what happens when I "save" them.

qid & accept id: (12785573, 12786882) query: counting zigzag sequences soup:

Orders in whole sequence is given with an ordering of first two elements. There are two types of ordering: up-down-up-... and down-up-down-... There are same number of sequences of both ordering, since sequence of one ordering can be transformed in other order by exchanging each number x with k+1-x.

\n

Let U_k(n) be number of sequences with first up order of length n. Let U_k(n, f) be number of sequences with first up order of length n and with first number f. Similar define D_k(n) and D_k(n, f).

\n

Then number of sequences of length n (for n>1) is:

\n
U_k(n) + D_k(n) = 2*U_k(n) = 2*( sum U_k(n, f) for f in 1 ... k ).\n
\n

Same argument gives:

\n
U_k(n, f) = sum D_k(n-1, s) for s = f+1 ... k\n          = sum U_k(n-1, s) for s = 1 ... k-f\nU_k(1, f) = 1\n
\n

Edit:

\n

Slightly simpler implementation. M(n,k) returns n'th row (from back), and C(n,k) counts number of sequences.

\n
def M(n, k):\n    if n == 1: return [1]*k\n    m = M(n-1, k)\n    return [sum(m[:i]) for i in xrange(k)][::-1]\n\ndef C(n, k):\n    if n < 1: return 0\n    if n == 1: return k\n    return 2*sum(M(n,k))\n
\n soup wrap:

The order of the whole sequence is determined by the ordering of its first two elements. There are two types of ordering: up-down-up-... and down-up-down-... There are the same number of sequences of both orderings, since a sequence of one ordering can be transformed into the other by exchanging each number x with k+1-x.

Let U_k(n) be the number of sequences of length n whose first step is up, and let U_k(n, f) be the number of such sequences whose first number is f. Define D_k(n) and D_k(n, f) similarly for sequences whose first step is down.

Then the number of sequences of length n (for n > 1) is:

U_k(n) + D_k(n) = 2*U_k(n) = 2*( sum U_k(n, f) for f in 1 ... k ).

Same argument gives:

U_k(n, f) = sum D_k(n-1, s) for s = f+1 ... k
          = sum U_k(n-1, s) for s = 1 ... k-f
U_k(1, f) = 1

Edit:

Slightly simpler implementation. M(n, k) returns the n'th row (from the back), and C(n, k) counts the number of sequences.

def M(n, k):
    if n == 1: return [1]*k
    m = M(n-1, k)
    return [sum(m[:i]) for i in xrange(k)][::-1]

def C(n, k):
    if n < 1: return 0
    if n == 1: return k
    return 2*sum(M(n,k))
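A Python 3 rendering of the same functions (range instead of xrange), with a small sanity check: for k = 3, n = 3 there are five up-down sequences (1<2>1, 1<3>1, 1<3>2, 2<3>1, 2<3>2) plus their five mirrored down-up counterparts.

```python
def M(n, k):
    # n'th row (from the back) of the U_k(n, f) table
    if n == 1:
        return [1] * k
    m = M(n - 1, k)
    return [sum(m[:i]) for i in range(k)][::-1]

def C(n, k):
    # total number of zigzag sequences of length n over values 1..k
    if n < 1:
        return 0
    if n == 1:
        return k
    return 2 * sum(M(n, k))

print(C(2, 3), C(3, 3))  # 6 10
```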
qid & accept id: (12824228, 12824269) query: How do you check when a file is done being copied in Python? soup:

Since the files can be copied within the poll interval, just process the new files found by the last poll before checking for new files. In other words, instead of this:

\n
while True:\n    newfiles = check_for_new_files()\n    process(newfiles)\n    time.sleep(pollinterval)\n
\n

Do this:

\n
newfiles = []\n\nwhile True:\n    process(newfiles)\n    newfiles = check_for_new_files()\n    time.sleep(pollinterval)\n
\n

Or just put the wait in the middle of the loop (same effect really):

\n
while True:\n    newfiles = check_for_new_files()\n    time.sleep(pollinterval)\n    process(newfiles)\n
\n soup wrap:

Since copying a file takes at most one poll interval, a file spotted by one poll is complete by the next; so process the new files found by the previous poll before checking for new files. In other words, instead of this:

while True:
    newfiles = check_for_new_files()
    process(newfiles)
    time.sleep(pollinterval)

Do this:

newfiles = []

while True:
    process(newfiles)
    newfiles = check_for_new_files()
    time.sleep(pollinterval)

Or just put the wait in the middle of the loop (same effect really):

while True:
    newfiles = check_for_new_files()
    time.sleep(pollinterval)
    process(newfiles)
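If you can't assume a copy finishes within one poll interval, a common extra safeguard (not part of the answer above, just a sketch) is to treat a file as complete only when its size stops changing between polls:

```python
import os
import tempfile
import time

def wait_until_stable(path, interval=0.5, checks=2):
    """Block until path's size is unchanged for `checks` consecutive polls."""
    last_size = -1
    stable = 0
    while stable < checks:
        size = os.path.getsize(path)
        if size == last_size:
            stable += 1
        else:
            stable = 0
            last_size = size
        time.sleep(interval)

# Demo on a throwaway file that is already fully written
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"payload")
os.close(fd)
wait_until_stable(demo_path, interval=0.01)
print("stable at", os.path.getsize(demo_path), "bytes")
os.remove(demo_path)
```

Note the caveat: a stalled or aborted copy also looks "stable", so this is a heuristic, not a guarantee.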
qid & accept id: (12901066, 12901477) query: Beautiful soup, html table parsing soup:

Would something like this work:

\n
rows = soup.find('tbody').findAll('tr')\n\nfor row in rows:\n    cells = row.findAll('td')\n\n    output = []\n\n    for i, cell in enumerate(cells):\n        if i == 0:\n            output.append(cell.text.strip())\n        elif cell.find('img'):\n            output.append(cell.find('img')['title'])\n        elif cell.find('input'):\n            output.append(cell.find('input')['value'])\n    print output\n
\n

This outputs the following:

\n
[u'Logged-in users', u'True', u'True', u'True', u'True']\n[u'User 1', u'Confirm', u'Confirm', u'Site', u'Confirm']\n[u'User 2', u'Confirm', u'Confirm', u'Confirm', u'Confirm']\n[u'User 3', u'Confirm', u'Confirm', u'Confirm', u'Confirm']\n[u'User 4', u'Confirm', u'Site', u'Site', u'Confirm']\n
\n soup wrap:

Would something like this work:

rows = soup.find('tbody').findAll('tr')

for row in rows:
    cells = row.findAll('td')

    output = []

    for i, cell in enumerate(cells):
        if i == 0:
            output.append(cell.text.strip())
        elif cell.find('img'):
            output.append(cell.find('img')['title'])
        elif cell.find('input'):
            output.append(cell.find('input')['value'])
    print output

This outputs the following:

[u'Logged-in users', u'True', u'True', u'True', u'True']
[u'User 1', u'Confirm', u'Confirm', u'Site', u'Confirm']
[u'User 2', u'Confirm', u'Confirm', u'Confirm', u'Confirm']
[u'User 3', u'Confirm', u'Confirm', u'Confirm', u'Confirm']
[u'User 4', u'Confirm', u'Site', u'Site', u'Confirm']
qid & accept id: (12923835, 12924005) query: How can I compare dates using Python? soup:

You'll want to use Python's standard library datetime module to parse and convert the "date given by the user" to a datetime.date instance and then subtract that from the current date, datetime.date.today(). For example:

\n
>>> birthdate_str = raw_input('Enter your birthday (yyyy-mm-dd): ')\nEnter your birthday (yyyy-mm-dd): 1981-08-04\n>>> birthdatetime = datetime.datetime.strptime(birthdate_str, '%Y-%m-%d')\n>>> birthdate = birthdatetime.date()  # convert from datetime to just date\n>>> age = datetime.date.today() - birthdate\n>>> age\ndatetime.timedelta(11397)\n
\n

age is a datetime.timedelta instance, and the 11397 is their age in days (available directly via age.days).

\n

To get their age in years, you could do something like this:

\n
>>> int(age.days / 365.24)\n31\n
\n soup wrap:

You'll want to use Python's standard library datetime module to parse and convert the "date given by the user" to a datetime.date instance and then subtract that from the current date, datetime.date.today(). For example:

>>> birthdate_str = raw_input('Enter your birthday (yyyy-mm-dd): ')
Enter your birthday (yyyy-mm-dd): 1981-08-04
>>> birthdatetime = datetime.datetime.strptime(birthdate_str, '%Y-%m-%d')
>>> birthdate = birthdatetime.date()  # convert from datetime to just date
>>> age = datetime.date.today() - birthdate
>>> age
datetime.timedelta(11397)

age is a datetime.timedelta instance, and the 11397 is their age in days (available directly via age.days).

To get their age in years, you could do something like this:

>>> int(age.days / 365.24)
31
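Dividing by 365.24 can be off by a day right around birthdays; here is a sketch of an exact year count using only datetime comparisons (the dates are made-up examples):

```python
import datetime

def age_in_years(birthdate, today=None):
    # Subtract one year if this year's birthday hasn't happened yet
    if today is None:
        today = datetime.date.today()
    before_birthday = (today.month, today.day) < (birthdate.month, birthdate.day)
    return today.year - birthdate.year - before_birthday

print(age_in_years(datetime.date(1981, 8, 4), datetime.date(2012, 10, 17)))  # 31
```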
qid & accept id: (12972595, 12972810) query: Extract a particular number followed by a command line argument variable from a string in python soup:

You should look into the regular expressions module of python. Simply "import re" in your script to provide regex capabilities.\nBy the way if you only want the numbers following the string "vg" then the following script should do the trick.

\n
import re\nurString = "/dev/vg10/lv10:cp:99"\nMatches = re.findall("vg[0-9]*", mv)\nprint Matches\n
\n

Now matches will have a list containing all vg'number'. That [0-9]* means any digit any number of times. Parse it again to get the numbers from it. You should read more about regular expressions. It's fun.

\n

Extending the answer to match OP's requirement:

\n
In [445]: Matches\nOut[445]: ['vg10']\n\nIn [446]: int(*re.findall(r'[0-9]+', Matches[0]))\nOut[446]: 10\n
\n soup wrap:

You should look into Python's regular expressions module. Simply "import re" in your script to provide regex capabilities. By the way, if you only want the numbers following the string "vg", then the following script should do the trick.

import re
urString = "/dev/vg10/lv10:cp:99"
Matches = re.findall("vg[0-9]*", urString)
print Matches

Now Matches will hold a list containing every vg'number' match. The [0-9]* means any digit, repeated any number of times. Parse it again to get the numbers out. You should read more about regular expressions. It's fun.

Extending the answer to match OP's requirement:

In [445]: Matches
Out[445]: ['vg10']

In [446]: int(*re.findall(r'[0-9]+', Matches[0]))
Out[446]: 10
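The two-step parse can also be collapsed into one pattern with a capture group; a sketch (the group keeps only the digits that follow "vg"):

```python
import re

urString = "/dev/vg10/lv10:cp:99"
match = re.search(r"vg(\d+)", urString)
if match:
    print(int(match.group(1)))  # 10
```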
qid & accept id: (12986272, 12991706) query: How do I compute all possibilities for an array of numbers/bits (in python, or any language for that matter) soup:

In Python, use itertools for stuff like this

\n
from itertools import product\nfor i in product([0,1], repeat=5): \n    print i\n
\n

Yields:

\n
(0, 0, 0, 0, 0)\n(0, 0, 0, 0, 1)\n(0, 0, 0, 1, 0)\n(0, 0, 0, 1, 1)\n(0, 0, 1, 0, 0)\netc...\n
\n soup wrap:

In Python, use itertools for stuff like this

from itertools import product
for i in product([0,1], repeat=5): 
    print i

Yields:

(0, 0, 0, 0, 0)
(0, 0, 0, 0, 1)
(0, 0, 0, 1, 0)
(0, 0, 0, 1, 1)
(0, 0, 1, 0, 0)
etc...
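The same enumeration in Python 3 syntax, with a check that it really yields all 2**5 = 32 combinations:

```python
from itertools import product

combos = list(product([0, 1], repeat=5))
print(len(combos))   # 32
print(combos[0])     # (0, 0, 0, 0, 0)
print(combos[-1])    # (1, 1, 1, 1, 1)
```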
qid & accept id: (13031989, 13032338) query: regular expression using in glob.glob of python soup:

The easiest way would be to filter the glob results yourself. Here is how to do it using a simple loop comprehension:

\n
import glob\nres = [f for f in glob.glob("*.txt") if "abc" in f or "123" in f or "a1b" in f]\nfor f in res:\n    print f\n
\n

You could also use a regexp and no glob:

\n
import os\nimport re\nres = [f for f in os.listdir(path) if re.search(r'(abc|123|a1b).*\.txt$', f)]\nfor f in res:\n    print f\n
\n

(By the way, naming a variable list is a bad idea since list is a Python type...)

\n soup wrap:

The easiest way would be to filter the glob results yourself. Here is how to do it using a simple list comprehension:

import glob
res = [f for f in glob.glob("*.txt") if "abc" in f or "123" in f or "a1b" in f]
for f in res:
    print f

You could also use a regexp and no glob:

import os
import re
res = [f for f in os.listdir(path) if re.search(r'(abc|123|a1b).*\.txt$', f)]
for f in res:
    print f

(By the way, naming a variable list is a bad idea since list is a Python type...)
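A third option, sketched below: since glob itself has no '|' alternation, run one glob per substring and merge the results (the directory and file names here are hypothetical, created just for the demo):

```python
import glob
import os
import tempfile
from itertools import chain

# Build a throwaway directory with hypothetical file names
d = tempfile.mkdtemp()
for name in ("x_abc.txt", "y_123.txt", "notes.txt"):
    open(os.path.join(d, name), "w").close()

# One glob pattern per substring, chained and de-duplicated
patterns = ("*abc*.txt", "*123*.txt", "*a1b*.txt")
matches = sorted(set(chain.from_iterable(
    glob.glob(os.path.join(d, p)) for p in patterns)))
print([os.path.basename(m) for m in matches])  # ['x_abc.txt', 'y_123.txt']
```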

qid & accept id: (13047895, 13048047) query: Efficient ways to duplicate array/list in Python soup:

Use the timeit module in python for testing timings.

\n
from copy import *\n\na=range(1000)\n\ndef cop():\n    b=copy(a)\n\ndef func1():\n    b=list(a)\n\ndef slice():\n    b=a[:]\n\ndef slice_len():\n    b=a[0:len(a)]\n\n\n\nif __name__=="__main__":\n    import timeit\n    print "copy(a)",timeit.timeit("cop()", setup="from __main__ import cop")\n    print "list(a)",timeit.timeit("func1()", setup="from __main__ import func1")\n    print "a[:]",timeit.timeit("slice()", setup="from __main__ import slice")\n    print "a[0:len(a)]",timeit.timeit("slice_len()", setup="from __main__ import slice_len")\n
\n

Results:

\n
copy(a) 3.98940896988\nlist(a) 2.54542589188\na[:] 1.96630120277                   #winner\na[0:len(a)] 10.5431251526\n
\n

It's surely the extra steps involved in a[0:len(a)] are the reason for it's slowness.

\n

Here's the byte code comparison of the two:

\n
In [19]: dis.dis(func1)\n  2           0 LOAD_GLOBAL              0 (range)\n              3 LOAD_CONST               1 (100000)\n              6 CALL_FUNCTION            1\n              9 STORE_FAST               0 (a)\n\n  3          12 LOAD_FAST                0 (a)\n             15 SLICE+0             \n             16 STORE_FAST               1 (b)\n             19 LOAD_CONST               0 (None)\n             22 RETURN_VALUE        \n\nIn [20]: dis.dis(func2)\n  2           0 LOAD_GLOBAL              0 (range)\n              3 LOAD_CONST               1 (100000)\n              6 CALL_FUNCTION            1\n              9 STORE_FAST               0 (a)\n\n  3          12 LOAD_FAST                0 (a)    #same up to here\n             15 LOAD_CONST               2 (0)    #loads 0\n             18 LOAD_GLOBAL              1 (len) # loads the builtin len(),\n                                                 # so it might take some lookup time\n             21 LOAD_FAST                0 (a)\n             24 CALL_FUNCTION            1         \n             27 SLICE+3             \n             28 STORE_FAST               1 (b)\n             31 LOAD_CONST               0 (None)\n             34 RETURN_VALUE        \n
\n soup wrap:

Use the timeit module in python for testing timings.

from copy import *

a=range(1000)

def cop():
    b=copy(a)

def func1():
    b=list(a)

def slice():
    b=a[:]

def slice_len():
    b=a[0:len(a)]



if __name__=="__main__":
    import timeit
    print "copy(a)",timeit.timeit("cop()", setup="from __main__ import cop")
    print "list(a)",timeit.timeit("func1()", setup="from __main__ import func1")
    print "a[:]",timeit.timeit("slice()", setup="from __main__ import slice")
    print "a[0:len(a)]",timeit.timeit("slice_len()", setup="from __main__ import slice_len")

Results:

copy(a) 3.98940896988
list(a) 2.54542589188
a[:] 1.96630120277                   #winner
a[0:len(a)] 10.5431251526

The extra steps involved in a[0:len(a)] are surely the reason for its slowness.

Here's the byte code comparison of the two:

In [19]: dis.dis(func1)
  2           0 LOAD_GLOBAL              0 (range)
              3 LOAD_CONST               1 (100000)
              6 CALL_FUNCTION            1
              9 STORE_FAST               0 (a)

  3          12 LOAD_FAST                0 (a)
             15 SLICE+0             
             16 STORE_FAST               1 (b)
             19 LOAD_CONST               0 (None)
             22 RETURN_VALUE        

In [20]: dis.dis(func2)
  2           0 LOAD_GLOBAL              0 (range)
              3 LOAD_CONST               1 (100000)
              6 CALL_FUNCTION            1
              9 STORE_FAST               0 (a)

  3          12 LOAD_FAST                0 (a)    #same up to here
             15 LOAD_CONST               2 (0)    #loads 0
             18 LOAD_GLOBAL              1 (len) # loads the builtin len(),
                                                 # so it might take some lookup time
             21 LOAD_FAST                0 (a)
             24 CALL_FUNCTION            1         
             27 SLICE+3             
             28 STORE_FAST               1 (b)
             31 LOAD_CONST               0 (None)
             34 RETURN_VALUE        
qid & accept id: (13097764, 13097832) query: What is an elegant way to select all non-None elements from parameters and place them in a python dictionary? soup:

You could use kwargs:

\n
def function(*args, **kwargs):\n    values = {}\n    for k in kwargs:\n        if kwargs[k] is not None:\n            values[k] = kwargs[k]\n    if not values:\n        raise Exception("No values provided")\n    return values\n\n>>> function(varone=None, vartwo="fish", varthree=None)\n{'vartwo': 'fish'}\n
\n

With this syntax, Python removes the need to explicitly specify any argument list, and allows functions to handle any old keyword arguments they want.

\n

If you're specifically looking for keys var1 etc instead of varone you just modify the function call:

\n
>>> function(var1=None, var2="fish", var3=None)\n{'var2': 'fish'}\n
\n

If you want to be REALLY slick, you can use list comprehensions:

\n
def function(**kwargs):\n    values = dict([i for i in kwargs.iteritems() if i[1] != None])\n    if not values:\n        raise Exception("foo")\n    return values\n
\n

Again, you'll have to alter your parameter names to be consistent with your output keys.

\n soup wrap:

You could use kwargs:

def function(*args, **kwargs):
    values = {}
    for k in kwargs:
        if kwargs[k] is not None:
            values[k] = kwargs[k]
    if not values:
        raise Exception("No values provided")
    return values

>>> function(varone=None, vartwo="fish", varthree=None)
{'vartwo': 'fish'}

With this syntax, Python removes the need to explicitly specify any argument list, and allows functions to handle any old keyword arguments they want.

If you're specifically looking for keys var1 etc instead of varone you just modify the function call:

>>> function(var1=None, var2="fish", var3=None)
{'var2': 'fish'}

If you want to be REALLY slick, you can use list comprehensions:

def function(**kwargs):
    values = dict([i for i in kwargs.iteritems() if i[1] is not None])
    if not values:
        raise Exception("foo")
    return values

Again, you'll have to alter your parameter names to be consistent with your output keys.
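In Python 2.7+ (and 3) the same filter reads even more naturally as a dict comprehension; a sketch in Python 3 syntax:

```python
def function(**kwargs):
    # Keep only the keyword arguments that were actually provided
    values = {k: v for k, v in kwargs.items() if v is not None}
    if not values:
        raise ValueError("No values provided")
    return values

print(function(varone=None, vartwo="fish", varthree=None))  # {'vartwo': 'fish'}
```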

qid & accept id: (13098173, 13098213) query: How to create a double dictionary in Python? soup:

You could try this:

\n
In [17]: results = {}\n\nIn [18]: for k, v in names.iteritems():\n    results[k] = {v: dates.setdefault(k, '')}\n   ....:\n   ....:\n\nIn [20]: results\nOut[20]: \n{'George': {'march': '21/02'},\n 'Mary': {'february': '2/02'},\n 'Peter': {'may': ''},\n 'Steven': {'april': '14/03'},\n 'Will': {'january': '7/01'}}\n
\n

And as to your comment regarding adding month and day, you can add them similarly:

\n
In [28]: for k, v in names.iteritems():\n    results[k] = {'month': v, 'day': dates.setdefault(k, '')}\n   ....:\n   ....:\n\nIn [30]: results\nOut[30]:\n{'George': {'day': '21/02', 'month': 'march'},\n 'Mary': {'day': '2/02', 'month': 'february'},\n 'Peter': {'day': '', 'month': 'may'},\n 'Steven': {'day': '14/03', 'month': 'april'},\n 'Will': {'day': '7/01', 'month': 'january'}}\n
\n

And if you want to omit day completely in the case where a value doesn't exist:

\n
In [8]: results = {}\n\nIn [9]: for k, v in names.iteritems():\n   ...:     results[k] = {'month': v}\n   ...:     if dates.has_key(k):\n   ...:         results[k]['day'] = dates[k]\n   ...:\n   ...:\n\nIn [10]: results\nOut[10]:\n{'George': {'day': '21/03', 'month': 'march'},\n 'Mary': {'day': '2/02', 'month': 'february'},\n 'Peter': {'month': 'may'},\n 'Steven': {'day': '14/03', 'month': 'april'},\n 'Will': {'day': '7/01', 'month': 'january'}}\n
\n

And in the odd case where you know the date but not the month, iterating through the set of the keys (as @KayZhu suggested) with a defaultdict may be the easiest solution:

\n
In [1]: from collections import defaultdict\n\nIn [2]: names = {'Will': 'january', 'Mary': 'february', 'George': 'march', 'Steven': 'april', 'Peter': 'may'}\n\nIn [3]: dates = {'Will': '7/01', 'George': '21/03', 'Steven': '14/03', 'Mary': '2/02', 'Marat': '27/03'}\n\nIn [4]: results = defaultdict(dict)\n\nIn [5]: for name in set(names.keys() + dates.keys()):\n   ...:     if name in names:\n   ...:         results[name]['month'] = names[name]\n   ...:     if name in dates:\n   ...:         results[name]['day'] = dates[name]\n   ...:\n   ...:\n\nIn [6]: for k, v in results.iteritems():\n   ...:     print k, v\n   ...:\n   ...:\nGeorge {'day': '21/03', 'month': 'march'}\nWill {'day': '7/01', 'month': 'january'}\nMarat {'day': '27/03'}\nSteven {'day': '14/03', 'month': 'april'}\nPeter {'month': 'may'}\nMary {'day': '2/02', 'month': 'february'}\n
\n soup wrap:

You could try this:

In [17]: results = {}

In [18]: for k, v in names.iteritems():
    results[k] = {v: dates.setdefault(k, '')}
   ....:
   ....:

In [20]: results
Out[20]: 
{'George': {'march': '21/03'},
 'Mary': {'february': '2/02'},
 'Peter': {'may': ''},
 'Steven': {'april': '14/03'},
 'Will': {'january': '7/01'}}

And as to your comment regarding adding month and day, you can add them similarly:

In [28]: for k, v in names.iteritems():
    results[k] = {'month': v, 'day': dates.setdefault(k, '')}
   ....:
   ....:

In [30]: results
Out[30]:
{'George': {'day': '21/03', 'month': 'march'},
 'Mary': {'day': '2/02', 'month': 'february'},
 'Peter': {'day': '', 'month': 'may'},
 'Steven': {'day': '14/03', 'month': 'april'},
 'Will': {'day': '7/01', 'month': 'january'}}

And if you want to omit day completely in the case where a value doesn't exist:

In [8]: results = {}

In [9]: for k, v in names.iteritems():
   ...:     results[k] = {'month': v}
   ...:     if dates.has_key(k):
   ...:         results[k]['day'] = dates[k]
   ...:
   ...:

In [10]: results
Out[10]:
{'George': {'day': '21/03', 'month': 'march'},
 'Mary': {'day': '2/02', 'month': 'february'},
 'Peter': {'month': 'may'},
 'Steven': {'day': '14/03', 'month': 'april'},
 'Will': {'day': '7/01', 'month': 'january'}}

And in the odd case where you know the date but not the month, iterating through the set of the keys (as @KayZhu suggested) with a defaultdict may be the easiest solution:

In [1]: from collections import defaultdict

In [2]: names = {'Will': 'january', 'Mary': 'february', 'George': 'march', 'Steven': 'april', 'Peter': 'may'}

In [3]: dates = {'Will': '7/01', 'George': '21/03', 'Steven': '14/03', 'Mary': '2/02', 'Marat': '27/03'}

In [4]: results = defaultdict(dict)

In [5]: for name in set(names.keys() + dates.keys()):
   ...:     if name in names:
   ...:         results[name]['month'] = names[name]
   ...:     if name in dates:
   ...:         results[name]['day'] = dates[name]
   ...:
   ...:

In [6]: for k, v in results.iteritems():
   ...:     print k, v
   ...:
   ...:
George {'day': '21/03', 'month': 'march'}
Will {'day': '7/01', 'month': 'january'}
Marat {'day': '27/03'}
Steven {'day': '14/03', 'month': 'april'}
Peter {'month': 'may'}
Mary {'day': '2/02', 'month': 'february'}
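Note that names.keys() + dates.keys() only works in Python 2 (keys() returns views in Python 3). A Python 3 sketch of the same union-of-keys approach:

```python
names = {'Will': 'january', 'Mary': 'february'}
dates = {'Will': '7/01', 'Marat': '27/03'}

results = {}
for name in set(names) | set(dates):   # union of the two key sets
    entry = {}
    if name in names:
        entry['month'] = names[name]
    if name in dates:
        entry['day'] = dates[name]
    results[name] = entry

print(results['Marat'])  # {'day': '27/03'}
```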
qid & accept id: (13103195, 13103378) query: Using beautifulsoup to parse tag with some text soup:

so just try:

\n
from BeautifulSoup import BeautifulSoup\n\ntext = """\n
PLZ:
\n
\n8047\n
"""\n\nnumber = BeautifulSoup(text).find("dt",text="PLZ:").parent.findNextSiblings("dd")\nprint BeautifulSoup(''.join(number[0]))\n
\n

or if you find with findNext try:

\n
number = BeautifulSoup(text).find("dt",text="PLZ:").parent.findNext("dd").contents[0]\n
\n soup wrap:

so just try:

from BeautifulSoup import BeautifulSoup

text = """
<dl>
<dt>PLZ:</dt>
<dd>8047</dd>
</dl>
"""

number = BeautifulSoup(text).find("dt", text="PLZ:").parent.findNextSiblings("dd")
print BeautifulSoup(''.join(number[0]))

or if you find with findNext try:

number = BeautifulSoup(text).find("dt",text="PLZ:").parent.findNext("dd").contents[0]
qid & accept id: (13208212, 13208233) query: Python Variable in an HTML email in Python soup:

Use "formatstring".format:

\n
code = "We Says Thanks!"\nhtml = """\\n\n  \n  \n    

Thank you for being a loyal customer.
\n Here is your unique code to unlock exclusive content:
\n

{code}


\n \n

\n \n\n""".format(code=code)\n
\n

If you find yourself substituting a large number of variables, you can use

\n
.format(**locals())\n
\n soup wrap:

Use "formatstring".format:

code = "We Says Thanks!"
html = """\
<html>
  <head></head>
  <body>
    <p>Thank you for being a loyal customer.<br>
       Here is your unique code to unlock exclusive content:<br>
       <b>{code}</b></p>
  </body>
</html>
""".format(code=code)

If you find yourself substituting a large number of variables, you can use

.format(**locals())
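A quick sketch of the **locals() variant (the variable names here are made-up examples):

```python
name = "Ada"
code = "XYZ-123"

# Every local variable becomes available as a {placeholder}
message = "Hello {name}, your code is {code}".format(**locals())
print(message)  # Hello Ada, your code is XYZ-123
```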
qid & accept id: (13247749, 13248151) query: how to change a node value in python soup:

First, the example is not valid xml. You can use xml.etree that comes included:

\n
from xml.etree import ElementTree as et\nxmlstr="""\\n\n\n  \n    \n         Jaipur\n    \n \n"""\ndoc=et.fromstring(xmlstr)\ndoc.find('.//name').text='Mumbai'\nprint et.tostring(doc)\n
\n

output:

\n
\n  \n    \n         Mumbai\n    \n \n\n
\n soup wrap:

First, the example is not valid xml. You can use xml.etree that comes included:

from xml.etree import ElementTree as et
xmlstr="""\
<root>
  <place>
    <city>
      <name>Jaipur</name>
    </city>
  </place>
</root>
"""
doc=et.fromstring(xmlstr)
doc.find('.//name').text='Mumbai'
print et.tostring(doc)

output:

<root>
  <place>
    <city>
      <name>Mumbai</name>
    </city>
  </place>
</root>
qid & accept id: (13276796, 13276811) query: print tuple as number of arguments soup:

In Python 2.x you can use str.join:

\n
def my_print(*args):\n    print ' '.join(map(str, args))\n
\n

If you are using Python 3.x then it's even easier because there's a print function:

\n
def my_print(*args):\n    print(*args)\n
\n

Other answers also mention that you can from __future__ import print_function, but this has the disadvantage that all your existing code that uses the print statement will break.

\n soup wrap:

In Python 2.x you can use str.join:

def my_print(*args):
    print ' '.join(map(str, args))

If you are using Python 3.x then it's even easier because there's a print function:

def my_print(*args):
    print(*args)

Other answers also mention that you can from __future__ import print_function, but this has the disadvantage that all your existing code that uses the print statement will break.

qid & accept id: (13279399, 13279573) query: How to obtain values of request variables using Python and Flask soup:

You can get posted form data from request.form and query string data from request.args.

\n
myvar =  request.form["myvar"]\n
\n
myvar = request.args["myvar"]\n
\n soup wrap:

You can get posted form data from request.form and query string data from request.args.

myvar =  request.form["myvar"]
myvar = request.args["myvar"]
qid & accept id: (13374028, 13374118) query: how to Perform search operation in Django? soup:

soup wrap:

Suppose you have a button with id="resume". You can then invoke a particular Django view using the following sample code:

$("#resume").bind("click",function() {
    $.post("/resume/",
    {
        name: "Resume" //Any example parameter that is to be passed to the view function.
    },
    function(data,textStatus)
    {
    //Callback function on success
    });
});

And the example view function in views.py:

@csrf_exempt
def resume(request):
    # Do your search operation.
    return HttpResponse(status=200)

The above code also needs jQuery library in your HTML code.

qid & accept id: (13382774, 13382804) query: Initialize list with same bool value soup:

soup wrap:

You can do it like this:

>>> [False] * 10
[False, False, False, False, False, False, False, False, False, False]

Note: you should never do this with a list of mutable values, or you will see surprising behaviour like the one in the example below:

>>> my_list = [[10]] * 3
>>> my_list
[[10], [10], [10]]
>>> my_list[0][0] = 5
>>> my_list
[[5], [5], [5]]

As you can see, a change made in one inner list is reflected in all of them.
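If you actually want independent mutable values, the usual fix is a list comprehension that builds a fresh object for each slot; a quick sketch:

```python
# Build an independent inner list for each slot instead of repeating
# one shared reference.
my_list = [[10] for _ in range(3)]
my_list[0][0] = 5
print(my_list)  # [[5], [10], [10]] -- only the first inner list changed
```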

qid & accept id: (13390315, 13390349) query: have multiple users as one model field in many to one format django models soup:
soup wrap:
class Project(models.Model):
    name = models.CharField(max_length=100)
    users = models.ManyToManyField(User)

class Task(models.Model):
    project = models.ForeignKey(Project, related_name='project_tasks')
    name = models.CharField(max_length=300)
    assignee = models.ForeignKey(User, related_name='tasks')

Get the users participating in a specific project:

p = Project.objects.get(name='myproject')
users = p.users.all()

Get a project's tasks:

tasks = p.project_tasks.all()  # Because of `related_name` in Task.project

Get all the tasks a user has, across all projects:

u = User.objects.get(username='someuser')
u.tasks.all()  # Because of `related_name` in Task.assignee

Notes:

  1. A ForeignKey is a Many to One relationship. e.g. A Task belongs to only one Project - A Project can have many Tasks.

  2. You don't need superfluous field names. name is better than Project_Name

  3. One question at a time.

qid & accept id: (13405223, 13405331) query: Regex? Match part of or whole word soup:
soup wrap:
import re

def get_matcher(word, minchars):
    reg = '|'.join([word[0:i] for i in range(len(word), minchars - 1, -1)])
    return re.compile('(%s)$' % (reg))

matcher = get_matcher('potato', 4)
for s in ["this is a sentence about a potato", "this is a sentence about a potat", "this is another sentence about a pota"]:
    print matcher.search(s).groups()

OUTPUT

('potato',)
('potat',)
('pota',)
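Note that get_matcher assumes the word contains no regex metacharacters; if it might, each prefix should be escaped before being joined into the alternation. A sketch of that variant (get_matcher_escaped is a hypothetical name, not from the answer above):

```python
import re

def get_matcher_escaped(word, minchars):
    # Escape each prefix so characters like '.' are matched literally.
    prefixes = (word[0:i] for i in range(len(word), minchars - 1, -1))
    return re.compile('(%s)$' % '|'.join(re.escape(p) for p in prefixes))

matcher = get_matcher_escaped('pot.to', 4)
print(matcher.search('a pot.t').group(1))  # pot.t
print(matcher.search('a potXt'))           # None -- the dot is literal
```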
qid & accept id: (13407560, 13407845) query: Python reverse integer using recursion soup:

soup wrap:

This should work:

from math import log10
def rev(num):
    if num < 10:
        return num
    else:
        ones = num % 10
        rest = num // 10
        #print ones, rest, int(log10(rest) + 1), ones * 10 ** int(log10(rest) + 1)
        return ones * 10 ** int(log10(rest) + 1) + rev(rest)
print rev(9000), rev(1234), rev(1234567890123456789)

You could also reduce the number of times you call log10 and number of math operations by using a nested recursive function:

def rev(num):
    def rec(num, tens):
        if num < 10:
            return num        
        else:
            return num % 10 * tens + rec(num // 10, tens // 10)
    return rec(num, 10 ** int(log10(num)))
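If recursion is not a hard requirement, a string-based sketch avoids floating-point log10 altogether, which matters for very large integers where log10 can lose precision:

```python
# Reverse the decimal digits via string slicing; no log10 involved.
def rev_str(num):
    return int(str(num)[::-1])

print(rev_str(9000), rev_str(1234), rev_str(1234567890123456789))
# 9 4321 9876543210987654321
```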
qid & accept id: (13416327, 13416687) query: How to post an image in Python just like byte array in Java? soup:

soup wrap:

Have a look at the requests documentation on how to send multipart requests. Basically you just need to do:

>>> url = 'http://httpbin.org/post'
>>> files = {'file': open('report.xls', 'rb')}   
>>> r = requests.post(url, files=files)

Or in your case:

>>> r = requests.get(url1)
>>> files = {'image': r.content}   
>>> r = requests.post(url2, files=files)
qid & accept id: (13454695, 13457566) query: How to print c_ubyte_Array object in Python soup:

soup wrap:

Assuming it's a null-terminated string, you can cast the array to a char * and use its value. Here's an example where that's not the case.

>>> class Person(Structure): _fields_ = [("name", c_ubyte * 8), ('age', c_ubyte)]
... 
>>> smith = Person((c_ubyte * 8)(*bytearray('Mr Smith')), 9)
>>> smith.age
9
>>> cast(smith.name, c_char_p).value
'Mr Smith\t'

"Mr Smith" fills up the array, so casting to c_char_p includes the value of the next field, which is 9 (ASCII tab), and who knows what else, however much until it reaches a null byte.

Instead you can iterate the array with join:

>>> ''.join(map(chr, smith.name))
'Mr Smith'

Or use a bytearray:

>>> bytearray(smith.name)
bytearray(b'Mr Smith')

Python 3:

>>> smith = Person((c_ubyte * 8)(*b'Mr Smith'), 9)
>>> bytes(smith.name).decode('ascii')
'Mr Smith'
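For the genuinely null-terminated case mentioned at the top, one safe sketch is to take the bytes up to the first NUL, which cannot read past the end of the array the way a c_char_p cast can:

```python
import ctypes

# A buffer that really is null-terminated inside the array.
buf = (ctypes.c_ubyte * 8)(*b'Hi\x00\x00\x00\x00\x00\x00')
text = bytes(buf).split(b'\x00', 1)[0].decode('ascii')
print(text)  # Hi
```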
qid & accept id: (13471083, 13473702) query: How to extend model on serializer level with django-rest-framework soup:

soup wrap:

First, create a view that will return the MenuItemComponent instances that you're interested in.

class ListComponents(generics.ListAPIView):
    serializer_class = MenuItemComponentSerializer

    def get_queryset(self):
        """
        Override .get_queryset() to filter the items returned by the list.
        """
        menuitem = self.kwargs['menuitem']
        return MenuItemComponent.objects.filter(menuItem=menuitem)

Then you need to create a serializer to give you the representation you want. Your example is a bit more interesting/involved than the typical case, so it'd look something like this...

class MenuItemComponentSerializer(serializers.Serializer):
    url = ComponentURLField(source='component')
    name = Field(source='component.name')
    isReplaceable = Field()

The fields 'name' and 'isReplaceable' can simply use the default read-only Field class.

There's no field that quite meets your 'url' case here, so we'll create a custom field for that:

class ComponentURLField(serializers.Field):
    def to_native(self, obj):
        """
        Return a URL, given a component instance, 'obj'.
        """

        # Something like this...
        request = self.context['request']
        return reverse('component-detail', kwargs=kwargs, request=request)

I think that should all be about right.

That's for a read-only serialization - if you wanted a writable serialization you'd need to look into overriding the restore_object method on the serializer, and using WritableField, or something along those lines.

qid & accept id: (13497170, 13497227) query: clean way to accomplish -- if x in [(0, 1, 2), (2, 0, 1), (1, 2, 0)]:? soup:

soup wrap:

You can get the cycles of the list with:

def cycles(a):
    return [ a[i:] + a[:i] for i in range(len(a)) ]

You can then check if b is a cycle of a with:

b in cycles(a)

If the list is long, or if you want to make multiple comparisons against the same cycles, it may be beneficial (performance-wise) to store the results in a set.

set_cycles = set(cycles(a))
b in set_cycles

You can avoid constructing all the cycles up front by embedding the equality check in a generator expression and using any:

any( b == a[i:]+a[:i] for i in range(len(a)))

You could also achieve this effect by turning the cycles function into a generator.
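A sketch of that generator version; `in` and any() then stop at the first matching rotation instead of building the whole list:

```python
def icycles(a):
    # Yield each rotation lazily instead of building the full list.
    for i in range(len(a)):
        yield a[i:] + a[:i]

a = (0, 1, 2)
print((2, 0, 1) in icycles(a))  # True
print((2, 1, 0) in icycles(a))  # False
```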

qid & accept id: (13539339, 13539360) query: Python regex: How to specify an optional match (for potentially empty sub expression)? soup:

soup wrap:

To make abc_ optional, you could use the question mark operator:

(abc_)?

Thus, the entire regex becomes:

r'foo_(abc_)?bar'

With this regex, the second underscore (if present) will become part of the capture group. If you don't want that, you could either remove it post-match with .rstrip('_') or use a slightly more complex regex:

r'foo_(?:(abc)_)?bar'

I found that [_|] does not match an empty string.

That's right. Square brackets denote a character group. The [_|] would match exactly one underscore or exactly one vertical bar, and nothing else. In other words, the vertical bar loses its special meaning when it appears inside a character group.
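A quick sketch of the non-capturing variant in action: group(1) holds abc without the underscore, or None when the optional part is absent.

```python
import re

pattern = re.compile(r'foo_(?:(abc)_)?bar')
print(pattern.match('foo_abc_bar').group(1))  # abc
print(pattern.match('foo_bar').group(1))      # None
```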

qid & accept id: (13548996, 13549092) query: Reiterating over lists and dictionaries soup:

soup wrap:

The way I'd do this is something like this:

import operator
from collections import defaultdict
listoflists = [['A', 'B', 'C', 'D'], ['B', 'A', 'C', 'D'], ['B', 'C', 'D', 'A']]

def borda(listoflists):
   outdict = defaultdict(int)
   for item in listoflists:
      outdict[item[0]] += 3
      outdict[item[1]] += 2
      outdict[item[2]] += 1

   highestitem = max(outdict.iteritems(), key=operator.itemgetter(1))[0]
   outlist = [outdict[item[0]] for item in sorted(outdict.keys())]

   return (highestitem, outlist)

Update:
I'm not sure why you wouldn't be able to import standard modules, but if for whatever reason you're forbidden from using the import statement, here's a version with only built-in functions:

listoflists = [['A', 'B', 'C', 'D'], ['B', 'A', 'C', 'D'], ['B', 'C', 'D', 'A']]

def borda(listoflists):
    outdict = {}
    for singlelist in listoflists:
        # Below, we're just turning singlelist around in order to
        # make use of index numbers from enumerate to add to the scores
        for index, item in enumerate(singlelist[2::-1]):
            if item not in outdict:
                outdict[item] = index + 1
            else:
                outdict[item] += index + 1

    highestitem = max(outdict.iteritems(), key=lambda i: i[1])[0]
    outlist = [outdict[item[0]] for item in sorted(outdict.keys())]

    return (highestitem, outlist)
qid & accept id: (13562613, 13563055) query: Pair combinations of elements in dictionary without repetition soup:

soup wrap:

Your combinations approach was correct, you just need to turn the results of each combination into a dict again:

import itertools

def pairwise(input):
    for values in input.itervalues():
        for pair in itertools.combinations(values.iteritems(), 2):
            yield dict(pair)

This version is a generator that yields pairs efficiently; nothing is held in memory any longer than absolutely necessary. If you need a list, just call list() on the generator:

list(pairwise(pleio))

Output:

>>> from pprint import pprint
>>> pprint(list(pairwise(pleio)))
[{'enf2': ['48', 'free'], 'enf3': ['34', 'set']},
 {'enf1': ['54', 'set'], 'enf3': ['34', 'set']},
 {'enf3': ['34', 'set'], 'enf4': ['12', 'free']},
 {'enf1': ['54', 'set'], 'enf2': ['48', 'free']},
 {'enf2': ['48', 'free'], 'enf4': ['12', 'free']},
 {'enf1': ['54', 'set'], 'enf4': ['12', 'free']}]

You can even combine the whole thing into a one-liner generator:

from itertools import combinations

for paired in (dict(p) for v in pleio.itervalues() for p in combinations(v.iteritems(), 2)):
    print paired

Which outputs:

>>> for paired in (dict(p) for v in pleio.itervalues() for p in combinations(v.iteritems(), 2)):
...     print paired
... 
{'enf3': ['34', 'set'], 'enf2': ['48', 'free']}
{'enf3': ['34', 'set'], 'enf1': ['54', 'set']}
{'enf3': ['34', 'set'], 'enf4': ['12', 'free']}
{'enf2': ['48', 'free'], 'enf1': ['54', 'set']}
{'enf2': ['48', 'free'], 'enf4': ['12', 'free']}
{'enf1': ['54', 'set'], 'enf4': ['12', 'free']}

If you are on Python 3, replace .itervalues() and .iteritems() by .values() and .items() respectively.
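For reference, the same generator written against the Python 3 API (a sketch with a small made-up input):

```python
from itertools import combinations

def pairwise3(mapping):
    # .values()/.items() replace .itervalues()/.iteritems() on Python 3.
    for values in mapping.values():
        for pair in combinations(values.items(), 2):
            yield dict(pair)

pleio = {'grp': {'enf1': ['54', 'set'],
                 'enf2': ['48', 'free'],
                 'enf3': ['34', 'set']}}
pairs = list(pairwise3(pleio))
print(len(pairs))  # 3 -- one dict per unordered pair
```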

qid & accept id: (13584299, 13587198) query: Generate nested dictionary with list and dict comprehensions soup:

soup wrap:

It seems that a "person" and a "tweet" are going to be objects that have their own data and functions. You can express that association by wrapping things up in a class. For example:

class tweet(object):
    def __init__(self, text):
        self.text = text
        self.retweets = 0
    def retweet(self):
        self.retweets += 1
    def __repr__(self):
        return "(%i)" % (self.retweets)
    def __hash__(self):
        return hash(self.text)

class person(object):
    def __init__(self, name):
        self.name = name
        self.tweets = dict()

    def __repr__(self):
        return "%s : %s" % (self.name, self.tweets)

    def new_tweet(self, text):
        self.tweets[text] = tweet(text)

    def retweet(self, text):
        self.tweets[text].retweet()

M = person("mac389")
M.new_tweet('foo')
M.new_tweet('bar')
M.retweet('foo')
M.retweet('foo')

print M

Would give:

mac389 : {'foo': (2), 'bar': (0)}

The advantage here is twofold. One is that new data associated with a person or tweet is added in an obvious and logical way. The other is that you've created a nice interface (even if you're the only one using it!) that will make life easier in the long run.

qid & accept id: (13612437, 13612527) query: How to implement man-like help page in python(python shell already has it) soup:

soup wrap:

Look at the code for pydoc, i.e.:

    Python27\Lib\pydoc.py

(This is the Windows path; elsewhere the slashes go the other way.)

The Helper class's help method calls doc, which in turn calls render_doc; that is probably the function you want.

import sys
import pydoc

plainSysDoc = pydoc.plain((pydoc.render_doc(sys)))
print plainSysDoc

pydoc.plain is a formatting function (that removes bold formatting).

As a side note, while fact checking this answer I learned that pydoc can be called from the command line:

pydoc sys
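A minimal sketch of the render_doc/plain pipeline, using the stdlib json module as the documented object:

```python
import json
import pydoc

# render_doc builds the documentation text; plain() strips the
# overstrike (bold) formatting, leaving man-page-like plain text.
text = pydoc.plain(pydoc.render_doc(json))
print(text.splitlines()[0])  # the title line naming the json module
```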
qid & accept id: (13669642, 13677926) query: Gtk 3 python entry color soup:

soup wrap:

There are two parts to this solution. Firstly, the coloring is handled by the CSS, which colors the text grey when the entry box is not focused and black as it gains focus. The second part is optional: you might want to clear the grey placeholder text as the entry gets focus. For convenience I have added focus-in and focus-out event handlers that currently just print to the terminal.

code

from gi.repository import Gtk, Gdk

def focus_in(*args):
    print 'focus_in called'

def focus_out(*args):
    print 'focus_out called'

window = Gtk.Window()
window.connect('destroy', Gtk.main_quit)
screen = Gdk.Screen.get_default()
css_provider = Gtk.CssProvider()
css_provider.load_from_path('style.css')
priority = Gtk.STYLE_PROVIDER_PRIORITY_USER
context = Gtk.StyleContext()
context.add_provider_for_screen(screen, css_provider, priority)
fname = Gtk.Entry(text='First Name')
lname = Gtk.Entry(text='Last Name')
button = Gtk.Button('Submit')
fname.connect('focus-in-event', focus_in)
fname.connect('focus-out-event', focus_out)
vbox = Gtk.VBox()
vbox.add(fname)
vbox.add(lname)
vbox.add(button)
window.add(vbox)
window.show_all()
Gtk.main()

style.css

GtkEntry {
    color: darkgrey;
}

GtkEntry:focused {
    color: black;
}


qid & accept id: (13695181, 13695217) query: Pass a counter to every python logging method soup:

soup wrap:

You could use a logging.Filter:

import logging

class ContextFilter(logging.Filter):
    def filter(self, record):
        record.count = counter
        return True

logging.basicConfig(
    level = logging.DEBUG,
    format = '%(levelname)-8s: %(count)s: %(message)s')
logger = logging.getLogger(__name__)
logger.addFilter(ContextFilter())

counter = 5
logger.debug('First Event')
counter += 2
logger.warning('Second Event')

yields

DEBUG   : 5: First Event
WARNING : 7: Second Event
qid & accept id: (13701374, 13746443) query: How to avoid defining a variable to hold a function result which might be needed only once soup:

soup wrap:

I found a neat solution (better ones are welcomed) using a decorator which saves the last value returned by the decorated function in the last attribute. My construct becomes:

if my_func(x) == some_value:
    # do anything with the value returned by my_func, saved in my_func.last
    # such as
    print my_func.last
    return my_func.last

Concise and clear. The function is evaluated only once and you don't need to introduce an annoying temporary variable.

Of course you must remember to decorate the functions which you want to 'enable' the last attribute for, using (assumed that the decorator name is save_last):

@save_last
def my_func(...):
    # function definition

The decorator is defined as:

# last value returned by decorated function is accessible as 'last' attribute
def save_last(f):
    def w(*args, **kwargs): # w is the 'wrapper' function
        w.last = f(*args, **kwargs)
        return w.last
    return w
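A usage sketch; note that last lives on the function object, so it is shared state, and concurrent or reentrant calls would overwrite it:

```python
def save_last(f):
    def w(*args, **kwargs):  # w is the 'wrapper' function
        w.last = f(*args, **kwargs)
        return w.last
    return w

@save_last
def double(x):
    return x * 2

print(double(21))   # 42
print(double.last)  # 42
```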
qid & accept id: (13715594, 13715925) query: Conditionally disable caching decorator based on instance variable soup:

soup wrap:

You can use a decorator to call the right (cached or not) function checking the desired attribute:

def conditional(decorator):

    def conditional_decorator(fn):
        dec = decorator(fn)
        def wrapper(self, *args, **kw):
             if self.read_only:
                 return dec(self, *args, **kw)
             return fn(self, *args, **kw)
        return wrapper

    return conditional_decorator

Use like this:

@conditional(cache_region('long_term'))
def get(self, arg):
    return arg + 1
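Here cache_region comes from Beaker, so for a self-contained Python 3 sketch a toy memoizing decorator stands in for it (memoize, Service, and the call counter are made up for the demo):

```python
import functools

def memoize(fn):
    """Toy stand-in for cache_region: caches results keyed by the args."""
    cache = {}
    @functools.wraps(fn)
    def inner(self, *args):
        if args not in cache:
            cache[args] = fn(self, *args)
        return cache[args]
    return inner

def conditional(decorator):
    def conditional_decorator(fn):
        dec = decorator(fn)                     # decorated version, built once
        def wrapper(self, *args, **kw):
            if self.read_only:
                return dec(self, *args, **kw)   # cached path
            return fn(self, *args, **kw)        # uncached path
        return wrapper
    return conditional_decorator

class Service:
    def __init__(self, read_only):
        self.read_only = read_only
        self.calls = 0                          # counts real invocations

    @conditional(memoize)
    def get(self, arg):
        self.calls += 1
        return arg + 1

cached = Service(read_only=True)
cached.get(1); cached.get(1)
print(cached.calls)   # 1 -- the second call hit the cache

live = Service(read_only=False)
live.get(1); live.get(1)
print(live.calls)     # 2 -- caching was bypassed
```

Note that dec is built once at decoration time, so with this toy memoizer the cache is shared across instances; Beaker's cache_region manages its own keying.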
qid & accept id: (13728878, 13728917) query: Dumping multiple variables to disk in Json. One variable per line soup:

soup wrap:

Set the indent option to 0 or more:

with open(p_out, 'wb') as fp:
    json.dump(my_dictionary, fp, indent=0)

From the documentation:

If indent is a non-negative integer, then JSON array elements and object members will be pretty-printed with that indent level. An indent level of 0, or negative, will only insert newlines. None (the default) selects the most compact representation.

Your example would be output as:

{
"variable_2": "something_else", 
"variable_1": "something"
}
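A minimal runnable sketch of the same idea (in Python 3 the file must be opened in text mode, 'w' rather than 'wb'; json.dumps is used here so the output is visible directly):

```python
import json

my_dictionary = {"variable_2": "something_else", "variable_1": "something"}

# indent=0 pretty-prints at indent level zero, i.e. only inserts newlines,
# giving one object member per line
print(json.dumps(my_dictionary, indent=0))
```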
qid & accept id: (13793694, 13793757) query: Can you make a function that would create several instances of a class for you in Python? soup:

soup wrap:

You're very close! The usual approach is to use a dictionary, and to use the names you want as dictionary keys. For example:

>>> class Room(object):
...     def __init__(self, x, y):
...         self.x = x
...         self.y = y
...         
>>> rooms = {}
>>> names = ['a', 'b', 'c', 'd']
>>> locations = [[1,1], [1,2], [2,1], [2,2]]
>>> for name, loc in zip(names, locations):
...     rooms[name] = Room(*loc)
...     
>>> rooms
{'a': <__main__.Room object at 0x8a0030c>, 'c': <__main__.Room object at 0x89b01cc>, 'b': <__main__.Room object at 0x89b074c>, 'd': <__main__.Room object at 0x89b02ec>}
>>> rooms['c']
<__main__.Room object at 0x89b01cc>
>>> rooms['c'].x
2
>>> rooms['c'].y
1

This way you can iterate over the rooms in several ways, for example:

>>> for roomname, room in rooms.items():
...     print roomname, room.x, room.y
...     
a 1 1
c 2 1
b 1 2
d 2 2
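The same construction also fits in a single dict comprehension, which reads well once the pattern is familiar:

```python
class Room(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

names = ['a', 'b', 'c', 'd']
locations = [[1, 1], [1, 2], [2, 1], [2, 2]]

# build the name -> Room mapping in one expression
rooms = {name: Room(*loc) for name, loc in zip(names, locations)}
print(rooms['c'].x, rooms['c'].y)  # 2 1
```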
qid & accept id: (13794620, 13795426) query: calculate distance between two chains in PDB file soup:

soup wrap:

Here's one way using GNU awk. Run like:

awk -f script.awk file{,}

Contents of script.awk

NR==1 {
    n = $5
}

FNR==NR && $5 != n {
    a[c++]=$0
}

FNR!=NR && $5 == n {
    for (i=0;i<=c-1;i++) {
        split (a[i],b)
        dist = sqrt (($7-b[7])^2 + ($8-b[8])^2 + ($9-b[9])^2)
        if (dist >= 5) {
            printf "%s-%s\t%.2f\n", $NF, b[NF], dist
        }
    }
}

Tab separated results:

N-C 51.70
N-O 52.83
N-N 51.30
C-C 51.14
C-O 52.29
C-N 50.71
C-C 50.00
C-O 51.14
C-N 49.56

Alternatively, here's the one-liner:

awk 'NR==1 { n = $5 } FNR==NR && $5 != n { a[c++]=$0 } FNR!=NR && $5 == n { for (i=0;i<=c-1;i++) { split (a[i],b); dist = sqrt (($7-b[7])^2 + ($8-b[8])^2 + ($9-b[9])^2); if (dist >= 5) printf "%s-%s\t%.2f\n", $NF, b[NF], dist } }' file{,}

So to perform this on multiple files in the present working directory, and assuming there's nothing but files of interest in this directory, you can wrap a for loop around the awk statement. Obviously, you'll need to change /path/to/folder/ to your path of choice for it to work correctly:

for i in *; do awk 'NR==1 { n = $5 } FNR==NR && $5 != n { a[c++]=$0 } FNR!=NR && $5 == n { for (i=0;i<=c-1;i++) { split (a[i],b); dist = sqrt (($7-b[7])^2 + ($8-b[8])^2 + ($9-b[9])^2); if (dist >= 5) printf "%s-%s\t%.2f\n", $NF, b[NF], dist > "/path/to/folder/" FILENAME } }' "$i"{,}; done
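Since most of this document is Python, here is a hedged sketch of the same pairwise logic in Python 3 (field positions mirror the awk program above: $5 is taken as the chain ID, $7-$9 as the x/y/z coordinates, and the last field as the element; chain_distances and the sample atoms list are made up for illustration):

```python
import math

def chain_distances(lines, cutoff=5.0):
    """Pair atoms of the first chain against every atom of other chains,
    keeping element pairs whose distance is at or beyond the cutoff."""
    first_chain = lines[0].split()[4]          # awk's $5 (0-indexed here)
    chain_a, others = [], []
    for line in lines:
        parts = line.split()
        (chain_a if parts[4] == first_chain else others).append(parts)
    results = []
    for p in chain_a:
        for q in others:
            # awk's $7..$9 are the coordinates
            dist = math.dist([float(v) for v in p[6:9]],
                             [float(v) for v in q[6:9]])
            if dist >= cutoff:
                results.append(("%s-%s" % (p[-1], q[-1]), dist))
    return results

atoms = [
    "ATOM 1 N GLY A 1 0.000 0.000 0.000 1.00 0.00 N",
    "ATOM 2 C ALA B 1 10.000 0.000 0.000 1.00 0.00 C",
]
print(chain_distances(atoms))  # [('N-C', 10.0)]
```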
qid & accept id: (13839905, 13840598) query: Adding information from one file to another, after a specific action soup:
soup wrap:
import sys
def programs_info_comb(fileName1, fileName2):
    my_file1 = open(fileName1, "r")
    my_line1=my_file1.readlines()
    my_file1.close()

    my_file2 = open(fileName2, "r")
    my_line2=my_file2.readlines() 
    my_file2.close()

    # load file1 into a dict for lookup later
    infoFor = dict()
    for line1 in my_line1: 
        parts = line1.strip().split("\t")
        infoFor[parts[0]] = parts[1:] 

    # iterate over line numbers to be able to refer previous line numbers
    for line2 in range(len(my_line2)):
        if my_line2[line2].startswith("# Q"):
            name2 = my_line2[line2][9:-1]
            # lookup
            if infoFor.has_key(name2):
                print '# ' + name2
                for info in infoFor[name2]:
                    print info
            # print programinfo and query lines
            print my_line2[line2-1],
            print my_line2[line2],
        # skip program info always
        elif my_line2[line2].startswith("# ProgramInfo"):
            pass
        # otherwise just print as is
        else:
            print my_line2[line2],

if __name__== "__main__":
    programs_info_comb(sys.argv[1], sys.argv[2])

I have loaded file1 into a dictionary for lookup later and sent the output to stdout. Before sending the output I've checked the type of line I'm on and output accordingly.

Here's the output:

C:\>python st.py f1.txt f2.txt
# IdName1 Info1 Info2 Info3
#Info: from program1 for name1
#Info: from program2 for name1
# ProgramInfo
# Query: IdName1 Info1 Info2 Info3
# DatabaseInfo
# FiledInfo
line1
line2
# IdName2 Info1 Info2 Info3
#Info: from program1 for name2
#Info: from program2 for name2
# ProgramInfo
# Query: IdName2 Info1 Info2 Info3
# DatabaseInfo
# FiledInfo
# IdName4 Info1 Info2 Info3
#Info: from program1 for name4
# ProgramInfo
# Query: IdName4 Info1 Info2 Info3
# DatabaseInfo
# FiledInfo
line1
line2
line3
line4
qid & accept id: (13882808, 13891631) query: GIMP Python-fu nested group layers soup:

soup wrap:

Support for layer groups in Python-fu was added in the last minutes before the 2.8 release, and is rather incomplete.

So, the only way to create a proper layer group in GIMP 2.8 is to use the pdb call:

group = pdb.gimp_layer_group_new(img)
group.name = "my group"

(Using gimp.GroupLayer is buggy in GIMP 2.8 - it should be the way to go in the future.)

Once you have your group, you can insert it anywhere on the image using

pdb.gimp_image_insert_layer(image, layer, parent, position)

Like in:

>>> img = gimp.Image(640, 480, RGB)
>>> pdb.gimp_display_new(img)

>>> parent_group = pdb.gimp_layer_group_new(img)
>>> child_group_1 = pdb.gimp_layer_group_new(img)
>>> child_group_2 = pdb.gimp_layer_group_new(img)
>>> grand_child_group = pdb.gimp_layer_group_new(img)
>>> img.add_layer(parent_group, 0)
>>> pdb.gimp_image_insert_layer(img, child_group_1, parent_group,0)
>>> pdb.gimp_image_insert_layer(img, child_group_2, parent_group,1)
>>> pdb.gimp_image_insert_layer(img, grand_child_group, child_group_1,0)
>>> l1 = gimp.Layer(img, "test", 320,240)
>>> pdb.gimp_image_insert_layer(img,l1, grand_child_group,0)

So, indeed, there is an extreme API asymmetry: you add layers and groups to the image through the "add_layer" method, but to add either of them to a layer group you have to go through the pdb.gimp_image_insert_layer call.

update (Feb/2015) - The bug for gimp.GroupLayer() is fixed in GIMP's git and it will work properly from GIMP 2.8.16 onward. Now all one has to do to add a new group layer is:

>>> g = gimp.GroupLayer(img)
>>> pdb.gimp_image_insert_layer(img, g, None, 0)
qid & accept id: (13913530, 13913585) query: Python regular expression to search for words in a sentence soup:

soup wrap:

Use the union operator | to search for all the words you need to find:

In [20]: re_pattern = r'\b(?:total|staff)\b'

In [21]: re.findall(re_pattern, question)
Out[21]: ['total', 'staff']

This matches your example above most closely. However, this approach only works if there are no other characters which have been prepended or appended to a word. This is often the case at the end of main and subordinate clauses in which a comma, a dot, an exclamation mark or a question mark are appended to the last word of the clause.

For example, in the question How many people are in your staff? the approach above wouldn't find the word staff because there is no word boundary at the end of staff. Instead, there is a question mark. But if you leave out the second \b at the end of the regular expression above, the expression would wrongly detect words in substrings, such as total in totally or totalities.

The best way to accomplish what you want is to extract all alphanumeric characters in your sentence first and then search this list for the words you need to find:

In [51]: def find_all_words(words, sentence):
....:     all_words = re.findall(r'\w+', sentence)
....:     words_found = []
....:     for word in words:
....:         if word in all_words:
....:             words_found.append(word)
....:     return words_found

In [52]: print find_all_words(['total', 'staff'], 'The total number of staff in 30?')
['total', 'staff'] 

In [53]: print find_all_words(['total', 'staff'], 'My staff is totally overworked.')
['staff']
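An equivalent, more compact Python 3 variant of find_all_words uses set intersection (note it returns a set, so the original word order is lost):

```python
import re

def find_all_words(words, sentence):
    # \w+ tokenises on word characters, so trailing punctuation is dropped
    return set(words) & set(re.findall(r'\w+', sentence))

print(find_all_words(['total', 'staff'], 'My staff is totally overworked.'))  # {'staff'}
```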
qid & accept id: (13920235, 13920256) query: Splitting a list of sequences into two lists efficiently soup:
soup wrap:
>>> catalog = [('abc', '123'), ('foo', '456'), ('bar', '789'), ('test', '1337')]
>>> names, vals = zip(*catalog)
>>> names
('abc', 'foo', 'bar', 'test')
>>> vals
('123', '456', '789', '1337')

The *catalog syntax here is called Unpacking Argument Lists, and zip(*catalog) translates into the call zip(catalog[0], catalog[1], catalog[2], ...).

The zip() builtin function groups iterables by indices, so when you pass a bunch of two-element tuples as above, you get a two-element list of tuples where the first tuple contains the first element of each tuple from catalog, and the second tuple contains the second element from each tuple from catalog.

In a quick timeit test the zip() version outperforms a looping approach when I tested with 1,000,000 pairs:

In [1]: catalog = [(i, i+1) for i in range(1000000)]

In [2]: def with_zip():
   ...:     return zip(*catalog)
   ...: 

In [3]: def without_zip():
   ...:     names, vals = [], []
   ...:     for name, val in catalog:
   ...:         names.append(name)
   ...:         vals.append(val)
   ...:     return names, vals
   ...: 

In [4]: %timeit with_zip()
1 loops, best of 3: 176 ms per loop

In [5]: %timeit without_zip()
1 loops, best of 3: 250 ms per loop
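One caveat: in Python 3, zip() returns a lazy iterator rather than a list, so the interactive results above would display as a zip object; tuple unpacking still works unchanged, but wrap the call in list() when you need to index or reuse the result:

```python
catalog = [('abc', '123'), ('foo', '456'), ('bar', '789'), ('test', '1337')]

names, vals = zip(*catalog)      # unpacking consumes the iterator once
print(names)                     # ('abc', 'foo', 'bar', 'test')

pairs = list(zip(*catalog))      # materialise for repeated use
print(pairs[0])                  # ('abc', 'foo', 'bar', 'test')
```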
qid & accept id: (13936068, 13936747) query: Passing binary data from Python to C API extension soup:

soup wrap:

Ok, I figured out a solution with the help of this link.

I used a PyByteArrayObject (docs here) like this:

from authbind import authenticate

creds = 'foo\x00bar\x00'
authenticate(bytearray(creds))

And then in the extension code:

static PyObject* authenticate(PyObject *self, PyObject *args) {

    PyByteArrayObject *creds;

    if (!PyArg_ParseTuple(args, "O", &creds))
        return NULL;

    char* credsCopy;
    credsCopy = PyByteArray_AsString((PyObject*) creds);
}

credsCopy now holds the string of bytes, exactly as they are needed.

qid & accept id: (13953964, 14216278) query: SCons to generate variable number of targets soup:

soup wrap:

The only way I found to do it is with an emitter. The example below consists of 3 files:

./
|-SConstruct
|-src/
| |-SConscript
| |-source.txt
|-build/

SConstruct

env = Environment()

dirname = 'build'
VariantDir(dirname, 'src', duplicate=0)

Export('env')

SConscript(dirname+'/SConscript')

src/SConscript

Import('env')

def my_emitter( env, target, source ):
    data = str(source[0])
    target = []
    with open( data, 'r' ) as lines:
        for line in lines:
           line = line.strip()
           name, contents = line.split(' ', 1)
           if not name: continue

           generated_source  = env.Command( name, [], 'echo "{0}" > $TARGET'.format(contents) )
           source.extend( generated_source )
           target.append( name+'.c' )

    return target, source

def my_action( env, target, source ):
    for t,s in zip(target, source[1:]):
        with open(t.abspath, 'w') as tf:
            with open(s.abspath, 'r') as sf:
                tf.write( sf.read() )

SourcesGenerator = env.Builder( action = my_action, emitter = my_emitter )
generated_sources = SourcesGenerator( env, source = 'source.txt' )

lib = env.Library( 'functions', generated_sources )

src/source.txt

a int a(){}
b int b(){}
c int c(){}
d int d(){}
g int g(){}

Output:

$ scons
scons: Reading SConscript files ...
scons: done reading SConscript files.
scons: Building targets ...
echo "int a(){}" > build/a
echo "int b(){}" > build/b
echo "int c(){}" > build/c
echo "int d(){}" > build/d
echo "int g(){}" > build/g
my_action(["build/a.c", "build/b.c", "build/c.c", "build/d.c", "build/g.c"], ["src/source.txt", "build/a", "build/b", "build/c", "build/d", "build/g"])
gcc -o build/a.o -c build/a.c
gcc -o build/b.o -c build/b.c
gcc -o build/c.o -c build/c.c
gcc -o build/d.o -c build/d.c
gcc -o build/g.o -c build/g.c
ar rc build/libfunctions.a build/a.o build/b.o build/c.o build/d.o build/g.o
ranlib build/libfunctions.a
scons: done building targets.

Also, this has one thing I don't really like, which is the parsing of headers_list.txt (source.txt in this example) with each scons execution. I feel like there should be a way to parse it only when the file has changed. I could cache it by hand, but I still hope there is some trick to make SCons handle that caching for me.

And I couldn't find a way to avoid duplicating files (a and a.c being the same). One way would be to generate the library directly in my_action instead of the sources (which is the approach I used in my final solution).

qid & accept id: (13983620, 13983651) query: What is a Pythonic way to count dictionary values in list of dictionaries soup:

soup wrap:

collections.Counter

>>> from collections import Counter
>>> c = Counter([thing['count'] for thing in things])
>>> c[1]               # Number of elements with count==1
100
>>> c[2]               # Number of elements with count==2
100
>>> c.most_common()    # Most common elements
[(1, 100), (2, 100)]
>>> sum(c.values())    # Number of elements
200
>>> list(c)            # List of unique counts
[1, 2]
>>> dict(c)            # Converted to a dict 
{1: 100, 2: 100}

Perhaps you could do something like this?

class DictCounter(object):
    def __init__(self, list_of_ds):
        for k,v in list_of_ds[0].items():
            self.__dict__[k] = collections.Counter([d[k] for d in list_of_ds])

>>> new_things = [{'test': 1, 'count': 1} for i in range(10)]
>>> for i in new_things[0:5]: i['count']=2

>>> d = DictCounter(new_things)
>>> d.count
Counter({1: 5, 2: 5})
>>> d.test
Counter({1: 10})

Extended DictCounter to handle missing keys:

>>> class DictCounter(object):
    def __init__(self, list_of_ds):
        keys = set(itertools.chain(*(i.keys() for i in list_of_ds)))
        for k in keys:
            self.__dict__[k] = collections.Counter([d.get(k) for d in list_of_ds])

>>> a = [{'test': 5, 'count': 4}, {'test': 3, 'other': 5}, {'test':3}, {'test':5}]
>>> d = DictCounter(a)
>>> d.test
Counter({3: 2, 5: 2})
>>> d.count
Counter({None: 3, 4: 1})
>>> d.other
Counter({None: 3, 5: 1})
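The sessions above use Python 2; the missing-key-tolerant version can be written as a self-contained Python 3 sketch like this:

```python
import collections
import itertools

class DictCounter:
    """For each key seen in any dict, expose a Counter of that key's
    values across the whole list; missing keys are counted as None."""
    def __init__(self, list_of_ds):
        keys = set(itertools.chain.from_iterable(d.keys() for d in list_of_ds))
        for k in keys:
            setattr(self, k, collections.Counter(d.get(k) for d in list_of_ds))

a = [{'test': 5, 'count': 4}, {'test': 3, 'other': 5}, {'test': 3}, {'test': 5}]
d = DictCounter(a)
print(d.test == collections.Counter({3: 2, 5: 2}))      # True
print(d.count == collections.Counter({None: 3, 4: 1}))  # True
```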
qid & accept id: (14001382, 14001504) query: Regex Parse Email Python soup:

soup wrap:

See if this works for you; the lines that you want start with digits followed by a plus sign:

^[0-9]*\+.*$

This will match the expected output:

\*{3}[^\*]*(?:(?=\*{3})|(?=^-*$))
  1. ^ Matches the beginning of the string.
  2. [0-9] Matches any single character in the range 0-9.
  3. * Matches 0 or more of the preceeding token. This is a greedy match, and will match as many characters as possible before satisfying the next token.
  4. \+ Matches a + character.
  5. . Matches any character.
  6. $ Matches the end of the string.
#!/usr/bin/env python
#-*- coding:utf-8 -*-
import re
with open("/path/to/file", "r") as fileInput:
    listLines = [   line.strip()
                    for line in fileInput.readlines()
                    if re.match("^[0-9]*\+.*$", line)
                    ] 


for line in listLines:
    print line

>>> 10+BB {MYXV ABC 4116    SM  MYXV YA 102-15 } | 2010/11 4.0s             4.0s
>>> 6+ BB {MYXV ABC 4132    NS  MYXV YT 102-22 } | 2010 4.5s                4.5s
>>> 10+BB  {NXTW VXA 4061   SL  MYXV YA 103-22 } | 11 wala 3.5s             3.5s
>>> 10+BB  {NXTW VXA 12-47  SP  MYXV YA 106-20 } | 22 wala 4.0s             4.0s
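
As a quick self-contained check of the pattern (sample lines inlined here in place of /path/to/file; the contents are assumptions for illustration):

```python
import re

# Sample lines standing in for the real file contents
lines = [
    "10+BB {MYXV ABC 4116    SM  MYXV YA 102-15 } | 2010/11 4.0s",
    "***NEW ISSUE header line that should be skipped",
    "6+ BB {MYXV ABC 4132    NS  MYXV YT 102-22 } | 2010 4.5s",
]

# Keep only lines that start with digits followed by a plus sign
matched = [line for line in lines if re.match(r"^[0-9]*\+.*$", line)]
print(matched)  # keeps the two quote lines, drops the *** line
```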

Updated to meet new requirements:

#!/usr/bin/env python
#-*- coding:utf-8 -*-
import re
with open("/path/to/file", "r") as fileInput:
    regex = re.compile(r"\*{3}[^\*]*?(?:(?=^-*$)|(?=\*))", re.MULTILINE)

    listMsg = [ [   line.strip()
                    for line in message.split("\n")
                    if not line.startswith("*") and line.strip()
                    ]
                for message in regex.findall(fileInput.read())
                ]

>>> 10+BB {MYXV ABC 4116    SM  MYXV YA 102-15 } | 2010/11 4.0s             4.0s
>>> 6+ BB {MYXV ABC 4132    NS  MYXV YT 102-22 } | 2010 4.5s                4.5s
>>> ABO 2006-OP1 M1     00442PAG5     19-24      p5
>>> 10+BB  {NXTW VXA 4061   SL  MYXV YA 103-22 } | 11 wala 3.5s             3.5s
>>> 10+BB  {NXTW VXA 12-47  SP  MYXV YA 106-20 } | 22 wala 4.0s             4.0s

Updated to extract the whole body of the email:

#!/usr/bin/env python
#-*- coding:utf-8 -*-
import re
with open("/path/to/file", "r") as fileInput:
    regex = re.compile(r"(?<=^At:)([^\n\r]*)(.*?)(?=^-*-$)", re.MULTILINE|re.DOTALL)

    print regex.search(fileInput.read()).groups()[1]

>>> ACE 2006-OP1 ZZ 111111111 19-24 Z5 ZZW 2012-0P1 SD 222222222 77-00 150
>>> ***NEW ISSUE SUPPORTED THROUGH UNIVERSALITY   vs 104-13 on AY 3s JAN   
>>> 10+BB {MYXV ABC 4116    SM  MYXV YA 102-15 } | 2010/11 4.0s             4.0s
>>> 6+ BB {MYXV ABC 4132    NS  MYXV YT 102-22 } | 2010 4.5s                4.5s
>>> ABO 2006-OP1 M1     00442PAG5     19-24      p5 
>>> ***SECOND SUPPORTED TRHOUGH INVERSALITY GEVINGS                      
>>> 10+BB  {NXTW VXA 4061   SL  MYXV YA 103-22 } | 11 wala 3.5s             3.5s
>>> 10+BB  {NXTW VXA 12-47  SP  MYXV YA 106-20 } | 22 wala 4.0s             4.0s
qid & accept id: (14014292, 14014352) query: Python inheritance - going from base class to derived one soup:

soup wrap:

Perhaps you are looking for the __subclasses__ method:

class Alpha(object):
    @classmethod
    def get_derivatives(cls):
        return cls.__subclasses__() 

class Beta(Alpha):
    pass

print(Alpha.get_derivatives())
print(Beta.get_derivatives())

yields

[<class '__main__.Beta'>]
[]
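
Note that __subclasses__ reports only direct subclasses. If you also need grandchildren, a recursive walk works; here is a sketch (the all_derivatives helper is my own, not part of the original answer):

```python
class Alpha(object):
    @classmethod
    def get_derivatives(cls):
        return cls.__subclasses__()

class Beta(Alpha):
    pass

class Gamma(Beta):
    pass

# Hypothetical helper: collect subclasses at every depth, not just direct ones
def all_derivatives(cls):
    result = []
    for sub in cls.__subclasses__():
        result.append(sub)
        result.extend(all_derivatives(sub))
    return result

print([c.__name__ for c in Alpha.get_derivatives()])  # ['Beta']
print([c.__name__ for c in all_derivatives(Alpha)])   # ['Beta', 'Gamma']
```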
qid & accept id: (14017199, 14017213) query: Python create instance from list of classes soup:

soup wrap:

Just store the class itself:

class_register[self.__class__.__name__] = self.__class__

But this is a bit of overkill, since you are registering the class every time you instantiate it.

Better is to use this:

def register(cls):
    class_register[cls.__name__] = cls

class Foo(object):
    # blah blah

register(Foo)

And then you can turn this into a class decorator to use like this:

def register(cls):
    class_register[cls.__name__] = cls
    return cls

@register
class Foo(object):
    # blah blah
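
To close the loop on the original question (creating an instance from the registry), a minimal runnable sketch combining the decorator with a lookup by name:

```python
class_register = {}

def register(cls):
    class_register[cls.__name__] = cls
    return cls

@register
class Foo(object):
    def __init__(self, x):
        self.x = x

# Look the class up by name, then instantiate it like any other class
obj = class_register['Foo'](42)
print(obj.x)  # 42
```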
qid & accept id: (14047979, 14048046) query: executing Python script in PHP and exchanging data between the two soup:

soup wrap:

You can generally communicate between languages by using common language formats, and using stdin and stdout to communicate the data.

Example with PHP/Python using a shell argument to send the initial data via JSON

PHP:

// This is the data you want to pass to Python
$data = array('as', 'df', 'gh');

// Execute the python script with the JSON data
$result = shell_exec('python /path/to/myScript.py ' . escapeshellarg(json_encode($data)));

// Decode the result
$resultData = json_decode($result, true);

// This will contain: array('status' => 'Yes!')
var_dump($resultData);

Python:

import sys, json

# Load the data that PHP sent us
try:
    data = json.loads(sys.argv[1])
except (IndexError, ValueError):
    print "ERROR"
    sys.exit(1)

# Generate some data to send to PHP
result = {'status': 'Yes!'}

# Send it to stdout (to PHP)
print json.dumps(result)
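
The same handshake can be exercised without PHP by launching the script through subprocess; a sketch (the temp file stands in for /path/to/myScript.py and is written in Python 3 syntax):

```python
import json
import os
import subprocess
import sys
import tempfile

# A stand-in for myScript.py: JSON in argv[1], JSON out on stdout
script = b"""import sys, json
data = json.loads(sys.argv[1])
print(json.dumps({'status': 'Yes!', 'count': len(data)}))
"""

fd, path = tempfile.mkstemp(suffix='.py')
os.write(fd, script)
os.close(fd)

# Mirrors PHP's shell_exec(...) call: argument in, stdout back out
out = subprocess.check_output([sys.executable, path, json.dumps(['as', 'df', 'gh'])])
result = json.loads(out.decode())
print(result)  # {'status': 'Yes!', 'count': 3}
```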
qid & accept id: (14076207, 22894683) query: Simulating a key press event in Python 2.7 soup:

soup wrap:

I wrote this code more than a year ago, so it is not perfect, but it works (Windows only, since it relies on win32api):

from win32api import keybd_event
import time
import random


Combs = {
    'A': ['SHIFT', 'a'], 'B': ['SHIFT', 'b'], 'C': ['SHIFT', 'c'],
    'D': ['SHIFT', 'd'], 'E': ['SHIFT', 'e'], 'F': ['SHIFT', 'f'],
    'G': ['SHIFT', 'g'], 'H': ['SHIFT', 'h'], 'I': ['SHIFT', 'i'],
    'J': ['SHIFT', 'j'], 'K': ['SHIFT', 'k'], 'L': ['SHIFT', 'l'],
    'M': ['SHIFT', 'm'], 'N': ['SHIFT', 'n'], 'O': ['SHIFT', 'o'],
    'P': ['SHIFT', 'p'], 'Q': ['SHIFT', 'q'], 'R': ['SHIFT', 'r'],
    'S': ['SHIFT', 's'], 'T': ['SHIFT', 't'], 'U': ['SHIFT', 'u'],
    'V': ['SHIFT', 'v'], 'W': ['SHIFT', 'w'], 'X': ['SHIFT', 'x'],
    'Y': ['SHIFT', 'y'], 'Z': ['SHIFT', 'z'],
    '?': ['SHIFT', '/'], '>': ['SHIFT', '.'], '<': ['SHIFT', ','],
    '"': ['SHIFT', "'"], ':': ['SHIFT', ';'], '|': ['SHIFT', '\\'],
    '}': ['SHIFT', ']'], '{': ['SHIFT', '['], '+': ['SHIFT', '='],
    '_': ['SHIFT', '-'], '!': ['SHIFT', '1'], '@': ['SHIFT', '2'],
    '#': ['SHIFT', '3'], '$': ['SHIFT', '4'], '%': ['SHIFT', '5'],
    '^': ['SHIFT', '6'], '&': ['SHIFT', '7'], '*': ['SHIFT', '8'],
    '(': ['SHIFT', '9'], ')': ['SHIFT', '0'] }
Base = {
    '0': 48, '1': 49, '2': 50, '3': 51, '4': 52,
    '5': 53, '6': 54, '7': 55, '8': 56, '9': 57,
    'a': 65, 'b': 66, 'c': 67, 'd': 68, 'e': 69, 'f': 70,
    'g': 71, 'h': 72, 'i': 73, 'j': 74, 'k': 75, 'l': 76,
    'm': 77, 'n': 78, 'o': 79, 'p': 80, 'q': 81, 'r': 82,
    's': 83, 't': 84, 'u': 85, 'v': 86, 'w': 87, 'x': 88,
    'y': 89, 'z': 90,
    '.': 190, '-': 189, ',': 188, '=': 187, '/': 191, ';': 186,
    '[': 219, ']': 221, '\\': 220, "'": 222,
    'ALT': 18, 'TAB': 9, 'CAPSLOCK': 20, 'ENTER': 13, 'BS': 8,
    'CTRL': 17, 'ESC': 27, ' ': 32, 'END': 35,
    'DOWN': 40, 'LEFT': 37, 'UP': 38, 'RIGHT': 39,
    'SELECT': 41, 'PRINTSCR': 44, 'INS': 45, 'DEL': 46,
    'LWIN': 91, 'RWIN': 92,
    'LSHIFT': 160, 'SHIFT': 161, 'LCTRL': 162, 'RCTRL': 163,
    'VOLUP': 175, 'VOLDOWN': 174, 'NUMLOCK': 144, 'SCROLL': 145 }

def KeyUp(Key):
    keybd_event(Key, 0, 2, 0)


def KeyDown(Key):
    keybd_event(Key, 0, 1, 0)


def Press(Key, speed=1):
    rest_time = 0.05/speed
    if Key in Base:
        Key = Base[Key]
        KeyDown(Key)
        time.sleep(rest_time)
        KeyUp(Key)
        return True
    if Key in Combs:
        KeyDown(Base[Combs[Key][0]])
        time.sleep(rest_time)
        KeyDown(Base[Combs[Key][1]])
        time.sleep(rest_time)
        KeyUp(Base[Combs[Key][1]])
        time.sleep(rest_time)
        KeyUp(Base[Combs[Key][0]])
        return True
    return False


def Write(Str, speed = 1):
    for s in Str:
        Press(s, speed)
        time.sleep((0.1 + random.random()/10.0) / float(speed))

Example:

>>> Write('Hello, World!', speed=3)
Hello, World!
>>> Press('ENTER')

If you want to implement more keys, you can find their virtual-key codes here; just add them to the Base dictionary.

qid & accept id: (14144315, 14144422) query: Remove unwanted commas from CSV using Python soup:

soup wrap:

You can define the separating and quoting characters with Python's CSV reader. For example:

With this CSV:

1,`Flat 5, Park Street`

And this Python:

import csv

with open('14144315.csv', 'rb') as csvfile:
    rowreader = csv.reader(csvfile, delimiter=',', quotechar='`')
    for row in rowreader:
        print row

You will see this output:

['1', 'Flat 5, Park Street']

This uses commas to separate values, with the backtick as the quote character, so commas inside quoted fields are preserved.
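
A Python 3 version of the same idea, self-contained with the sample row inlined (csv in Python 3 wants a text-mode stream rather than 'rb'):

```python
import csv
import io

# The sample row from above, backtick-quoted field included
data = io.StringIO("1,`Flat 5, Park Street`\n")

rows = list(csv.reader(data, delimiter=',', quotechar='`'))
print(rows)  # [['1', 'Flat 5, Park Street']]
```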

qid & accept id: (14154851, 14155403) query: any python min like function which gives a list as result soup:

soup wrap:

Using collections.defaultdict:

import collections

d=collections.defaultdict(list)
for item in lst:
    d[item[1]].append(item)
d[min(key for key in d.keys() if key!=0)]

Out:

[('NORTHLANDER', 3), ('VOLT', 3)]
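
For a self-contained check, here is the same idea with a small sample list (the contents of lst are assumed, chosen to reproduce the output above):

```python
import collections

# Assumed sample data: (name, value) pairs; zero values are to be ignored
lst = [('ESCORT', 0), ('NORTHLANDER', 3), ('VOLT', 3), ('PRIUS', 5)]

d = collections.defaultdict(list)
for item in lst:
    d[item[1]].append(item)

# All items sharing the smallest non-zero value
result = d[min(key for key in d.keys() if key != 0)]
print(result)  # [('NORTHLANDER', 3), ('VOLT', 3)]
```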

Test:

#unwind's solution

def f(lst):
    return [y for y in lst if y[1] == min([x for x in lst if x[1] > 0],
                                             key = lambda x: x[1])[1]]

def f2(lst):
    d=collections.defaultdict(list)
    for item in lst:
        d[item[1]].append(item)
    return d[min(key for key in d.keys() if key!=0)]

%timeit f(lst)
100000 loops, best of 3: 12.1 us per loop
%timeit f2(lst)
100000 loops, best of 3: 5.42 us per loop

So, defaultdict seems to be more than twice as fast.

edit @martineau optimization:

def f3(lst):
    lstm = min((x for x in lst if x[1]), key = lambda x: x[1])[1]
    return [y for y in lst if y[1] == lstm]

%timeit f3(lst)
100000 loops, best of 3: 4.19 us per loop

And another dict-based solution using setdefault is even a bit faster:

def f4(lst):
    d={}
    for item in lst:
        if item[1] != 0:
            d.setdefault(item[1],{})[item]=0
    return d[min(d.keys())].keys()

%timeit f4(lst)
100000 loops, best of 3: 3.76 us per loop
qid & accept id: (14208280, 14208348) query: Python BeautifulSoup get text from HTML soup:

soup wrap:

You can read the subsequent sibling of each p tag (note this is very specific to this text, so hopefully it can be expanded to your situation):

In [1]: from bs4 import BeautifulSoup

In [2]: html = """\
   ...: <p>aaa</p>bbb
   ...: <p>ccc</p>ddd"""

In [3]: soup = BeautifulSoup(html)

In [4]: [p.next_sibling for p in soup.findAll('p')]
Out[4]: [u'bbb\n', u'ddd']

This picks up the trailing newline, so you can strip it off if need be:

In [5]: [p.next_sibling.strip() for p in soup.findAll('p')]
Out[5]: [u'bbb', u'ddd']

The general idea is that you locate the tag(s) before your target text and then find the next sibling element, which should be your text.

qid & accept id: (14211597, 14211674) query: How to do a groupby of a list of lists soup:

soup wrap:

Let's name your list a and not list (list is a very useful function in Python and we don't want to mask it):

import itertools as it

a = [('2013-01-04', u'crid2557171372', 1),
     ('2013-01-04', u'crid9904536154', 719677),
     ('2013-01-04', u'crid7990924609', 577352),
     ('2013-01-04', u'crid7990924609', 399058),
     ('2013-01-04', u'crid9904536154', 385260),
     ('2013-01-04', u'crid2557171372', 78873)]

b = []
for k,v in it.groupby(sorted(a, key=lambda x: x[:2]), key=lambda x: x[:2]):
    b.append(k + (sum(x[2] for x in v),))

b is now:

[('2013-01-04', u'crid2557171372', 78874),
 ('2013-01-04', u'crid7990924609', 976410),
 ('2013-01-04', u'crid9904536154', 1104937)]
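
Note that itertools.groupby only merges consecutive items, which is why the list is sorted on the same key first. A self-contained check of the grouped sums:

```python
import itertools as it

a = [('2013-01-04', 'crid2557171372', 1),
     ('2013-01-04', 'crid9904536154', 719677),
     ('2013-01-04', 'crid7990924609', 577352),
     ('2013-01-04', 'crid7990924609', 399058),
     ('2013-01-04', 'crid9904536154', 385260),
     ('2013-01-04', 'crid2557171372', 78873)]

# groupby only merges adjacent runs, so sort on the grouping key first
b = []
for k, v in it.groupby(sorted(a, key=lambda x: x[:2]), key=lambda x: x[:2]):
    b.append(k + (sum(x[2] for x in v),))
print(b)
```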
qid & accept id: (14258317, 14259108) query: More numpy way of iterating through the 'orthogonal' diagonals of a 2D array soup:

soup wrap:

Here is a way that avoids Python for-loops.

First, let's look at our addition tables:

import numpy as np
grid_shape = (4,5)
N = np.prod(grid_shape)

y = np.add.outer(np.arange(grid_shape[0]),np.arange(grid_shape[1]))
print(y)

# [[0 1 2 3 4]
#  [1 2 3 4 5]
#  [2 3 4 5 6]
#  [3 4 5 6 7]]

The key idea is that if we visit the sums in the addition table in order, we would be iterating through the array in the desired order.

We can find out the indices associated with that order using np.argsort:

idx = np.argsort(y.ravel())
print(idx)
# [ 0  1  5  2  6 10  3  7 11 15  4  8 12 16  9 13 17 14 18 19]

idx is golden. It is essentially everything you need to iterate through any 2D array of shape (4,5), since a 2D array is just a 1D array reshaped.
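
As a cross-check without NumPy, the same idx ordering falls out of a stable pure-Python sort over the flattened addition table (a sketch; np.argsort happens to break ties the same way here):

```python
grid_shape = (4, 5)
rows, cols = grid_shape

# Flattened addition table: entry k corresponds to cell (k // cols, k % cols)
flat = [i + j for i in range(rows) for j in range(cols)]

# Stable sort of the flat indices by their anti-diagonal sum
idx = sorted(range(rows * cols), key=lambda k: flat[k])
print(idx)  # [0, 1, 5, 2, 6, 10, 3, 7, 11, 15, 4, 8, 12, 16, 9, 13, 17, 14, 18, 19]
```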

If your ultimate goal is to generate the array A that you show above at the end of your post, then you could use argsort again:

print(np.argsort(idx).reshape(grid_shape[0],-1))
# [[ 0  1  3  6 10]
#  [ 2  4  7 11 14]
#  [ 5  8 12 15 17]
#  [ 9 13 16 18 19]]

Or, alternatively, if you need to assign other values to A, perhaps this would be more useful:

A = np.zeros(grid_shape)
A1d = A.ravel()
A1d[idx] = np.arange(N)  # you can change np.arange(N) to any 1D array of shape (N,)
print(A)
# [[  0.   1.   3.   6.  10.]
#  [  2.   4.   7.  11.  15.]
#  [  5.   8.  12.  16.  18.]
#  [  9.  13.  14.  17.  19.]]

I know you asked for a way to iterate through your array, but I wanted to show the above because generating arrays through whole-array assignment or numpy function calls (like np.argsort) as done above will probably be faster than using a Python loop. But if you need to use a Python loop, then:

for i, j in enumerate(idx):
   A1d[j] = i

print(A)
# [[  0.   1.   3.   6.  10.]
#  [  2.   4.   7.  11.  15.]
#  [  5.   8.  12.  16.  18.]
#  [  9.  13.  14.  17.  19.]]
qid & accept id: (14275975, 14276423) query: Creating random binary files soup:

soup wrap:

IMHO - the following is completely redundant:

f.write(struct.pack("=I",random.randint(0,sys.maxint*2+1)))

There's absolutely no need to use struct.pack, just do something like:

import os

with open('output_file', 'wb') as fout:
    fout.write(os.urandom(1024)) # replace 1024 with size_kb if not unreasonably large

Then, if you later need to read the file back as integers, you can use struct.unpack at that point.
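
A sketch of that round trip, writing random bytes and reading them back as unsigned 32-bit integers (the temp path stands in for 'output_file'):

```python
import os
import struct
import tempfile

# Write 1 KiB of random bytes, as in the snippet above
path = os.path.join(tempfile.mkdtemp(), 'output_file')
with open(path, 'wb') as fout:
    fout.write(os.urandom(1024))

# Read them back and unpack as 256 native-order unsigned 32-bit ints
with open(path, 'rb') as fin:
    numbers = struct.unpack('=256I', fin.read())
print(len(numbers))  # 256
```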

(my use case is generating a file for a unit test so I just need a file that isn't identical with other generated files).

Another option is to just write a UUID4 to the file, but since I don't know the exact use case, I'm not sure that's viable.

qid & accept id: (14285049, 14285765) query: Sort a list based on a given distribution soup:
soup wrap:
from numpy import take,argsort

take(opt,argsort(argsort(perc)[::-1]))

or without imports:

zip(*sorted(zip(sorted(range(len(perc)), key=perc.__getitem__)[::-1],opt)))[1]

#Test

l=[([0.23, 0.27, 0.4, 0.1],[3, 2, 2, 1]),
   ([0.25, 0.25, 0.4, 0.1],[3, 2, 2, 1]),
   ([0.2,  0.2,  0.4, 0.2],[3, 2, 2, 1])]

def f1(perc,opt):
    return take(opt,argsort(argsort(perc)[::-1]))

def f2(perc,opt):
    return zip(*sorted(zip(sorted(range(len(perc)),
             key=perc.__getitem__)[::-1],opt)))[1]       

for i in l:
    perc, opt = i
    print f1(perc,opt), f2(perc,opt)

# output:
# [2 2 3 1] (2, 2, 3, 1)
# [2 2 3 1] (2, 2, 3, 1)
# [1 2 3 2] (1, 2, 3, 2)
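
A Python 3 sketch of the no-imports variant (the original zip(...)[1] subscripting only works in Python 2, where zip returns a list):

```python
def rank_sort(perc, opt):
    # Indices of perc sorted ascending, then reversed: positions by rank
    order = sorted(range(len(perc)), key=perc.__getitem__)[::-1]
    # Pair each rank with its option, then read options back in index order
    return [x for _, x in sorted(zip(order, opt))]

print(rank_sort([0.23, 0.27, 0.4, 0.1], [3, 2, 2, 1]))  # [2, 2, 3, 1]
print(rank_sort([0.2, 0.2, 0.4, 0.2], [3, 2, 2, 1]))    # [1, 2, 3, 2]
```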
qid & accept id: (14363016, 14363883) query: Finding superstrings in a set of strings in python soup:

soup wrap:

Let's make a laundry list of the major players in this problem:

  • strings, e.g. '24 139 277'
  • sets -- a collection of "superstrings"
  • superset inclusion -- the <= set operator
  • splitting the strings into a set of number-strings: e.g. set(['24', '139', '277'])

We are given a list of strings, but what we'd really like -- what would be more useful -- is a list of sets:

In [20]: strings = [frozenset(s.split()) for s in strings]    
In [21]: strings
Out[21]: 
[frozenset(['24']),
 frozenset(['277']),
 ...
 frozenset(['136', '139']),
 frozenset(['246'])]

The reason for frozensets will become apparent shortly; I'll explain why below. The reason we want sets at all is that they have a convenient subset/superset comparison operator:

In [22]: frozenset(['136']) <= frozenset(['136', '139', '24'])
Out[22]: True

In [23]: frozenset(['136']) <= frozenset(['24', '277'])
Out[23]: False

This is exactly what we need to determine if one string is a superstring of another.

So, basically, we want to:

  • Start with an empty set of superstrings = set()
  • Iterate through strings: for s in strings.
  • As we examine each s in strings, we will add new ones to superstrings if they are not a subset of an item already in superstrings.
  • For each s, iterate through a set of superstrings: for sup in superstrings.

    • Check if s <= sup -- that is, if s is a subset of sup. If so, quit the loop, since s is contained in some known superstring.

    • Check if sup <= s -- that is, if s is a superset of some item in superstrings. In this case, remove that item from superstrings and replace it with s.

Technical notes:

  • Because we are removing items from superstrings, we cannot iterate over superstrings itself at the same time. So, instead, iterate over a copy:

    for sup in superstrings.copy():
    
  • And finally, we would like superstrings to be a set of sets. But the items in a set have to be hashable, and sets themselves are not hashable. But frozensets are, so it is possible to have a set of frozensets. This is why we converted strings into a list of frozensets.
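That last point is easy to verify in an interpreter; a quick sketch (with made-up values):

```python
# A plain set is unhashable, so it cannot go inside another set;
# a frozenset can.
inner = {'136', '139'}               # made-up values for illustration
try:
    container = {inner}              # set of sets -> TypeError
except TypeError:
    container = {frozenset(inner)}   # set of frozensets works fine

# Subset/superset comparisons work the same on frozensets.
assert frozenset({'136'}) <= frozenset(inner)
```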

strings = [
    '24', '277', '277 24', '139 24', '139 277 24', '139 277', '139', '136 24',
    '136 277 24', '136 277', '136', '136 139 24', '136 139 277 24', '136 139 277',
    '136 139', '246']

def find_supersets(strings):
    superstrings = set()
    set_to_string = dict(zip([frozenset(s.split()) for s in strings], strings))
    for s in set_to_string.keys():
        for sup in superstrings.copy():
            if s <= sup:
                # print('{s!r} <= {sup!r}'.format(s = s, sup = sup))
                break
            elif sup < s:
                # print('{sup!r} <= {s!r}'.format(s = s, sup = sup))
                superstrings.remove(sup)
        else:
            superstrings.add(s)
    return [set_to_string[sup] for sup in superstrings]

print(find_supersets(strings))

yields

['136 139 277 24', '246']

It turns out this is faster than pre-sorting the strings:

def using_sorted(strings):
    stsets = sorted(
        (frozenset(s.split()) for s in strings), key=len, reverse=True)
    superstrings = set()
    for stset in stsets:
        if not any(stset.issubset(s) for s in superstrings):
            superstrings.add(stset)
    return superstrings

In [29]: timeit find_supersets(strings)
100000 loops, best of 3: 18.3 us per loop
In [25]: timeit using_sorted(strings)
10000 loops, best of 3: 24.9 us per loop
qid & accept id: (14366713, 14366769) query: removing "()" using python soup:


Use regular expressions with re:

>>> import re
>>> s = 'N1B N 1.2620(4) 0.3320(4) 0.0049(7)'
>>> re.sub(r'\(.*?\)', '', s)
'N1B N 1.2620 0.3320 0.0049'

The ? makes your regex lazy (non-greedy). Without it you'll get:

>>> re.sub(r'\(.*\)', '', s)
'N1B N 1.2620'

If you want to delete only digits, use \d instead of .:

>>> s = 'N1B N 1.2620(spam) 0.3320(4) 0.0049(7)'
>>> re.sub(r'\(\d*?\)', '', s)
'N1B N 1.2620(spam) 0.3320 0.0049'
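As a side note, current Python versions warn about unescaped backslashes in ordinary string literals, so raw strings (r'...') are the safer spelling; a brief sketch:

```python
import re

# Raw strings avoid invalid-escape warnings; compiling the pattern helps
# when it is applied repeatedly.
paren_digits = re.compile(r'\(\d+\)')   # parentheses containing only digits
s = 'N1B N 1.2620(4) 0.3320(4) 0.0049(7)'
print(paren_digits.sub('', s))          # N1B N 1.2620 0.3320 0.0049
```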
qid & accept id: (14383937, 14384301) query: Check printable for Unicode soup:


You are looking for a test for a range of codepoints, so you need a regular expression:

import re
# match characters from ¿ to the end of the JSON-encodable range
exclude = re.compile(ur'[\u00bf-\uffff]')

def isprintable(s):
    return not bool(exclude.search(s))

This will return False for any unicode text that has codepoints past \u00BE ("¾").

>>> isprintable(u'Hello World!')
True
>>> isprintable(u'Jeg \u00f8ve mit Norsk.')
False
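The code above is Python 2 (note the ur'' literal). On Python 3, where str is already Unicode and the ur prefix no longer exists, a roughly equivalent sketch would be:

```python
import re

# match characters from ¿ to the end of the range (Python 3 spelling)
exclude = re.compile('[\u00bf-\uffff]')

def isprintable(s):
    return exclude.search(s) is None

print(isprintable('Hello World!'))             # True
print(isprintable('Jeg \u00f8ve mit Norsk.'))  # False
```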
qid & accept id: (14407563, 14408360) query: syntactic whitespaces with pyparsing's operatorPrecedence soup:


After thinking it over, I think the language you're trying to define is ambiguous, but there are multiple ways to fix that.

You want this:

parse('a:b -> c : d e')

To give you this:

[[['a', ':', 'b'], '->', ['c', ':', ['d', 'e']]]]

You've implied that you want whitespace to act as an operator. But then why isn't it an operator in the context of 'c :'? What's the rule for when it is and when it isn't an operator?

Either that, or you want each operand to be a space-separated list of words. But in that case, why is that 'a' instead of ['a']? Either each of the operands is a list, or none of them are, right? It's clearly not position-dependent, and you haven't specified any other rule.

There is (at least) one plausible rule that fits what you have in mind: Collapse any operand that's a single-element list down to just that element. But that's a strange rule—and when you later use this parse tree for whatever purpose you're using it for, you have to effectively reverse the same rule, by writing code that handles a single word as if it were a one-word list. So… why do it that way?

I can think of three better alternatives:

  1. Require every operand to be a space-delimited list of words.
  2. Allow spaces in the middle of operands.
  3. Use default whitespace handling, and allow multiple terms on each side of any operator.

Any of these are very easy to parse, and give you a parse tree that's very easy to use. I'd probably go with #2, but since I already explained how to do that in a comment above, let's do #3 here:

>>> operands = OneOrMore(Word(alphanums))
>>> precedence = [
...     (":", 2, opAssoc.LEFT),
...     ("->", 2, opAssoc.LEFT),
...     ]
>>> parser = operatorPrecedence(operands, precedence)
>>> def parse(s): return parser.parseString(s, parseAll=True)
>>> print(parse('a:b -> c : d e'))
[[['a', ':', 'b'], '->', ['c', ':', 'd', 'e']]]
>>> print(parse('caffeine : A1 antagonist -> caffeine : peripheral stimulant'))
[[['caffeine', ':', 'A1', 'antagonist'], '->', ['caffeine', ':', 'peripheral', 'stimulant']]]
qid & accept id: (14421133, 14421297) query: Catch Keyboard Interrupt in program that is waiting on an Event soup:


Update: On current Python 3 (starting with Python 3.2), finished_event.wait() works on my Ubuntu machine without specifying the timeout parameter; Ctrl+C interrupts it anyway. On CPython 2 you still need to pass the timeout parameter.

Here's a complete code example:

#!/usr/bin/env python3
import threading

def f(event):
    while True:
        pass
    # never reached, otherwise event.set() would be here

event = threading.Event()
threading.Thread(target=f, args=[event], daemon=True).start()
try:
    print('Press Ctrl+C to exit')
    event.wait()
except KeyboardInterrupt:
    print('got Ctrl+C')

There could be bugs related to Ctrl+C. Test whether it works in your environment.


Old polling answer:

You could try to allow the interpreter to run the main thread:

while not finished_event.wait(.1): # timeout in seconds
    pass

If you just want to wait until the child thread is done:

while thread.is_alive():
    thread.join(.1)
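Putting the polling variant together into a runnable sketch (with a hypothetical worker thread that sets the event after a short delay):

```python
import threading
import time

finished_event = threading.Event()

def worker():
    time.sleep(0.2)          # simulate some work
    finished_event.set()

threading.Thread(target=worker, daemon=True).start()

# The short timeout returns control to the interpreter regularly,
# so a KeyboardInterrupt can be delivered even on CPython 2.
while not finished_event.wait(.1):   # timeout in seconds
    pass
print('worker finished')
```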
qid & accept id: (14421630, 14421665) query: Create a list with all possible permutations from a know list of objects, but make the final list x in size soup:


I think itertools.product is what you're looking for.

# A simple example
import itertools
lst = [0, 1]
print(list(itertools.product(lst, repeat=2)))
# [(0, 0), (0, 1), (1, 0), (1, 1)]

Note that itertools.product itself returns an itertools.product object, not a list.

# In your case
import itertools
lst = [0, 1, 3, 'a', 'b', 'c']  # quote 'a', 'b', 'c'; bare names would be undefined
output = list(itertools.product(lst, repeat=20))
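One caveat: with 6 items and repeat=20 there are 6**20 (about 3.7 quadrillion) tuples, far too many to materialise in a list. Iterate over the product lazily, or slice off just what you need:

```python
import itertools

lst = [0, 1, 3, 'a', 'b', 'c']   # quoted 'a', 'b', 'c' so the names exist
print(len(lst) ** 20)            # 3656158440062976 combinations

# Take only the first three instead of building the whole list.
first_three = list(itertools.islice(itertools.product(lst, repeat=20), 3))
print(first_three[0])            # (0, 0, ..., 0) -- twenty zeros
```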
qid & accept id: (14436970, 14536529) query: Django: how to change label using formset extra? soup:


If you want the result of the first form to determine the number of fields and their labels in the second, you might want to look into Django form wizards. But here's a simple, non-form-wizard (and probably less maintainable) way to do it, utilising the __init__ method of the formset to modify the form labels:


forms.py:

# File: forms.py
from django import forms
from django.forms.formsets import BaseFormSet


# What you've called 'GetMachine'
class MachineForm(forms.Form):
    no_of_lines = forms.IntegerField(max_value=4)


# What you've called 'GetLine'
class LineForm(forms.Form):
    beamline_name = forms.CharField(max_length=15, label='Name of Beamline')


# Create a custom formset and override __init__
class BaseLineFormSet(BaseFormSet):
    def __init__(self, *args, **kwargs):
        super(BaseLineFormSet, self).__init__(*args, **kwargs)
        no_of_forms = len(self)
        for i in range(0, no_of_forms):
            self[i].fields['beamline_name'].label += "-%d" % (i + 1)

views.py:

# File: views.py
from django.forms.formsets import formset_factory
from django.shortcuts import render_to_response
from django.template import RequestContext
from django.http import HttpResponseRedirect
from django.core.urlresolvers import reverse
from forms import MachineForm, LineForm, BaseLineFormSet


def get_no_of_lines(request):
    if request.method == 'POST':
        machine_form = MachineForm(request.POST)
        if machine_form.is_valid():
            # At this point, form fields have already been 
            # converted to Python data types :)
            # so no need to convert `line_no` to an integer
            no_of_lines = machine_form.cleaned_data['no_of_lines']
            return HttpResponseRedirect(reverse('line_form', kwargs={'no_of_lines': no_of_lines}))
    else:
        # It looks to me like you probably don't mean to
        # use formsets here (but a form instead)
        machine_form = MachineForm()

    c = RequestContext(request, {
        'machine_form': machine_form,
    })
    return render_to_response('get_no_of_lines.html', c)


def line_form(request, no_of_lines):
    # You probably should validate this number (again).
    # In fact, you probably need to validate first form (MachineForm).
    # ...But I'm assuming it'll be valid in this example.
    no_of_lines = int(no_of_lines)
    LineFormSet = formset_factory(LineForm, extra=no_of_lines, formset=BaseLineFormSet)
    if request.method == "POST":
        formset = LineFormSet(request.POST, request.FILES)
        if formset.is_valid():
            pass
            # Do stuff with form submission
            # Redirect

    else:
        formset = LineFormSet()

    c = RequestContext(request, {
        'formset': formset,
    })
    return render_to_response('line_form.html', c)

urls.py:

from django.conf.urls import url, patterns
from views import get_no_of_lines, line_form


urlpatterns = patterns('',
     url(r'^$', get_no_of_lines, name='get_no_of_lines'),
     url(r'^line_form/(?P<no_of_lines>\d{1})$', line_form, name='line_form'),
)

get_no_of_lines.html:


<form action="" method="post">
    {% csrf_token %}
    {{ machine_form }}
    <input type="submit" />
</form>

line_form.html:

<form action="" method="post">
    {% csrf_token %}
    {{ formset.as_p }}
    <input type="submit" />
</form>

The reason I say this approach is probably not the best is that you have to validate no_of_lines being passed to the line_form view (it could be > 4, so you'll have to perform validation here and introduce validation logic rather than keeping it in one place: the form). And if you need to add a new field to the first form, you'll likely end up modifying this code too. That's why I'd recommend looking into form wizards.


qid & accept id: (14462993, 14478692) query: How to define multi-company-aware models in OpenERP soup:


Many documents in the official OpenERP addons have similar multi-company features, so you should presumably reuse the same implementation technique; it seems to match your use case.

There are tons of examples in the source code if you search for "company_id" or "company_id.*fields.many2one", for example Sales Shops in the sale module.

In a nutshell, you will need to:

  1. Declare the company_id field as a regular many2one towards res.company. The default security record rules (defined here) will take care of dynamically showing only the companies that are subsidiaries of the user's current company. The user can change their current company to any of their allowed companies at any time in the preferences, to work in a different company context. And since security record rules do not apply for the special admin user, it will always be possible to choose any company when logged in as admin.

    'company_id': fields.many2one('res.company', 'Company', required=False)
    
  2. Automatically select the user's current company as default when creating new records. The framework provides a method for doing exactly that: res.company._company_default_get(). It is possible to define custom rules for selecting a default company for each kind of document, but the default will be the user's current company. So it is as simple as adding this snippet to your model's _defaults:

    'company_id': lambda self,cr,uid,ctx: self.pool['res.company']._company_default_get(cr,uid,object='',context=ctx)
    
  3. Add the company_id field to your model's form view. Usually you want to restrict it to the multi-company group, to show it only to users who actually need it:

     <field name="company_id" groups="base.group_multi_company"/>


It's supposed to be as simple as that.

qid & accept id: (14494101, 14494131) query: Using other keys for the waitKey() function of opencv soup:


You can use the ord() function in Python for that.

For example, if you want to trigger on the 'a' key press, do as follows:

if cv2.waitKey(33) == ord('a'):
   print "pressed a"

See a sample code here: Drawing Histogram

UPDATE:

To find the value returned for any key, print it using a simple script like this:

import cv2
img = cv2.imread('sof.jpg') # load a dummy image
while(1):
    cv2.imshow('img',img)
    k = cv2.waitKey(33)
    if k==27:    # Esc key to stop
        break
    elif k==-1:  # normally -1 returned,so don't print it
        continue
    else:
        print k # else print its value

With this code, I got following values :

Upkey : 2490368
DownKey : 2621440
LeftKey : 2424832
RightKey: 2555904
Space : 32
Delete : 3014656
...... # Continue yourself :)
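A related gotcha: on some 64-bit builds, waitKey returns the key code with extra flag bits (e.g. lock-key state) in the higher bytes, so a common idiom is to mask the result with 0xFF. The arithmetic alone, no OpenCV needed (the raw value below is hypothetical):

```python
# Hypothetical raw value such a build might return for 'a':
k = 1048673                    # 0x100061: flag bits plus ASCII 0x61
print(k == ord('a'))           # False -- direct comparison fails
print(k & 0xFF == ord('a'))    # True  -- masking keeps the low byte
```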
qid & accept id: (14508570, 14508649) query: Numpy averaging with multi-dimensional weights along an axis soup:


In a single line:

np.average(a.reshape(48, -1), weights=b.ravel(), axis=1)

You can test it with:

a = np.random.rand(48, 90, 144)
b = np.random.rand(90,144)
np.testing.assert_almost_equal(np.average(a.reshape(48, -1),
                                          weights=b.ravel(), axis=1),
                               np.array([np.average(a[i],
                                                    weights=b) for i in range(48)]))
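As another sanity check, the weighted mean can be written out explicitly as sum(a*b) over the total weight; broadcasting handles the shapes. A sketch, assuming NumPy is available:

```python
import numpy as np

a = np.random.rand(48, 90, 144)
b = np.random.rand(90, 144)

avg = np.average(a.reshape(48, -1), weights=b.ravel(), axis=1)

# Explicit formula: b broadcasts over a's leading axis, then we sum
# the last two axes and divide by the total weight.
explicit = (a * b).sum(axis=(1, 2)) / b.sum()
print(np.allclose(avg, explicit))   # True
```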
qid & accept id: (14536778, 14536813) query: Date formate conversion in Python soup:


The isoformat() method gives you that:

d.isoformat(' ')

but it might include microseconds; you could use the .strftime() method for more control:

d.strftime('%Y-%m-%d %H:%M:%S')

Output

>>> import datetime
>>> d = datetime.datetime.today()
>>> d.isoformat(' ')
'2013-01-26 13:12:08.628580'
>>> d.strftime('%Y-%m-%d %H:%M:%S')
'2013-01-26 13:12:08'
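On Python 3.6+, isoformat() also accepts a timespec argument, which drops the microseconds without reaching for strftime():

```python
import datetime

d = datetime.datetime(2013, 1, 26, 13, 12, 8, 628580)
print(d.isoformat(' '))                       # 2013-01-26 13:12:08.628580
print(d.isoformat(' ', timespec='seconds'))   # 2013-01-26 13:12:08
```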
qid & accept id: (14555771, 14555862) query: Order a NXM Numpy Array according to cumulative sums of each sub-array soup:


Assuming that by cumulative sum you mean total (there is a cumulative sum function, cumsum, which returns something else), you can do this both with the standard sort:

>>> v = [[1,2,3,4], [2,3,4,5], [11,21,3,4], [4,33,21,1], [2,4,6,5]]
>>> sorted(v, key=sum, reverse=True)
[[4, 33, 21, 1], [11, 21, 3, 4], [2, 4, 6, 5], [2, 3, 4, 5], [1, 2, 3, 4]]

and in numpy using argsort:

>>> a = np.array(v)
>>> a.sum(axis=1)
array([10, 14, 39, 59, 17])
>>> a.sum(axis=1).argsort()
array([0, 1, 4, 2, 3])
>>> a[a.sum(axis=1).argsort()[::-1]]
array([[ 4, 33, 21,  1],
       [11, 21,  3,  4],
       [ 2,  4,  6,  5],
       [ 2,  3,  4,  5],
       [ 1,  2,  3,  4]])

But I may be misunderstanding you.

qid & accept id: (14572495, 14572525) query: Creating an iterable of dictionaries from an iterable of tuples soup:


How about:

In [8]: [{'location':l, 'name':n, 'value':v} for (n, l, v) in all_values]
Out[8]: 
[{'location': 0, 'name': 'a', 'value': 0.1},
 {'location': 1, 'name': 'b', 'value': 0.5},
 {'location': 2, 'name': 'c', 'value': 1.0}]

or, if you prefer a more general solution:

In [12]: keys = ('name', 'location', 'value')

In [13]: [dict(zip(keys, values)) for values in all_values]
Out[13]: 
[{'location': 0, 'name': 'a', 'value': 0.1},
 {'location': 1, 'name': 'b', 'value': 0.5},
 {'location': 2, 'name': 'c', 'value': 1.0}]
qid & accept id: (14582852, 14583014) query: Named dictionary in python soup:


You should be adding the dictionaries to a list, like so:

friends = []
for message in messages:
  entry = {"message": message.message, "phone": message.phone}
  friends.append(entry)

To loop over them again, you can do it this way:

for friend in friends:
  print "%s - %s" % (friend["message"], friend["phone"])
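The same thing fits in a list comprehension. A self-contained sketch, using a namedtuple as a stand-in for the original message objects (which aren't shown in the question):

```python
from collections import namedtuple

# Stand-in for the real message objects; only these two attributes are used.
Message = namedtuple('Message', 'message phone')
messages = [Message('hello', '555-0100'), Message('bye', '555-0199')]

friends = [{"message": m.message, "phone": m.phone} for m in messages]
for friend in friends:
    print("%s - %s" % (friend["message"], friend["phone"]))
```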
qid & accept id: (14583576, 14586194) query: Sorting panda DataFrames based on criteria soup:


I'm not sure I understand your question correctly; maybe this one below works?

data['Cat1'][data['Counter'].rank(ascending=0) - 1]

--EDIT--

As in the comment, my solution would be

data['ranking'] = data.groupby('Cat1')['Counter'].rank(ascending=0)
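A small worked example of that groupby ranking (with made-up data, assuming pandas is installed):

```python
import pandas as pd

# Made-up data: rank Counter within each Cat1 group, highest first.
data = pd.DataFrame({
    'Cat1':    ['a', 'a', 'a', 'b', 'b'],
    'Counter': [10, 30, 20, 5, 15],
})
data['ranking'] = data.groupby('Cat1')['Counter'].rank(ascending=0)
print(data['ranking'].tolist())   # [3.0, 1.0, 2.0, 2.0, 1.0]
```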

I can't think of anything else, sorry. Maybe others will have a different perspective..

qid & accept id: (14601544, 14614040) query: Plone - Override Zope Schema fields soup:


Since the author (me) didn't make this simple to override, your override will be a little involved. The following steps are what you'll need. Warning: it's all pseudo-code, so you may need to tweak it to make it work for you.

First, provide your customized interface by extending the interface you want to customize:

class IEnhancedDocumentViewerSchema(IGlobalDocumentViewerSettings):
    """ 
    Use all the fields from the default schema, and add various extra fields.
    """

    folder_location = schema.TextLine(
        title=u"Default folder location",
        description=u'This folder will be created in the Plone root folder. '
                    u'Plone client must have write access to directory.',
        default=u"files_folder")

Then, create the settings adapter that is used to store and retrieve the settings of the schema::

from collective.documentviewer.settings import Base
class CustomSettings(Base):
    implements(IEnhancedDocumentViewerSchema)
    use_interface = IEnhancedDocumentViewerSchema

Then, register your adapter::
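(The ZCML for this registration appears to have been stripped in formatting; assuming standard Zope component registration, with the package paths below as placeholders, it would look roughly like:)

```xml
<configure xmlns="http://namespaces.zope.org/zope">
  <!-- placeholder paths: adjust to your own package layout -->
  <adapter
      for="*"
      provides="my.package.interfaces.IEnhancedDocumentViewerSchema"
      factory="my.package.settings.CustomSettings"
      />
</configure>
```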


Then, create a form using your custom schema::

from z3c.form import field
from plone.app.z3cform.layout import wrap_form
from collective.documentviewer.views import GlobalSettingsForm
class CustomGlobalSettingsForm(GlobalSettingsForm):
    fields = field.Fields(IEnhancedDocumentViewerSchema)
CustomGlobalSettingsFormView = wrap_form(CustomGlobalSettingsForm)

Then, create a customization layer for your product, extending the documentviewer layer. This will require 2 steps. First, add the layer interface::

from collective.documentviewer.interfaces import ILayer as IDocumentViewerLayer
class ICustomLayer(IDocumentViewerLayer):
    """
    custom layer class
    """

And register your layer with generic setup. Add the xml file, browserlayer.xml, to your profile with the following contents (make sure to reinstall the product so the layer gets registered)::
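(The file contents were lost in formatting; a standard GenericSetup browserlayer.xml, with the layer name and interface path as placeholders for your package, looks like:)

```xml
<?xml version="1.0"?>
<layers>
  <layer
      name="my.package.customlayer"
      interface="my.package.interfaces.ICustomLayer"
      />
</layers>
```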



    

Finally, override the global settings view with your custom form just for the layer you have registered for your product::
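(The ZCML was stripped here as well; assuming the standard browser:page override pattern, where the view name and permission below are guesses and must match what collective.documentviewer actually registers:)

```xml
<configure xmlns:browser="http://namespaces.zope.org/browser">
  <!-- name and permission must match collective.documentviewer's registration -->
  <browser:page
      for="*"
      name="global-documentviewer-settings"
      class="my.package.forms.CustomGlobalSettingsFormView"
      permission="cmf.ManagePortal"
      layer="my.package.interfaces.ICustomLayer"
      />
</configure>
```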


Wow, that was way too difficult.

qid & accept id: (14622698, 24932178) query: Customize sphinxdoc theme soup:

All I wanted is to add ReST strikethrough in my sphinx doc. Here is how I did it:

\n
$ cd my-sphinx-dir\n$ mkdir -p theme/static\n$ touch theme/theme.conf\n$ touch theme/static/style.css\n
\n

In theme/theme.conf:

\n
[theme]\ninherit = default\nstylesheet = style.css\npygments_style = pygments.css\n
\n

(this makes it look like the default theme (l. 2))

\n

In theme/static/style.css:

\n
@import url("default.css"); /* make sure to sync this with the base theme's css filename */\n\n.strike {\n    text-decoration: line-through;\n}\n
\n

Then, in your conf.py:

\n
html_theme = 'theme' # use the theme in subdir 'theme'\nhtml_theme_path = ['.'] # make sphinx search for themes in current dir\n
\n

More here: http://sphinx-doc.org/theming.html.

\n

(Optional) In global.rst:

\n
.. role:: strike\n   :class: strike\n
\n

and in a example.rst:

\n
.. include:: global.rst\n\n:strike:`This looks like it is outdated.`\n
\n soup wrap:

All I wanted was to add reST strikethrough to my Sphinx doc. Here is how I did it:

$ cd my-sphinx-dir
$ mkdir -p theme/static
$ touch theme/theme.conf
$ touch theme/static/style.css

In theme/theme.conf:

[theme]
inherit = default
stylesheet = style.css
pygments_style = pygments.css

(the inherit = default line makes it look like the default theme)

In theme/static/style.css:

@import url("default.css"); /* make sure to sync this with the base theme's css filename */

.strike {
    text-decoration: line-through;
}

Then, in your conf.py:

html_theme = 'theme' # use the theme in subdir 'theme'
html_theme_path = ['.'] # make sphinx search for themes in current dir

More here: http://sphinx-doc.org/theming.html.

(Optional) In global.rst:

.. role:: strike
   :class: strike

and in an example.rst:

.. include:: global.rst

:strike:`This looks like it is outdated.`
qid & accept id: (14677341, 14758726) query: Writing to multiple files with Scrapy soup:

I ended up using command line arguments for the author scraper:

\n
class AuthorSpider(BaseSpider):\n    ...\n\n    def __init__(self, articles):\n        self.start_urls = []\n\n        for line in articles:\n            article = json.loads(line)\n            self.start_urls.append(data['author_url'])\n
\n

Then, I added the duplicates pipeline outlined in the Scrapy documentation:

\n
from scrapy import signals\nfrom scrapy.exceptions import DropItem\n\nclass DuplicatesPipeline(object):\n    def __init__(self):\n        self.ids_seen = set()\n\n    def process_item(self, item, spider):\n        if item['id'] in self.ids_seen:\n            raise DropItem("Duplicate item found: %s" % item)\n        else:\n            self.ids_seen.add(item['id'])\n            return item\n
\n

Finally, I passed the article JSON lines file into the command:

\n
$ scrapy crawl authors -o authors.json -a articles=articles.json\n
\n

It's not a great solution, but it works.

\n soup wrap:

I ended up using command line arguments for the author scraper:

class AuthorSpider(BaseSpider):
    ...

    def __init__(self, articles):
        # 'articles' is the JSON-lines filename passed on the command line with -a
        self.start_urls = []

        with open(articles) as f:
            for line in f:
                article = json.loads(line)
                self.start_urls.append(article['author_url'])

Then, I added the duplicates pipeline outlined in the Scrapy documentation:

from scrapy import signals
from scrapy.exceptions import DropItem

class DuplicatesPipeline(object):
    def __init__(self):
        self.ids_seen = set()

    def process_item(self, item, spider):
        if item['id'] in self.ids_seen:
            raise DropItem("Duplicate item found: %s" % item)
        else:
            self.ids_seen.add(item['id'])
            return item
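Stripped of the Scrapy machinery, the dedupe logic is just set membership; a plain-Python sketch of the same idea (sample items invented):

```python
def dedupe(items):
    """Yield each item once, keyed on its 'id' field; duplicates are skipped
    (the equivalent of raising DropItem in the pipeline)."""
    ids_seen = set()
    for item in items:
        if item['id'] in ids_seen:
            continue
        ids_seen.add(item['id'])
        yield item

items = [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}, {'id': 1, 'name': 'a2'}]
print(list(dedupe(items)))
```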

Finally, I passed the article JSON lines file into the command:

$ scrapy crawl authors -o authors.json -a articles=articles.json

It's not a great solution, but it works.

qid & accept id: (14686212, 14686233) query: SQLite Python printing in rows? soup:

Use the itertools.groupby() tool:

\n
from itertools import groupby\n\nfor letter, rows in groupby(cur, key=lambda r: r[0][0]):\n    print ' '.join([r[0] for r in rows])\n
\n

The groupby() function loops over each row in cur, take the first letter of the first column, and give you a tuples with each (letter, rows) values. The rows value is another iterable, you can loop over that (with a for loop, for example) to list all rows that have that first letter.

\n

This does rely on the rows being sorted already. If your rows alternate between first letters:

\n
A1\nA2\nB1\nB2\nA3\nA4\n
\n

it'll print those as separate groups:

\n
A1 A2\nB1 B2\nA3 A4\n
\n

You may want to add a ORDER BY firstcolumnname ordering instruction to your query to ensure correct grouping.

\n

This is what I see when I create a test db:

\n
>>> cur.execute("SELECT * FROM seats ORDER BY code")\n\n>>> for letter, rows in groupby(cur, key=lambda r: r[0][0]):\n...     print ' '.join([r[0] for r in rows])\n... \nA1 A2 A3 A4 A5 A6 A7 A8\nB1 B2 B3 B4 B5 B6 B7 B8\nC1 C2 C3 C4 C5 C6 C7 C8\n
\n soup wrap:

Use the itertools.groupby() tool:

from itertools import groupby

for letter, rows in groupby(cur, key=lambda r: r[0][0]):
    print ' '.join([r[0] for r in rows])

The groupby() function loops over each row in cur, takes the first letter of the first column, and gives you (letter, rows) tuples. The rows value is another iterable; you can loop over it (with a for loop, for example) to list all rows that share that first letter.
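With a plain list standing in for the cursor (each row a one-element tuple, as SQLite would return, and already sorted), the same grouping looks like:

```python
from itertools import groupby

# Rows as a cursor would yield them: one-element tuples, pre-sorted
rows = [('A1',), ('A2',), ('B1',), ('B2',)]

lines = []
for letter, group in groupby(rows, key=lambda r: r[0][0]):
    lines.append(' '.join(r[0] for r in group))

print('\n'.join(lines))
```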

This does rely on the rows being sorted already. If your rows alternate between first letters:

A1
A2
B1
B2
A3
A4

it'll print those as separate groups:

A1 A2
B1 B2
A3 A4

You may want to add an ORDER BY firstcolumnname clause to your query to ensure correct grouping.

This is what I see when I create a test db:

>>> cur.execute("SELECT * FROM seats ORDER BY code")

>>> for letter, rows in groupby(cur, key=lambda r: r[0][0]):
...     print ' '.join([r[0] for r in rows])
... 
A1 A2 A3 A4 A5 A6 A7 A8
B1 B2 B3 B4 B5 B6 B7 B8
C1 C2 C3 C4 C5 C6 C7 C8
qid & accept id: (14692029, 14692100) query: Find all combinations of letters, selecting each letter from a different key in a dictionary soup:

Use itertools.product():

\n
for combo in itertools.product(self.data1, self.data2, self.data3, self.data4):\n    # combo is a tuple of 4 characters.\n
\n

or:

\n
for combo in itertools.product(*[d[k] for k in sorted(d.keys())]):\n    # combo is a tuple of 4 characters.\n
\n

Demo:

\n
>>> import itertools                                                                                                                >>> d = {'1': ['a', 'd', 'e', 'l', 's'], '2': ['s', 'i', 'r', 't', 'n'], '3': ['b', 'o', 'e', 'm', 'k'], '4': ['f', 'y', 'u', 'n', 'g'] }\n>>> for combo in itertools.product(*[d[k] for k in sorted(d.keys())]):\n...     print ''.join(combo)\n... \nasbf\nasby\nasbu\nasbn\nasbg\nasof\nasoy\nasou\nason\nasog\nasef\n\n...\n\nsnkf\nsnky\nsnku\nsnkn\nsnkg\n
\n soup wrap:

Use itertools.product():

for combo in itertools.product(self.data1, self.data2, self.data3, self.data4):
    # combo is a tuple of 4 characters.

or:

for combo in itertools.product(*[d[k] for k in sorted(d.keys())]):
    # combo is a tuple of 4 characters.

Demo:

>>> import itertools
>>> d = {'1': ['a', 'd', 'e', 'l', 's'], '2': ['s', 'i', 'r', 't', 'n'], '3': ['b', 'o', 'e', 'm', 'k'], '4': ['f', 'y', 'u', 'n', 'g'] }
>>> for combo in itertools.product(*[d[k] for k in sorted(d.keys())]):
...     print ''.join(combo)
... 
asbf
asby
asbu
asbn
asbg
asof
asoy
asou
ason
asog
asef

...

snkf
snky
snku
snkn
snkg
qid & accept id: (14710221, 14710446) query: python matplotlib dash-dot-dot - how to? soup:

You can define custom dashes:

\n
import matplotlib.pyplot as plt\n\nline, = plt.plot([1,5,2,4], '-')\nline.set_dashes([8, 4, 2, 4, 2, 4]) \nplt.show()\n
\n

enter image description here

\n

[8, 4, 2, 4, 2, 4] means

\n
    \n
  • 8 points on, (dash)
  • \n
  • 4 points off,
  • \n
  • 2 points on, (dot)
  • \n
  • 4 points off,
  • \n
  • 2 points on, (dot)
  • \n
  • 4 points off.
  • \n
\n
\n

@Achim noted you can also specify the dashes parameter:

\n
plt.plot([1,5,2,4], '-', dashes=[8, 4, 2, 4, 2, 4])\nplt.show()\n
\n

produces the same result shown above.

\n soup wrap:

You can define custom dashes:

import matplotlib.pyplot as plt

line, = plt.plot([1,5,2,4], '-')
line.set_dashes([8, 4, 2, 4, 2, 4]) 
plt.show()

(figure: plot of the resulting dash-dot-dot line)

[8, 4, 2, 4, 2, 4] means

  • 8 points on, (dash)
  • 4 points off,
  • 2 points on, (dot)
  • 4 points off,
  • 2 points on, (dot)
  • 4 points off.

@Achim noted you can also specify the dashes parameter:

plt.plot([1,5,2,4], '-', dashes=[8, 4, 2, 4, 2, 4])
plt.show()

produces the same result shown above.

qid & accept id: (14711669, 14711907) query: Create list of tuples (in a more elegant way) soup:

Python comes with batteries included! If you need to read csv files, just use the csv module:

\n
import sys, csv\n\nwith open(sys.argv[1]) as f:\n    lst = list(csv.reader(f))\n
\n

Note that this creates a list of lists, if you want tuples for some reason, then

\n
with open(sys.argv[1]) as f:\n    lst = [tuple(row) for row in csv.reader(f)]\n
\n soup wrap:

Python comes with batteries included! If you need to read csv files, just use the csv module:

import sys, csv

with open(sys.argv[1]) as f:
    lst = list(csv.reader(f))

Note that this creates a list of lists, if you want tuples for some reason, then

with open(sys.argv[1]) as f:
    lst = [tuple(row) for row in csv.reader(f)]
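To see it work without a file on disk, you can feed csv.reader an in-memory stream (Python 3 shown; in Python 2 you would use StringIO.StringIO instead of io.StringIO):

```python
import csv
import io

data = "a,b,c\n1,2,3\n"

# Same pattern as above, with io.StringIO standing in for the open file
lst = [tuple(row) for row in csv.reader(io.StringIO(data))]
print(lst)
```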
qid & accept id: (14720912, 14720959) query: Splitting Strings in Python with Separator variable soup:

The regex solution (to me) seems like it would be pretty easy:

\n
import re\ndef split_string(source,separators):\n    return re.split('[{0}]'.format(re.escape(separators)),source)\n
\n

example:

\n
>>> import re\n>>> def split_string(source,separators):\n...     return re.split('[{0}]'.format(re.escape(separators)),source)\n... \n>>> split_string("the;foo: went to the store",':;')\n['the', 'foo', ' went to the store']\n
\n

The reason for using a regex here is in the event that you don't want to have ' ' in your separators, this will still work ...

\n
\n

An alternative (which I think I prefer), where you could have multi-character separators is:

\n
def split_string(source,separators):\n    return re.split('|'.join(re.escape(x) for x in separators),source)\n
\n

In this case, the multi-character separators things get passed in as some sort of non-string iterable (e.g. a tuple or a list), but single character separators can still be passed in as a single string.

\n
>>> def split_string(source,separators):\n...     return re.split('|'.join(re.escape(x) for x in separators),source)\n... \n>>> split_string("the;foo: went to the store",':;')\n['the', 'foo', ' went to the store']\n>>> split_string("the;foo: went to the store",['foo','st'])\n['the;', ': went to the ', 'ore']\n
\n
\n

Or, finally, if you want to split on consecutive runs of separators as well:

\n
def split_string(source,separators):\n    return re.split('(?:'+'|'.join(re.escape(x) for x in separators)+')+',source)\n
\n

which gives:

\n
>>> split_string("Before the rain ... there was lightning and thunder.", " .")\n['Before', 'the', 'rain', 'there', 'was', 'lightning', 'and', 'thunder', '']\n
\n soup wrap:

The regex solution (to me) seems like it would be pretty easy:

import re
def split_string(source,separators):
    return re.split('[{0}]'.format(re.escape(separators)),source)

example:

>>> import re
>>> def split_string(source,separators):
...     return re.split('[{0}]'.format(re.escape(separators)),source)
... 
>>> split_string("the;foo: went to the store",':;')
['the', 'foo', ' went to the store']

The reason for using a regex here is that it keeps working even when ' ' is not among your separators ...


An alternative (which I think I prefer), where you could have multi-character separators is:

def split_string(source,separators):
    return re.split('|'.join(re.escape(x) for x in separators),source)

In this case, the multi-character separators things get passed in as some sort of non-string iterable (e.g. a tuple or a list), but single character separators can still be passed in as a single string.

>>> def split_string(source,separators):
...     return re.split('|'.join(re.escape(x) for x in separators),source)
... 
>>> split_string("the;foo: went to the store",':;')
['the', 'foo', ' went to the store']
>>> split_string("the;foo: went to the store",['foo','st'])
['the;', ': went to the ', 'ore']

Or, finally, if you want to split on consecutive runs of separators as well:

def split_string(source,separators):
    return re.split('(?:'+'|'.join(re.escape(x) for x in separators)+')+',source)

which gives:

>>> split_string("Before the rain ... there was lightning and thunder.", " .")
['Before', 'the', 'rain', 'there', 'was', 'lightning', 'and', 'thunder', '']
qid & accept id: (14783438, 14783772) query: jinja2 print to console or logging soup:

I think you can achieve it using filters (http://jinja.pocoo.org/docs/api/#custom-filters) or extensions (http://jinja.pocoo.org/docs/extensions/#adding-extensions). The idea is to just print the filter or extension straight to console.

\n

Not tested but the filter should be something like:

\n
def debug(text):\n  print text\n  return ''\n\nenvironment.filters['debug']=debug\n
\n

To be used as:

\n
...

Hello world!

{{"debug text!"|debug}}...\n
\n

Remember to remove the debug on production code!

\n soup wrap:

I think you can achieve it using filters (http://jinja.pocoo.org/docs/api/#custom-filters) or extensions (http://jinja.pocoo.org/docs/extensions/#adding-extensions). The idea is to have the filter or extension print straight to the console.

Not tested but the filter should be something like:

def debug(text):
  print text
  return ''

environment.filters['debug']=debug
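A runnable sketch of the idea (assumes the jinja2 package is available; the template string is invented):

```python
from jinja2 import Environment

def debug(text):
    print(text)  # goes to the console running the app, not the rendered output
    return ''

env = Environment()
env.filters['debug'] = debug

tmpl = env.from_string('Hello world!{{ "debug text!"|debug }}')
rendered = tmpl.render()
print(repr(rendered))  # the filter leaves nothing behind in the output
```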

To be used as:

...
Hello world!
{{"debug text!"|debug}}
...

Remember to remove the debug on production code!

qid & accept id: (14799223, 14799487) query: Get list from server and print each element surrounded by span tag soup:

The best way to send and proces a list or Python object with JavaScript is to send JSON.\nYou can use json.dumps in your python code

\n
import json\n....\n\nmy_list = ['one', 'two', 'three', 'four']\nself.response.write(json.dumps(my_list)\n
\n

Now you receive a JSON string. But using jQuery it will give you a JavaScript object. See jQuery for the details: http://api.jquery.com/jQuery.getJSON/

\n

If you don not use JSON, you can use a string to send the list :

\n
my_list = ['one', 'two', 'three', 'four']\nself.response.write(','.join(my_list))\n
\n soup wrap:

The best way to send and process a list or Python object with JavaScript is to send JSON. You can use json.dumps in your Python code:

import json
....

my_list = ['one', 'two', 'three', 'four']
self.response.write(json.dumps(my_list))

Now you receive a JSON string. But using jQuery it will give you a JavaScript object. See jQuery for the details: http://api.jquery.com/jQuery.getJSON/

If you do not use JSON, you can send the list as a single string:

my_list = ['one', 'two', 'three', 'four']
self.response.write(','.join(my_list))
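For reference, json.dumps on the example list produces a JSON array string, ready for jQuery to parse on the client:

```python
import json

my_list = ['one', 'two', 'three', 'four']
payload = json.dumps(my_list)
print(payload)  # ["one", "two", "three", "four"]
```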
qid & accept id: (14808945, 14809149) query: check if variable is dataframe soup:

isinstance, nothing else.

\n

PEP8 says explicitly that isinstance is the preferred way to check types

\n
Yes: if isinstance(obj, int):\nNo:  if type(obj) is type(1):\n
\n

And don't even think about

\n
if obj.__class__.__name__ = "MyInheritedClass":\n    expect_problems_some_day()\n
\n

isinstance handles inheritance (see Differences between isinstance() and type() in python). For example, it will tell you if a variable is a string (either str or unicode), because they derive from basestring)

\n
if isinstance(obj, basestring):\n    i_am_string(obj)\n
\n soup wrap:

isinstance, nothing else.

PEP8 says explicitly that isinstance is the preferred way to check types

Yes: if isinstance(obj, int):
No:  if type(obj) is type(1):

And don't even think about

if obj.__class__.__name__ == "MyInheritedClass":
    expect_problems_some_day()

isinstance handles inheritance (see Differences between isinstance() and type() in python). For example, it will tell you if a variable is a string (either str or unicode), because both derive from basestring:

if isinstance(obj, basestring):
    i_am_string(obj)
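Note that the snippet above is Python 2; in Python 3 basestring is gone, and the same check is simply against str:

```python
obj = "hello"

# Python 3: str covers all text strings (bytes is a separate type)
if isinstance(obj, str):
    result = "is a string"
else:
    result = "not a string"
print(result)
```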
qid & accept id: (14812510, 14812819) query: Python - Print a value without intterupting a loop/function soup:

I'm not sure how or if this would work on windows, but under unix, you could do something like:

\n
import signal\nimport sys\n\ncount = 0\n\ndef handler(signum,frame):\n    global count\n    print "Value of 'i' is",i\n    count += 1\n    if count >= 2:\n        sys.exit(0)\n\nsignal.signal(signal.SIGINT,handler)\ni = 0\nwhile True:\n    i += 1\n
\n

Here I exit the program after ctrl-C is caught (twice) because I don't have any other good way of exiting the program.

\n

The use of global data here is merely for demonstration purposes -- Out in the wild, I'd use a class and pass an instance-method to the signal handler in order to maintain state between calls. -- e.g., something like:

\n
import signal\n\nclass Reporter(object):\n    def __init__(self):\n        self.retval = []\n\n    def handler(self,signum,frame):\n        print self.retval\n\nr = Reporter()\nsignal.signal(signal.SIGINT,r.handler)\n
\n

And then you can just re-bind self.retval whenever you enter/exit your function:

\n
import os\n\ndef search(path,filename):\n    global found\n    folders = []\n    retval = []\n    r.retval = retval #<--- Line added\n\n    try:    \n        for item in os.listdir(path):\n            if not os.path.isfile(os.path.join(path, item)):\n                folders.append(os.path.join(path, item))\n            else:\n                if item == filename:\n                    found += 1\n                    retval.append(os.path.join(path, item))\n    except WindowsError,e:\n        print str(e)[10:]\n\n    for folder in folders:\n        retval += search(folder,filename)\n        r.retval = retval   #<---- Line added\n    return retval\n\nfound = 0\npath = 'C:\\'\nfilename = 'test.txt'\nprint search(path,filename)\n
\n

I think I would prefer a non-recursive solution using os.walk however:

\n
import os\n\ndef search(path,filename):\n    retval = []\n    r.retval = retval\n\n    for (dirpath, dirnames, filenames) in os.walk(path):\n         retval.extend(os.path.join(path,dirpath,item) for item in filenames if item == filename)\n    return retval\n\npath = 'C:\\'\nfilename = 'test.txt'\nprint search(path,filename)\n
\n soup wrap:

I'm not sure how or if this would work on Windows, but under Unix you could do something like:

import signal
import sys

count = 0

def handler(signum,frame):
    global count
    print "Value of 'i' is",i
    count += 1
    if count >= 2:
        sys.exit(0)

signal.signal(signal.SIGINT,handler)
i = 0
while True:
    i += 1

Here I exit the program after ctrl-C is caught (twice) because I don't have any other good way of exiting the program.

The use of global data here is merely for demonstration purposes -- Out in the wild, I'd use a class and pass an instance-method to the signal handler in order to maintain state between calls. -- e.g., something like:

import signal

class Reporter(object):
    def __init__(self):
        self.retval = []

    def handler(self,signum,frame):
        print self.retval

r = Reporter()
signal.signal(signal.SIGINT,r.handler)

And then you can just re-bind self.retval whenever you enter/exit your function:

import os

def search(path,filename):
    global found
    folders = []
    retval = []
    r.retval = retval #<--- Line added

    try:    
        for item in os.listdir(path):
            if not os.path.isfile(os.path.join(path, item)):
                folders.append(os.path.join(path, item))
            else:
                if item == filename:
                    found += 1
                    retval.append(os.path.join(path, item))
    except WindowsError,e:
        print str(e)[10:]

    for folder in folders:
        retval += search(folder,filename)
        r.retval = retval   #<---- Line added
    return retval

found = 0
path = 'C:\\'
filename = 'test.txt'
print search(path,filename)

I think I would prefer a non-recursive solution using os.walk however:

import os

def search(path,filename):
    retval = []
    r.retval = retval

    for (dirpath, dirnames, filenames) in os.walk(path):
        # dirpath already includes 'path', so join it directly with the file name
        retval.extend(os.path.join(dirpath, item) for item in filenames if item == filename)
    return retval

path = 'C:\\'
filename = 'test.txt'
print search(path,filename)
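A self-contained check of the os.walk approach, run against a throwaway temp directory instead of C:\ (note that os.walk's dirpath already includes the root, so it is joined directly with the file name):

```python
import os
import tempfile

def search(path, filename):
    retval = []
    for dirpath, dirnames, filenames in os.walk(path):
        retval.extend(os.path.join(dirpath, item)
                      for item in filenames if item == filename)
    return retval

# Build a tiny tree: root/test.txt and root/sub/test.txt
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, 'sub'))
for d in (root, os.path.join(root, 'sub')):
    with open(os.path.join(d, 'test.txt'), 'w') as f:
        f.write('x')

print(search(root, 'test.txt'))
```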
qid & accept id: (14815365, 14819086) query: Spearman rank correlation in Python with ties soup:

scipy.stats.spearmanr will take care of computing the ranks for you, you simply have to give it the data in the correct order:

\n
>>> scipy.stats.spearmanr([0.3, 0.2, 0.2], [0.5, 0.6, 0.4])\n(0.0, 1.0)\n
\n

If you have the ranked data, you can call scipy.stats.pearsonr on it to get the same result. And as the examples below show, either of the ways you have tried will work, although I think [1, 2.5, 2.5] is more common. Also, scipy uses zero-based indexing, so the ranks internally used will be more like [0, 1.5, 1.5]:

\n
>>> scipy.stats.pearsonr([1, 2, 2], [2, 1, 3])\n(0.0, 1.0)\n>>> scipy.stats.pearsonr([1, 2.5, 2.5], [2, 1, 3])\n(0.0, 1.0)\n
\n soup wrap:

scipy.stats.spearmanr will take care of computing the ranks for you, you simply have to give it the data in the correct order:

>>> scipy.stats.spearmanr([0.3, 0.2, 0.2], [0.5, 0.6, 0.4])
(0.0, 1.0)

If you have the ranked data, you can call scipy.stats.pearsonr on it to get the same result. And as the examples below show, either of the ways you have tried will work, although I think [1, 2.5, 2.5] is more common. Also, scipy uses zero-based indexing, so the ranks internally used will be more like [0, 1.5, 1.5]:

>>> scipy.stats.pearsonr([1, 2, 2], [2, 1, 3])
(0.0, 1.0)
>>> scipy.stats.pearsonr([1, 2.5, 2.5], [2, 1, 3])
(0.0, 1.0)
qid & accept id: (14837231, 14838463) query: Converting list of dictionaries to unique list of dictionaries soup:

Here is a way:

\n
>>> from collections import defaultdict\n>>> def  combine(item):\n    # Easy return if not a list: element itself\n    if type(item) != type([]):\n        return item\n    # else call recursion\n    first_ret = [(i.items()[0][0], combine(i.items()[0][1])) for i in item]\n\n    # Here we group by same keys if any ('ROOT', for instance)\n    count_keys = defaultdict(list)\n    for couple in first_ret:\n        count_keys[couple[0]].append(couple[1])\n    return dict((k, v if len(v) > 1 else v[0]) for k, v in count_keys.iteritems())\n
\n

I had to group the ROOT nodes, but it seems to be working:

\n
>>> pprint(combine(l))\n{'ROOT': [{'SecondElem': '5.0.0.1',\n           'ThirdElem': '127.3.15.1',\n           'firstElem': 'gc-3/1/0',\n           'function': 'session',\n           'hw': '0.0.0.0',\n           'index': 16,\n           'resources': {'cpu-info': {'cpu-avg-load': 1,\n                                      'cpu-peak-load': 1},\n                         'memory-total': 1,\n                         'memory-used': 2},\n           'sw': '1.50.1.3'},\n          {'SecondElem': '5.0.0.2',\n           'ThirdElem': '127.3.4.1',\n           'firstElem': 'gc-4/1/0',\n           'function': 'stand',\n           'hw': '0.0.0.0',\n           'index': 5,\n           'resources': {'cpu-info': {'cpu-avg-load': 1,\n                                      'cpu-peak-load': 1},\n                         'memory-total': 1,\n                         'memory-used': 2},\n           'sw': '1.50.1.3'}]}\n>>> \n
\n soup wrap:

Here is a way:

>>> from collections import defaultdict
>>> def  combine(item):
    # Easy return if not a list: element itself
    if type(item) != type([]):
        return item
    # else call recursion
    first_ret = [(i.items()[0][0], combine(i.items()[0][1])) for i in item]

    # Here we group by same keys if any ('ROOT', for instance)
    count_keys = defaultdict(list)
    for couple in first_ret:
        count_keys[couple[0]].append(couple[1])
    return dict((k, v if len(v) > 1 else v[0]) for k, v in count_keys.iteritems())

I had to group the ROOT nodes, but it seems to be working:

>>> pprint(combine(l))
{'ROOT': [{'SecondElem': '5.0.0.1',
           'ThirdElem': '127.3.15.1',
           'firstElem': 'gc-3/1/0',
           'function': 'session',
           'hw': '0.0.0.0',
           'index': 16,
           'resources': {'cpu-info': {'cpu-avg-load': 1,
                                      'cpu-peak-load': 1},
                         'memory-total': 1,
                         'memory-used': 2},
           'sw': '1.50.1.3'},
          {'SecondElem': '5.0.0.2',
           'ThirdElem': '127.3.4.1',
           'firstElem': 'gc-4/1/0',
           'function': 'stand',
           'hw': '0.0.0.0',
           'index': 5,
           'resources': {'cpu-info': {'cpu-avg-load': 1,
                                      'cpu-peak-load': 1},
                         'memory-total': 1,
                         'memory-used': 2},
           'sw': '1.50.1.3'}]}
>>> 
qid & accept id: (14856385, 14856450) query: splitting string in Python (2.7) soup:

A regular expression to match those would be:

\n
r'\(\s*passengers:\s*(\d{1,3}|\?)\s+ crew:\s*(\d{1,3}|\?)\s*\)'\n
\n

with some extra whitespace tolerance thrown in.

\n

Results:

\n
>>> import re\n>>> numbers = re.compile(r'\(\s*passengers:\s*(\d{1,3}|\?)\s+ crew:\s*(\d{1,3}|\?)\s*\)')\n>>> numbers.search('26   (passengers:22  crew:4)').groups()\n('22', '4')\n>>> numbers.search('32   (passengers:?  crew: ?)').groups()\n('?', '?')\n
\n soup wrap:

A regular expression to match those would be:

r'\(\s*passengers:\s*(\d{1,3}|\?)\s+ crew:\s*(\d{1,3}|\?)\s*\)'

with some extra whitespace tolerance thrown in.

Results:

>>> import re
>>> numbers = re.compile(r'\(\s*passengers:\s*(\d{1,3}|\?)\s+ crew:\s*(\d{1,3}|\?)\s*\)')
>>> numbers.search('26   (passengers:22  crew:4)').groups()
('22', '4')
>>> numbers.search('32   (passengers:?  crew: ?)').groups()
('?', '?')
qid & accept id: (14873181, 14873300) query: Inheriting context variables inside custom template tags soup:

Using simple_tag

\n

Using simple_tag, just set takes_context=True:

\n
@register.simple_tag(takes_context=True)\ndef current_time(context, format_string):\n    timezone = context['timezone']\n    return your_get_current_time_method(timezone, format_string)\n
\n

Using a custom template tag

\n

Just use template.Variable.resolve(), ie.

\n
foo = template.Variable('some_var').resolve(context)\n
\n

See passing variables to the templatetag:

\n
\n

To use the Variable class, simply instantiate it with the name of the\n variable to be resolved, and then call variable.resolve(context). So,\n for example:

\n
class FormatTimeNode(template.Node):\n    def __init__(self, date_to_be_formatted, format_string):\n        self.date_to_be_formatted = template.Variable(date_to_be_formatted)\n        self.format_string = format_string\n\n    def render(self, context):\n        try:\n            actual_date = self.date_to_be_formatted.resolve(context)\n            return actual_date.strftime(self.format_string)\n        except template.VariableDoesNotExist:\n            return ''\n
\n

Variable resolution will throw a VariableDoesNotExist exception if it cannot resolve the string passed\n to it in the current context of the page.

\n
\n

Might be useful too: setting a variable in the context.

\n soup wrap:

Using simple_tag

Using simple_tag, just set takes_context=True:

@register.simple_tag(takes_context=True)
def current_time(context, format_string):
    timezone = context['timezone']
    return your_get_current_time_method(timezone, format_string)

Using a custom template tag

Just use template.Variable.resolve(), i.e.:

foo = template.Variable('some_var').resolve(context)

See passing variables to the templatetag:

To use the Variable class, simply instantiate it with the name of the variable to be resolved, and then call variable.resolve(context). So, for example:

class FormatTimeNode(template.Node):
    def __init__(self, date_to_be_formatted, format_string):
        self.date_to_be_formatted = template.Variable(date_to_be_formatted)
        self.format_string = format_string

    def render(self, context):
        try:
            actual_date = self.date_to_be_formatted.resolve(context)
            return actual_date.strftime(self.format_string)
        except template.VariableDoesNotExist:
            return ''

Variable resolution will throw a VariableDoesNotExist exception if it cannot resolve the string passed to it in the current context of the page.

Might be useful too: setting a variable in the context.

qid & accept id: (14891190, 14891450) query: How can display the lines from linux log file in browser soup:

soup wrap:

You can use built-in template tags:

{{ lines|linebreaks }}

or

{{ lines|linebreaksbr }}
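
For reference, the effect of linebreaksbr can be approximated outside Django in plain Python (an illustrative sketch, not Django's actual implementation, which also handles escaping):

```python
def linebreaksbr(text):
    # Approximate Django's linebreaksbr filter: turn newlines into <br> tags.
    return text.replace('\r\n', '\n').replace('\n', '<br>')

log = 'line one\nline two\nline three'
print(linebreaksbr(log))  # line one<br>line two<br>line three
```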
qid & accept id: (14940338, 14940378) query: Map function and input function parameters soup:

soup wrap:

You could use functools.partial():

from functools import partial
map(partial(add_x_to_input, some_value_for_x), myList)

or you could use a lambda (an anonymous, in-line function):

map(lambda k: add_x_to_input(some_value_for_x, k), myList)

or you could define an explicit new function:

def wrapping_function(k):
    return add_x_to_input(some_value_for_x, k)

map(wrapping_function, myList)
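
Put together, here is a runnable comparison of the first two options (`add_x_to_input` is a hypothetical stand-in for the question's function; in Python 3 the map result is wrapped in list()):

```python
from functools import partial

def add_x_to_input(x, k):
    # Hypothetical stand-in for the function from the question.
    return x + k

myList = [1, 2, 3]
some_value_for_x = 10

print(list(map(partial(add_x_to_input, some_value_for_x), myList)))      # [11, 12, 13]
print(list(map(lambda k: add_x_to_input(some_value_for_x, k), myList)))  # [11, 12, 13]
```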

If you are after sheer speed, the functools.partial() approach wins hands-down; it is implemented in C code and avoids an extra Python stack push:

>>> import timeit
>>> L = range(10)
>>> def foo(a, b): pass
... 
>>> def p(b): return foo(1, b)
... 
>>> timeit.timeit('map(p, L)', 'from __main__ import foo, L; from functools import partial; p = partial(foo, 1)')
3.0008959770202637
>>> timeit.timeit('map(p, L)', 'from __main__ import foo, L; p = lambda b: foo(1, b)')
3.8707590103149414
>>> timeit.timeit('map(p, L)', 'from __main__ import foo, L, p')
3.9136409759521484
qid & accept id: (14972601, 14972776) query: Exclude weekends in python django query set soup:

soup wrap:

You can use the IN clause.

Sample.objects.filter(date__month=month).exclude(date__day__in=weekends)

From the Django source code of DateField:

def get_prep_lookup(self, lookup_type, value):
    # For "__month", "__day", and "__week_day" lookups, convert the value
    # to an int so the database backend always sees a consistent type.
    if lookup_type in ('month', 'day', 'week_day'):
        return int(value)

So ideally __day should work. Can you also try to change your field name from date to something like created_date to avoid namespace clashes?
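
The weekends list the query assumes (the day-of-month numbers falling on a Saturday or Sunday) can be built with the standard calendar module; this is an illustrative sketch, and the helper name is mine:

```python
import calendar

def weekend_days(year, month):
    # Day numbers in the given month whose weekday is Saturday (5) or Sunday (6).
    days_in_month = calendar.monthrange(year, month)[1]
    return [day for day in range(1, days_in_month + 1)
            if calendar.weekday(year, month, day) >= 5]

print(weekend_days(2013, 3))  # [2, 3, 9, 10, 16, 17, 23, 24, 30, 31]
```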

qid & accept id: (15009180, 15009262) query: deleting element from python dictionary soup:

soup wrap:

I suppose you could use:

for eachitem in dicta:
    for k in ['NAME','STATE','COUNTRY','REGION','LNAME']:
        del eachitem[k]

Or, if you only want 1 key:

for eachitem in dicta:
    salary = eachitem['SALARY']
    eachitem.clear()
    eachitem['SALARY'] = salary

This does everything in place, which I assume you want. Otherwise, you can do it out of place simply by:

eachitem = {'SALARY':eachitem['SALARY']}
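
The out-of-place version can also be applied to the whole list at once with a comprehension (a sketch with made-up sample data):

```python
dicta = [{'NAME': 'ann', 'SALARY': 100, 'STATE': 'NY'},
         {'NAME': 'bob', 'SALARY': 200, 'STATE': 'CA'}]

# Build new single-key dicts rather than mutating the originals in place.
dicta = [{'SALARY': d['SALARY']} for d in dicta]
print(dicta)  # [{'SALARY': 100}, {'SALARY': 200}]
```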
qid & accept id: (15014326, 15102069) query: How to determine tools chosen by waf? soup:
soup wrap:
def options(opt):
    opt.load('compiler_c')
    opt.load('compiler_cxx')

def configure(cfg):
    cfg.load('compiler_c')
    cfg.load('compiler_cxx')

def build(bld):
    print "Compiler is CC_NAME  %s  CC  %s"%(bld.env.CC_NAME,bld.env.CC)
    print "Compiler is CXX_NAME %s  CXX %s"%(bld.env.CXX_NAME,bld.env.CXX)

Will give you:

D:\temp>waf.bat configure build --check-c-compiler=gcc --check-cxx-compiler=g++
Setting top to                           : D:\temp
Setting out to                           : D:\temp\build
Checking for 'gcc' (c compiler)          : c:\tools\gcc\bin\gcc.exe
Checking for 'g++' (c++ compiler)        : c:\tools\gcc\bin\g++.exe
'configure' finished successfully (0.191s)
Waf: Entering directory `D:\temp\build'
Compiler is CC_NAME  gcc  CC  ['c:\\tools\\gcc\\bin\\gcc.exe']
Compiler is CXX_NAME gcc  CXX ['c:\\tools\\gcc\\bin\\g++.exe']
Waf: Leaving directory `D:\temp\build'
'build' finished successfully (0.008s)
qid & accept id: (15062205, 15200385) query: Finding start and stops of consecutive values block in Python/Numpy/Pandas soup:

soup wrap:

Below is a numpy-based implementation for any dimensionality (ndim = 2 or more):

def get_nans_blocks_length(a):
    """
    Returns the lengths of np.nan blocks along the last axis.
    """
    nan_mask = np.isnan(a)
    start_nans_mask = np.concatenate((np.resize(nan_mask[...,0],a.shape[:-1]+(1,)),
                                 np.logical_and(np.logical_not(nan_mask[...,:-1]), nan_mask[...,1:])
                                 ), axis=a.ndim-1)
    stop_nans_mask = np.concatenate((np.logical_and(nan_mask[...,:-1], np.logical_not(nan_mask[...,1:])),
                                np.resize(nan_mask[...,-1], a.shape[:-1]+(1,))
                                ), axis=a.ndim-1)

    start_idxs = np.where(start_nans_mask)
    stop_idxs = np.where(stop_nans_mask)
    return stop_idxs[-1] - start_idxs[-1] + 1

So that:

a = np.array([
        [1, np.nan, np.nan, np.nan],
        [np.nan, 1, np.nan, 2], 
        [np.nan, np.nan, np.nan, np.nan]
    ])
get_nans_blocks_length(a)
array([3, 1, 1, 4], dtype=int64)

And:

a = np.array([
        [[1, np.nan], [np.nan, np.nan]],
        [[np.nan, 1], [np.nan, 2]], 
        [[np.nan, np.nan], [np.nan, np.nan]]
    ])
get_nans_blocks_length(a)
array([1, 2, 1, 1, 2, 2], dtype=int64)
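
For the common 1-D case the same result can be had more directly; this is an illustrative alternative of mine (not from the original answer) that finds run edges by differencing the padded NaN mask:

```python
import numpy as np

def nan_run_lengths_1d(a):
    # Pad the NaN mask with False on both ends so every run has two edges,
    # then locate those edges by differencing the 0/1 mask.
    mask = np.concatenate(([False], np.isnan(a), [False])).astype(np.int8)
    edges = np.flatnonzero(np.diff(mask))
    return edges[1::2] - edges[::2]

print(nan_run_lengths_1d(np.array([1, np.nan, np.nan, 2, np.nan])))  # [2 1]
```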
qid & accept id: (15066913, 15070047) query: How to connect QLineEdit focusOutEvent soup:

soup wrap:

Use an eventFilter:

class Filter(QtCore.QObject):
    def eventFilter(self, widget, event):
        # FocusOut event
        if event.type() == QtCore.QEvent.FocusOut:
            # do custom stuff
            print 'focus out'
            # return False so that the widget will also handle the event
            # otherwise it won't focus out
            return False
        else:
            # we don't care about other events
            return False

And in your window:

# ...
self._filter = Filter()
# adjust for your QLineEdit
self.ui.lineEdit.installEventFilter(self._filter)
qid & accept id: (15089310, 15096466) query: repeat arange with numpy soup:

soup wrap:

There are definitely more numpythonic ways of doing things. One possibility could be something like this:

import numpy as np
from numpy.lib.stride_tricks import as_strided

def concatenated_ranges(ranges_list) :
    ranges_list = np.array(ranges_list, copy=False)
    base_range = np.arange(ranges_list.max())
    base_range =  as_strided(base_range,
                             shape=ranges_list.shape + base_range.shape,
                             strides=(0,) + base_range.strides)
    return base_range[base_range < ranges_list[:, None]]

If you are concatenating only a few ranges, then probably Mr. E's pure python solution is your best choice, but if you have even as few as a hundred ranges to concatenate, this starts being noticeably faster. For comparison I have used these two functions extracted from the other answers:

def junuxx(a) :
    b = np.array([], dtype=np.uint8)
    for x in a:
        b = np.append(b, np.arange(x))
    return b

def mr_e(a) :
    return reduce(lambda x, y: x + range(y), a, [])

And here are some timings:

In [2]: a = [2, 1, 4, 0 ,2] # the OP's original example

In [3]: concatenated_ranges(a) # show it works!
Out[3]: array([0, 1, 0, 0, 1, 2, 3, 0, 1])

In [4]: %timeit concatenated_ranges(a)
10000 loops, best of 3: 31.6 us per loop

In [5]: %timeit junuxx(a)
10000 loops, best of 3: 34 us per loop

In [6]: %timeit mr_e(a)
100000 loops, best of 3: 2.58 us per loop

In [7]: a = np.random.randint(1, 10, size=(10,))

In [8]: %timeit concatenated_ranges(a)
10000 loops, best of 3: 27.1 us per loop

In [9]: %timeit junuxx(a)
10000 loops, best of 3: 79.8 us per loop

In [10]: %timeit mr_e(a)
100000 loops, best of 3: 7.82 us per loop

In [11]: a = np.random.randint(1, 10, size=(100,))

In [12]: %timeit concatenated_ranges(a)
10000 loops, best of 3: 57.4 us per loop

In [13]: %timeit junuxx(a)
1000 loops, best of 3: 756 us per loop

In [14]: %timeit mr_e(a)
10000 loops, best of 3: 149 us per loop

In [15]: a = np.random.randint(1, 10, size=(1000,))

In [16]: %timeit concatenated_ranges(a)
1000 loops, best of 3: 358 us per loop

In [17]: %timeit junuxx(a)
100 loops, best of 3: 9.38 ms per loop

In [18]: %timeit mr_e(a)
100 loops, best of 3: 8.93 ms per loop
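
Another vectorized option worth knowing (a sketch of mine, not benchmarked above) offsets a single 0..total-1 ramp with np.repeat; it also copes with zero-length ranges:

```python
import numpy as np

def concatenated_ranges_repeat(lengths):
    lengths = np.asarray(lengths)
    # Start offset of each range within the concatenated output.
    starts = np.cumsum(lengths) - lengths
    # One global ramp, shifted back to 0 at every range start.
    return np.arange(lengths.sum()) - np.repeat(starts, lengths)

print(concatenated_ranges_repeat([2, 1, 4, 0, 2]))  # [0 1 0 0 1 2 3 0 1]
```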
qid & accept id: (15104415, 15128629) query: How to print progress from this code as the subprocess is running? soup:

soup wrap:

The stdout=subprocess.PIPE argument prevents the subprocess from printing to stdout; instead the output is collected into result via communicate(). It is similar to how, in bash,

sshpass -[args] rsync [source] [dest]

will print progress but

sshpass -[args] rsync [source] [dest] | sort

will not print anything until the process is complete.

What you want is to tee the stdout; look here. Based on those answers you could do something like:

# Caution! untested code
result = []
process = subprocess.Popen(['sshpass', '-p', password, 'rsync', '-avz',
                            '--info=progress2', source12, destination], 
                           stdout=subprocess.PIPE)
while process.poll() is None:
    line = process.stdout.readline()
    print line
    result.append(line)
print sort(result)
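
The same streaming pattern can be exercised with a harmless command in place of sshpass/rsync (the child command here is just a stand-in so the sketch runs anywhere):

```python
import subprocess
import sys

# Emit three lines from a child process and read them as they appear.
cmd = [sys.executable, '-c', "print('one'); print('two'); print('three')"]
process = subprocess.Popen(cmd, stdout=subprocess.PIPE, universal_newlines=True)
result = []
for line in process.stdout:
    print(line, end='')            # shows up while the child is still running
    result.append(line.rstrip('\n'))
process.wait()
print(sorted(result))  # ['one', 'three', 'two']
```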
qid & accept id: (15109165, 15109783) query: Loading a dataset from file, to use with sklearn soup:

soup wrap:

You can use numpy's genfromtxt function to retrieve data from the file (http://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html):

import numpy as np
mydata = np.genfromtxt(filename, delimiter=",")

However, if you have textual columns, using genfromtxt is trickier, since you need to specify the data types.

It will be much easier with the excellent Pandas library (http://pandas.pydata.org/)

import pandas as pd
mydata = pd.read_csv(filename)
target = mydata["Label"]  #provided your csv has header row, and the label column is named "Label"

#select all but the last column as data
data = mydata.ix[:,:-1]
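
Note that .ix has since been removed from pandas; in current versions the positional "all but the last column" slice is spelled .iloc. A self-contained sketch with a hypothetical in-memory CSV:

```python
import io
import pandas as pd

# Hypothetical two-feature CSV standing in for the file on disk.
csv = io.StringIO("f1,f2,Label\n1,2,a\n3,4,b\n")
mydata = pd.read_csv(csv)

target = mydata["Label"]
data = mydata.iloc[:, :-1]   # all rows, every column except the last

print(list(data.columns))  # ['f1', 'f2']
```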
qid & accept id: (15114761, 15116519) query: Converting coordinate tuple information to numpy arrays soup:

soup wrap:

Let's say you read your whole array into memory as an array data of shape (Nx1 * Nx2 * Nx3, 6).

data = np.loadtxt('data.txt', dtype=float, delimiter=',')

If, as your example suggests, the points are generated in lexicographical order, you only need to grab the columns for f, g and h and reshape them:

f = data[:, 3].reshape(Nx1, Nx2, Nx3)
g = data[:, 4].reshape(Nx1, Nx2, Nx3)
h = data[:, 5].reshape(Nx1, Nx2, Nx3)

If you need to figure out what Nx1, Nx2 and Nx3 are, you can use np.unique:

Nx1 = np.unique(data[:, 0]).shape[0]
Nx2 = np.unique(data[:, 1]).shape[0]
Nx3 = np.unique(data[:, 2]).shape[0]

A more robust solution in case the order of the points is not guaranteed, would be to use np.unique to extract indices to the grid values:

Nx1, idx1 = np.unique(data[:, 0], return_inverse=True)
Nx1 = Nx1.shape[0]
Nx2, idx2 = np.unique(data[:, 1], return_inverse=True)
Nx2 = Nx2.shape[0]
Nx3, idx3 = np.unique(data[:, 2], return_inverse=True)
Nx3 = Nx3.shape[0]

f = np.empty((Nx1, Nx2, Nx3))
f[idx1, idx2, idx3] = data[:, 3]
g = np.empty((Nx1, Nx2, Nx3))
g[idx1, idx2, idx3] = data[:, 4]
h = np.empty((Nx1, Nx2, Nx3))
h[idx1, idx2, idx3] = data[:, 5]

This will create new arrays for f, g and h, not views into the original data array, so it will use more memory.

And of course instead of my ugly code above repeating everything three times, you should use a loop, or a list comprehension!
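
That loop version might look like this (a sketch of mine with a tiny made-up data array; columns are x1, x2, x3, f, g, h):

```python
import numpy as np

# Tiny stand-in for the data file: a 2 x 2 x 1 grid of points.
data = np.array([[0., 0., 0.,  1.,  2.,  3.],
                 [0., 1., 0.,  4.,  5.,  6.],
                 [1., 0., 0.,  7.,  8.,  9.],
                 [1., 1., 0., 10., 11., 12.]])

# One pass over the three coordinate columns instead of copy-pasted blocks.
vals_and_idxs = [np.unique(data[:, i], return_inverse=True) for i in range(3)]
shape = tuple(len(vals) for vals, _ in vals_and_idxs)
idxs = tuple(idx for _, idx in vals_and_idxs)

f, g, h = (np.empty(shape) for _ in range(3))
for arr, col in zip((f, g, h), (3, 4, 5)):
    arr[idxs] = data[:, col]

print(f.shape)     # (2, 2, 1)
print(f[1, 0, 0])  # 7.0
```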

qid & accept id: (15133537, 15133809) query: pydoc.render_doc() adds characters - how to avoid that? soup:

soup wrap:

In python 2, you can remove the boldface sequences with pydoc.plain:

pydoc.plain(pydoc.render_doc(help))

>>> help(pydoc.plain)
Help on function plain in module pydoc:

plain(text)
    Remove boldface formatting from text.

In python 3 pydoc.render_doc accepts a renderer:

pydoc.render_doc(help, renderer=pydoc.plaintext)
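
A quick check of the Python 3 form, using the builtin len as the documented object:

```python
import pydoc

# Render help text without the backspace-based boldface sequences.
text = pydoc.render_doc(len, renderer=pydoc.plaintext)
print(text.splitlines()[0])
assert '\b' not in text  # no overstrike characters in the plain rendering
```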

qid & accept id: (15148202, 15148221) query: Even number check without: if soup:

soup wrap:

You could use two lists, one for evens and one for odds:

evens = []
odds = []
out = [evens,odds]
for elem in numbers:
    out[elem%2].append(elem)

print evens

or you could end with:

for even in evens:
    print even

to simulate the same style of outputting that you have currently.
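
filter() also satisfies the no-if constraint, since the test lives in an expression rather than an if statement (a sketch of mine, assuming the same numbers list):

```python
numbers = [1, 2, 3, 4, 5, 6, 7]

# Keep n when n % 2 == 0 is true; no `if` statement anywhere.
evens = list(filter(lambda n: n % 2 == 0, numbers))
print(evens)  # [2, 4, 6]
```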

qid & accept id: (15183718, 15183729) query: Python: How to call a class in the same file soup:

soup wrap:

You can call One.get() directly if you turn it into a static method:

class One:
    @staticmethod
    def get():
        return 1

class Two:
    def __init__(self):
        val = One.get()

Without the @staticmethod, you need an instance of One in order to be able to call get():

class One:
    def get(self):
        return 1

class Two:
    def __init__(self):
        one = One()
        val = one.get()
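
A classmethod is a third option worth considering; unlike a staticmethod it receives the class, so a subclass can override get() (an illustrative variant of mine, not from the original answer):

```python
class One:
    @classmethod
    def get(cls):
        return 1

class BetterOne(One):
    @classmethod
    def get(cls):
        return 2

class Two:
    def __init__(self, source=One):
        # The class itself (no instance needed) is enough to call get().
        self.val = source.get()

print(Two().val)           # 1
print(Two(BetterOne).val)  # 2
```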
qid & accept id: (15211991, 15215562) query: Socket code from python to Objective C soup:

soup wrap:

This code is a working translation. It can certainly be improved, and your comments are welcome, but it works fine and does exactly what the Python code did.

NSString * src = @"X.X.X.X";
NSString * mac = @"XX-XX-XX-XX-XX-XX";


const unsigned char byte64[] = {0x64};
const unsigned char byte00[] = {0x00};

NSString * srcString = [src base64EncodedString];
int srcDataLength = [srcString length];
char* srcDataLengthAsByte = (char*) &srcDataLength;

NSString * macString = [mac base64EncodedString];
int macDataLength = [macString length];
char* macDataLengthAsByte = (char*) &macDataLength;

NSString * remoteString = [remote base64EncodedString];
int remoteDataLength = [remoteString length];
char* remoteDataLengthAsByte = (char*) &remoteDataLength;

NSString * appString = [app base64EncodedString];
int appDataLength = [appString length];
char* appDataLengthAsByte = (char*) &appDataLength;

NSMutableData * msgData = [NSMutableData data];
[msgData appendBytes:byte64 length:1];
[msgData appendBytes:byte00 length:1];
[msgData appendBytes:srcDataLengthAsByte length:1];
[msgData appendBytes:byte00 length:1];
[msgData appendData:[srcString dataUsingEncoding:NSASCIIStringEncoding]];
[msgData appendBytes:macDataLengthAsByte length:1];
[msgData appendBytes:byte00 length:1];
[msgData appendData:[macString dataUsingEncoding:NSASCIIStringEncoding]];
[msgData appendBytes:remoteDataLengthAsByte length:1];
[msgData appendBytes:byte00 length:1];
[msgData appendData:[remoteString dataUsingEncoding:NSASCIIStringEncoding]];
int msgDataLength = [msgData length];
char* msgDataLengthAsByte = (char*) &msgDataLength;

NSMutableData * packet = [NSMutableData data];
[packet appendBytes:byte00 length:1];
[packet appendBytes:appDataLengthAsByte length:1];
[packet appendBytes:byte00 length:1];
[packet appendData:[appString dataUsingEncoding:NSASCIIStringEncoding]];
[packet appendBytes:msgDataLengthAsByte length:1];
[packet appendBytes:byte00 length:1];
[packet appendData:msgData];
[self send:packet];

And the socket part:

- (BOOL)connect
{
    struct sockaddr_in addr;
    sockfd = socket( AF_INET, SOCK_STREAM, 0 );
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = inet_addr([TV_IP UTF8String]);
    addr.sin_port = htons(TV_PORT);
    return connect(sockfd, (struct sockaddr*)&addr, sizeof(addr))==0;
}

- (long)send:(NSData*)data
{
    long sent = send(sockfd, [data bytes], [data length], 0);

    VADebugLog(@"sent data:(%ld bytes) = [%@]",sent,[data description]);

    return sent;
}

-(void)close
{
    close(sockfd);
}

And the include with base64 from: https://github.com/nicklockwood/Base64

#include 
#include 
#import "Base64.h"
qid & accept id: (15238389, 15238410) query: How to add a string to a specific line soup:

soup wrap:

Have you tried something like this?

exp = 20 # the line where text need to be added or exp that calculates it for ex %2

with open(filename, 'r') as f:
    lines = f.readlines()

with open(filename, 'w') as f:
    for i,line in enumerate(lines):
        if i == exp:
            f.write('------')
        f.write(line)

If you need to edit a different number of lines, you can update the code above this way:

def update_file(filename, ln):
    with open(filename, 'r') as f:
        lines = f.readlines()

    with open(filename, 'w') as f:
        for idx, line in enumerate(lines):
            if idx in ln:
                f.write('------')
            f.write(line)
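
A runnable end-to-end check of the same idea, using a throwaway temporary file (the helper and marker names are mine):

```python
import os
import tempfile

def insert_before_lines(filename, marker, line_numbers):
    # Rewrite the file, writing `marker` before each 0-based line index.
    with open(filename) as f:
        lines = f.readlines()
    with open(filename, 'w') as f:
        for idx, line in enumerate(lines):
            if idx in line_numbers:
                f.write(marker + '\n')
            f.write(line)

fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'w') as f:
    f.write('a\nb\nc\n')
insert_before_lines(path, '------', {1})
content = open(path).read()
print(content)  # a, ------, b, c (one per line)
os.remove(path)
```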
qid & accept id: (15283478, 15283773) query: having category headings for list dictionaries in python/django soup:

A solution using defaultdict to group all the items by category:

\n
from collections import defaultdict\n\nitems = [{'category':'apple','item':'granny smith'},\n {'category':'apple','item':'cox'},\n {'category':'apple','item':'pixie'},\n {'category':'orange','item':'premier'},\n {'category':'orange','item':'queen'},\n {'category':'orange','item':'westin'},\n {'category':'tea','item':'breakfast'},\n {'category':'tea','item':'lady grey'},\n {'category':'tea','item':'builders'},\n {'category':'coffee','item':'colombia'},\n {'category':'coffee','item':'kenya'},\n {'category':'coffee','item':'brazil'}]\n\nresult = defaultdict(list)\nfor item in items:\n    result[item['category']].append(item['item'])\n
\n

And in the template:

\n
{% for key, values in result.items() %}\n    {{key}}\n    
    \n {% for item in values %}\n
  • {{item}}
  • \n {% endfor %}\n
\n{% endfor %}\n
\n soup wrap:

A solution using defaultdict to group all the items by category:

from collections import defaultdict

items = [{'category':'apple','item':'granny smith'},
 {'category':'apple','item':'cox'},
 {'category':'apple','item':'pixie'},
 {'category':'orange','item':'premier'},
 {'category':'orange','item':'queen'},
 {'category':'orange','item':'westin'},
 {'category':'tea','item':'breakfast'},
 {'category':'tea','item':'lady grey'},
 {'category':'tea','item':'builders'},
 {'category':'coffee','item':'colombia'},
 {'category':'coffee','item':'kenya'},
 {'category':'coffee','item':'brazil'}]

result = defaultdict(list)
for item in items:
    result[item['category']].append(item['item'])

And in the template:

{% for key, values in result.items %}
    {{ key }}
    <ul>
    {% for item in values %}
        <li>{{ item }}</li>
    {% endfor %}
    </ul>
{% endfor %}
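
For reference, here is the grouped structure `result` ends up holding, runnable on a trimmed version of the data above. Converting to a plain dict before rendering is a common precaution: Django's template variable lookup tries dictionary access first, and a defaultdict would silently create a key named `items`.

```python
from collections import defaultdict

items = [{'category': 'apple', 'item': 'granny smith'},
         {'category': 'apple', 'item': 'cox'},
         {'category': 'orange', 'item': 'premier'},
         {'category': 'tea', 'item': 'breakfast'}]

result = defaultdict(list)
for item in items:
    result[item['category']].append(item['item'])

# plain dict: safe to pass to the template context
result = dict(result)
print(result)
```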
qid & accept id: (15304076, 15304172) query: Making use of piped data in python soup:

Expanding on my comment above, I would say the following code would do the trick (NOTE: this is not Python 3, but Python 2.7. Some quick Googling told me they are similar in this however).

\n

In echo.py:

\n
import sys\nprint sys.stdin.read()\n
\n

Then in your terminal call it like this:

\n
python echo.py < test.txt\n
\n

This should echo the contents of test.txt on your terminal.

\n soup wrap:

Expanding on my comment above, I would say the following code would do the trick (note: this is Python 2.7, not Python 3, but the two behave the same here).

In echo.py:

import sys
print sys.stdin.read()

Then in your terminal call it like this:

python echo.py < test.txt

This should echo the contents of test.txt on your terminal.
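
To make the same idea unit-testable (and Python 3-friendly), the read can be factored into a function that accepts any file-like object; `echo` is my own name for it:

```python
import io
import sys

def echo(stream=None):
    """Read everything from `stream` (default: stdin) and return it."""
    if stream is None:
        stream = sys.stdin
    return stream.read()

if __name__ == '__main__' and not sys.stdin.isatty():
    # python echo.py < test.txt still works as before
    sys.stdout.write(echo())
```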

qid & accept id: (15312953, 15313024) query: Choose a file starting with a given string soup:

Try using os.listdir,os.path.join and os.path.isfile.
\nIn long form (with for loops),

\n
import os\npath = 'C:/'\nfiles = []\nfor i in os.listdir(path):\n    if os.path.isfile(os.path.join(path,i)) and '001_MN_DX' in i:\n        files.append(i)\n
\n

Code, with list-comprehensions is

\n
import os\npath = 'C:/'\nfiles = [i for i in os.listdir(path) if os.path.isfile(os.path.join(path,i)) and \\n         '001_MN_DX' in i]\n
\n

Check here for the long explanation...

\n soup wrap:

Try using os.listdir, os.path.join and os.path.isfile.
In long form (with for loops),

import os
path = 'C:/'
files = []
for i in os.listdir(path):
    if os.path.isfile(os.path.join(path,i)) and '001_MN_DX' in i:
        files.append(i)

The same code, with a list comprehension, is:

import os
path = 'C:/'
files = [i for i in os.listdir(path) if os.path.isfile(os.path.join(path,i)) and \
         '001_MN_DX' in i]

Check here for the long explanation...
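
Since the goal is files *starting with* a given string, `str.startswith` (or the glob module) is a closer fit than the `in` substring test, which would also match the prefix appearing mid-name. A sketch on a throwaway directory:

```python
import glob
import os
import tempfile

path = tempfile.mkdtemp()
for name in ('001_MN_DX_a.txt', '001_MN_DX_b.txt', 'other.txt'):
    open(os.path.join(path, name), 'w').close()

# explicit filter on the prefix
files = [i for i in os.listdir(path)
         if os.path.isfile(os.path.join(path, i)) and i.startswith('001_MN_DX')]

# or let glob do the pattern matching (it returns full paths)
globbed = glob.glob(os.path.join(path, '001_MN_DX*'))

print(sorted(files))  # ['001_MN_DX_a.txt', '001_MN_DX_b.txt']
```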

qid & accept id: (15326069, 15326764) query: Matplotlib half black and half white circle soup:

The easiest way is to use two Wedges. (This doesn't automatically rescale the axes, but that's easy to add, if you'd like.)

\n

As a quick example:

\n
import matplotlib.pyplot as plt\nfrom matplotlib.patches import Wedge\n\ndef main():\n    fig, ax = plt.subplots()\n    dual_half_circle((0.5, 0.5), radius=0.3, angle=90, ax=ax)\n    ax.axis('equal')\n    plt.show()\n\ndef dual_half_circle(center, radius, angle=0, ax=None, colors=('w','k'),\n                     **kwargs):\n    """\n    Add two half circles to the axes *ax* (or the current axes) with the \n    specified facecolors *colors* rotated at *angle* (in degrees).\n    """\n    if ax is None:\n        ax = plt.gca()\n    theta1, theta2 = angle, angle + 180\n    w1 = Wedge(center, radius, theta1, theta2, fc=colors[0], **kwargs)\n    w2 = Wedge(center, radius, theta2, theta1, fc=colors[1], **kwargs)\n    for wedge in [w1, w2]:\n        ax.add_artist(wedge)\n    return [w1, w2]\n\nmain()\n
\n

enter image description here

\n

If you'd like it to always be at the origin, you can specify the transform to be ax.transAxes, and turn clipping off.

\n

E.g.

\n
import matplotlib.pyplot as plt\nfrom matplotlib.patches import Wedge\n\ndef main():\n    fig, ax = plt.subplots()\n    dual_half_circle(radius=0.1, angle=90, ax=ax)\n    ax.axis('equal')\n    plt.show()\n\ndef dual_half_circle(radius, angle=0, ax=None, colors=('w','k'), **kwargs):\n    """\n    Add two half circles to the axes *ax* (or the current axes) at the lower\n    left corner of the axes with the specified facecolors *colors* rotated at\n    *angle* (in degrees).\n    """\n    if ax is None:\n        ax = plt.gca()\n    kwargs.update(transform=ax.transAxes, clip_on=False)\n    center = (0, 0)\n    theta1, theta2 = angle, angle + 180\n    w1 = Wedge(center, radius, theta1, theta2, fc=colors[0], **kwargs)\n    w2 = Wedge(center, radius, theta2, theta1, fc=colors[1], **kwargs)\n    for wedge in [w1, w2]:\n        ax.add_artist(wedge)\n    return [w1, w2]\n\nmain()\n
\n

However, this will make the "circularity" of the circle depend on aspect ratio of the outline of the axes. (You can get around that in a couple of ways, but it gets more complex. Let me know if that's what you had in mind and I can show a more elaborate example.) I also may have misunderstood what you meant "at the origin".

\n soup wrap:

The easiest way is to use two Wedges. (This doesn't automatically rescale the axes, but that's easy to add, if you'd like.)

As a quick example:

import matplotlib.pyplot as plt
from matplotlib.patches import Wedge

def main():
    fig, ax = plt.subplots()
    dual_half_circle((0.5, 0.5), radius=0.3, angle=90, ax=ax)
    ax.axis('equal')
    plt.show()

def dual_half_circle(center, radius, angle=0, ax=None, colors=('w','k'),
                     **kwargs):
    """
    Add two half circles to the axes *ax* (or the current axes) with the 
    specified facecolors *colors* rotated at *angle* (in degrees).
    """
    if ax is None:
        ax = plt.gca()
    theta1, theta2 = angle, angle + 180
    w1 = Wedge(center, radius, theta1, theta2, fc=colors[0], **kwargs)
    w2 = Wedge(center, radius, theta2, theta1, fc=colors[1], **kwargs)
    for wedge in [w1, w2]:
        ax.add_artist(wedge)
    return [w1, w2]

main()

(image: the resulting circle, drawn as one white and one black half-wedge)

If you'd like it to always be at the origin, you can specify the transform to be ax.transAxes, and turn clipping off.

E.g.

import matplotlib.pyplot as plt
from matplotlib.patches import Wedge

def main():
    fig, ax = plt.subplots()
    dual_half_circle(radius=0.1, angle=90, ax=ax)
    ax.axis('equal')
    plt.show()

def dual_half_circle(radius, angle=0, ax=None, colors=('w','k'), **kwargs):
    """
    Add two half circles to the axes *ax* (or the current axes) at the lower
    left corner of the axes with the specified facecolors *colors* rotated at
    *angle* (in degrees).
    """
    if ax is None:
        ax = plt.gca()
    kwargs.update(transform=ax.transAxes, clip_on=False)
    center = (0, 0)
    theta1, theta2 = angle, angle + 180
    w1 = Wedge(center, radius, theta1, theta2, fc=colors[0], **kwargs)
    w2 = Wedge(center, radius, theta2, theta1, fc=colors[1], **kwargs)
    for wedge in [w1, w2]:
        ax.add_artist(wedge)
    return [w1, w2]

main()

However, this will make the "circularity" of the circle depend on aspect ratio of the outline of the axes. (You can get around that in a couple of ways, but it gets more complex. Let me know if that's what you had in mind and I can show a more elaborate example.) I also may have misunderstood what you meant "at the origin".

qid & accept id: (15329797, 15329844) query: list comprehension on multiple lists of lists soup:

Sounds like you are looking for zip? It takes a pair of lists and turns it into a list of pairs.

\n
[\n    [my_operation(x,y) for x,y in zip(xs, ys)]\n    for xs, ys in zip(a, b)\n]\n
\n

-- Edit. Requirements changed:

\n
[\n    [[regex(p, s) for p in patterns] for s in strings]\n    for strings, patterns in zip(a, b)\n]\n
\n soup wrap:

Sounds like you are looking for zip? It takes a pair of lists and turns it into a list of pairs.

[
    [my_operation(x,y) for x,y in zip(xs, ys)]
    for xs, ys in zip(a, b)
]

-- Edit. Requirements changed:

[
    [[regex(p, s) for p in patterns] for s in strings]
    for strings, patterns in zip(a, b)
]
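
A concrete run of the first form, with addition standing in for `my_operation`:

```python
a = [[1, 2], [3, 4]]
b = [[10, 20], [30, 40]]

def my_operation(x, y):  # stand-in for whatever pairwise operation you need
    return x + y

result = [
    [my_operation(x, y) for x, y in zip(xs, ys)]
    for xs, ys in zip(a, b)
]
print(result)  # [[11, 22], [33, 44]]
```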
qid & accept id: (15340713, 15341114) query: regex/python to find and replace specific number within string soup:

You can use a callback function inside the sub part of the regex:

\n
import re\n\ndef callback(match):\n    return match.group(0).replace('17', '19')\n\ns = "[ 17 plane_17 \ 23 25 17 99 150 248 \ noname ]"\n\ns = re.compile(r'\\.+?\\').sub(callback, s)\n\nprint s\n
\n

Prints:

\n
[ 17 plane_17 \ 23 25 19 99 150 248 \ noname ]\n
\n soup wrap:

You can pass a callback function as the replacement argument of re.sub:

import re

def callback(match):
    return match.group(0).replace('17', '19')

s = "[ 17 plane_17 \ 23 25 17 99 150 248 \ noname ]"

s = re.compile(r'\\.+?\\').sub(callback, s)

print s

Prints:

[ 17 plane_17 \ 23 25 19 99 150 248 \ noname ]
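
The same callback technique on a self-contained example (pattern and data made up for illustration): double every number inside square brackets while leaving numbers outside them untouched.

```python
import re

def double(match):
    # match.group(0) is the full bracketed span, e.g. '[3 4]'
    return re.sub(r'\d+', lambda m: str(int(m.group(0)) * 2), match.group(0))

s = 'keep 5 [3 4] keep 6 [10]'
print(re.sub(r'\[.+?\]', double, s))  # keep 5 [6 8] keep 6 [20]
```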
qid & accept id: (15353972, 15403649) query: PyQt4 Local Directory view with option to select folders soup:

You can subclass QDirModel, and reimplement data(index,role) method, where you should check, if role is QtCore.Qt.CheckStateRole. If it is, you should return either QtCore.Qt.Checked or QtCore.Qt.Unchecked. Also, you need to reimplement setData method as well, to handle user checks/unchecks, and flags to return QtCore.Qt.ItemIsUserCheckable flag, which enables user checking/unchecking. I.e.:

\n
class CheckableDirModel(QtGui.QDirModel):\ndef __init__(self, parent=None):\n    QtGui.QDirModel.__init__(self, None)\n    self.checks = {}\n\ndef data(self, index, role=QtCore.Qt.DisplayRole):\n    if role != QtCore.Qt.CheckStateRole:\n        return QtGui.QDirModel.data(self, index, role)\n    else:\n        if index.column() == 0:\n            return self.checkState(index)\n\ndef flags(self, index):\n    return QtGui.QDirModel.flags(self, index) | QtCore.Qt.ItemIsUserCheckable\n\ndef checkState(self, index):\n    if index in self.checks:\n        return self.checks[index]\n    else:\n        return QtCore.Qt.Unchecked\n\ndef setData(self, index, value, role):\n    if (role == QtCore.Qt.CheckStateRole and index.column() == 0):\n        self.checks[index] = value\n        self.emit(QtCore.SIGNAL("dataChanged(QModelIndex,QModelIndex)"), index, index)\n        return True \n\n    return QtGui.QDirModel.setData(self, index, value, role)\n
\n

Then you use this class instead of QDirModel:

\n
model = CheckableDirModel()\ntree = QtGui.QTreeView()\ntree.setModel(model)\n
\n soup wrap:

You can subclass QDirModel and reimplement the data(index, role) method, where you check whether role is QtCore.Qt.CheckStateRole; if it is, return either QtCore.Qt.Checked or QtCore.Qt.Unchecked. You also need to reimplement setData to handle user checks/unchecks, and flags to return the QtCore.Qt.ItemIsUserCheckable flag, which enables user checking/unchecking. I.e.:

class CheckableDirModel(QtGui.QDirModel):
    def __init__(self, parent=None):
        QtGui.QDirModel.__init__(self, None)
        self.checks = {}

    def data(self, index, role=QtCore.Qt.DisplayRole):
        if role != QtCore.Qt.CheckStateRole:
            return QtGui.QDirModel.data(self, index, role)
        else:
            if index.column() == 0:
                return self.checkState(index)

    def flags(self, index):
        return QtGui.QDirModel.flags(self, index) | QtCore.Qt.ItemIsUserCheckable

    def checkState(self, index):
        if index in self.checks:
            return self.checks[index]
        else:
            return QtCore.Qt.Unchecked

    def setData(self, index, value, role):
        if (role == QtCore.Qt.CheckStateRole and index.column() == 0):
            self.checks[index] = value
            self.emit(QtCore.SIGNAL("dataChanged(QModelIndex,QModelIndex)"), index, index)
            return True

        return QtGui.QDirModel.setData(self, index, value, role)

Then you use this class instead of QDirModel:

model = CheckableDirModel()
tree = QtGui.QTreeView()
tree.setModel(model)
qid & accept id: (15398904, 15398993) query: Iterate over Python list, preserving structure of embedded lists soup:

Using list comprehensions:

\n
>>> [[tuple(map(int, pair)) + (2,) for pair in pairs] for pairs in l]\n[[(100, 200, 2), (300, 400, 2), (500, 600, 2)], [(100, 200, 2)], [(100, 200, 2)]]\n
\n

Or without the map:

\n
>>> [[(int(a), int(b), 2) for a, b in pairs] for pairs in l]\n[[(100, 200, 2), (300, 400, 2), (500, 600, 2)], [(100, 200, 2)], [(100, 200, 2)]]\n
\n

Edit

\n

Even with further checks, you can still use list comprehension. I assume that the if/else section you have added to your question should be applied to every pair, and the resulting tuple would be (addr_from, addr_to, 2) then, right?

\n
def processPair(a, b):\n    if a.isdigit():\n        a = int(a)\n    elif a.isalnum():\n        a = re.sub(r'((?:[A-Z].*?)?(?:\d.*?)?[A-Z]+)(\d+)', r'\1%\2', a)\n    if b.isdigit():\n        b = int(b) + 2\n    elif b.isalnum():\n        b = re.sub(r'((?:[A-Z].*?)?(?:\d.*?)?[A-Z]+)(\d+)', r'\1%\2', b)\n    return (a, b, 2)\n
\n

Here I have defined a function that processes the tuple (a, b) as you did in your question. Note that I have changed it to just modify the values of the variables and return a finished tuple (with the added 2) instead of appending it to some global list.

\n

I have also simplified it a bit. a.isdigit() is True is the same as a.isdigit() as that already returns a boolean value. Same for a.isdigit() == False, which is the same as not a.isdigit(). In that situation you can also remove redundant checks. After checking a.isdigit() on the if, you do not need to check its opposite on the elif; it is guaranteed to be false, as you have already checked it before.

\n

That being said, when you have said function, you can then use list comprehensions again, to get your output. Of course with your example l, this is a bit boring:

\n
>>> [[processPair(*pair) for pair in pairs] for pairs in l]\n[[(100, 202, 2), (300, 402, 2), (500, 602, 2)], [(100, 202, 2)], [(100, 202, 2)]]\n
\n soup wrap:

Using list comprehensions:

>>> [[tuple(map(int, pair)) + (2,) for pair in pairs] for pairs in l]
[[(100, 200, 2), (300, 400, 2), (500, 600, 2)], [(100, 200, 2)], [(100, 200, 2)]]

Or without the map:

>>> [[(int(a), int(b), 2) for a, b in pairs] for pairs in l]
[[(100, 200, 2), (300, 400, 2), (500, 600, 2)], [(100, 200, 2)], [(100, 200, 2)]]
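
With the input list spelled out, the first comprehension can be checked end to end (`l` reconstructed from the output shown above):

```python
# nested list of string pairs, as in the question
l = [[('100', '200'), ('300', '400'), ('500', '600')],
     [('100', '200')],
     [('100', '200')]]

# convert each pair to ints and append the constant 2, keeping the nesting
out = [[tuple(map(int, pair)) + (2,) for pair in pairs] for pairs in l]
print(out)  # matches the output shown above
```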

Edit

Even with further checks, you can still use list comprehension. I assume that the if/else section you have added to your question should be applied to every pair, and the resulting tuple would be (addr_from, addr_to, 2) then, right?

def processPair(a, b):
    if a.isdigit():
        a = int(a)
    elif a.isalnum():
        a = re.sub(r'((?:[A-Z].*?)?(?:\d.*?)?[A-Z]+)(\d+)', r'\1%\2', a)
    if b.isdigit():
        b = int(b) + 2
    elif b.isalnum():
        b = re.sub(r'((?:[A-Z].*?)?(?:\d.*?)?[A-Z]+)(\d+)', r'\1%\2', b)
    return (a, b, 2)

Here I have defined a function that processes the tuple (a, b) as you did in your question. Note that I have changed it to just modify the values of the variables and return a finished tuple (with the added 2) instead of appending it to some global list.

I have also simplified it a bit. a.isdigit() is True is the same as a.isdigit() as that already returns a boolean value. Same for a.isdigit() == False, which is the same as not a.isdigit(). In that situation you can also remove redundant checks. After checking a.isdigit() on the if, you do not need to check its opposite on the elif; it is guaranteed to be false, as you have already checked it before.

That being said, once you have such a function, you can use list comprehensions again to get your output. Of course with your example l, this is a bit boring:

>>> [[processPair(*pair) for pair in pairs] for pairs in l]
[[(100, 202, 2), (300, 402, 2), (500, 602, 2)], [(100, 202, 2)], [(100, 202, 2)]]
qid & accept id: (15401415, 15401640) query: Regex Python findall. Making things nonredundant soup:

If the possible [AG].. should be included in the length requirement you can use:

\n
r'(?x) (?: [AG].. ATG | ATG G.. )  (?:...){7,}? (?:TAA|TAG|TGA)'\n
\n

Or if you don't want to include [AG].. in the match you could use lookarounds:

\n
r'(?x) ATG (?: (?<=[AG].. ATG) | (?=G) ) (?:...){8,}? (?:TAA|TAG|TGA)'\n
\n soup wrap:

If the possible [AG].. should be included in the length requirement you can use:

r'(?x) (?: [AG].. ATG | ATG G.. )  (?:...){7,}? (?:TAA|TAG|TGA)'

Or if you don't want to include [AG].. in the match you could use lookarounds:

r'(?x) ATG (?: (?<=[AG].. ATG) | (?=G) ) (?:...){8,}? (?:TAA|TAG|TGA)'
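
A quick sanity check of the first pattern on a constructed sequence: an `[AG]..` prefix, `ATG`, seven codons, then a stop codon (the sequence is made up for illustration):

```python
import re

pattern = r'(?x) (?: [AG].. ATG | ATG G.. )  (?:...){7,}? (?:TAA|TAG|TGA)'

# [AG].. prefix + ATG + 7 codons + stop codon
seq = 'ACCATG' + 'AAA' * 7 + 'TAA'

m = re.search(pattern, seq)
print(m.group(0) == seq)  # True: the whole construct matches
```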
qid & accept id: (15418751, 15418945) query: iterating over list of string and combining string values Python soup:
>>> chained = itertools.chain.from_iterable(sixbit)\n>>> [''.join(bits) for bits in itertools.izip(*[chained]*8)]\n['00001100', '00010101', '00100001']\n
\n

Explanation

\n

chained is just an iterator of all the letters of the original strings. It uses chained function from itertools.

\n
>>> chained = itertools.chain.from_iterable(sixbit)\n>>> list(chained)\n['0', '0', '0', '0', '1', '1', '0', '0', '0', '0', '0', '1', '0', '1', '0', '1', '0', '0', '1', '0', '0', '0', '0', '1']\n
\n

[chained]*8 creates list containing the same chained object 8 times.

\n

* just unpacks those 8 elements into izip parameters.

\n

izip just return tuples, the first one of which contains the first letters of each chained iterator in the parameters, the second tuple contains second letters, etc. There are 8 chained objects, so there are 8 letters in each tuple.

\n

Most importantly, the letters are taken from each iterator, but it is in fact 8 instances of the same iterator. And it is consumed by each call. So the first tuple contains the first 8 letters of the chained iterator.

\n
>>> chained = itertools.chain.from_iterable(sixbit)\n>>> list(itertools.izip(*[chained]*8))\n[('0', '0', '0', '0', '1', '1', '0', '0'), ('0', '0', '0', '1', '0', '1', '0', '1'), ('0', '0', '1', '0', '0', '0', '0', '1')]\n
\n

At the last step, we join them in list comprehension:

\n
>>> chained = itertools.chain.from_iterable(sixbit)\n>>> [''.join(bits) for bits in itertools.izip(*[chained]*8)]\n['00001100', '00010101', '00100001']\n
\n soup wrap:
>>> chained = itertools.chain.from_iterable(sixbit)
>>> [''.join(bits) for bits in itertools.izip(*[chained]*8)]
['00001100', '00010101', '00100001']

Explanation

chained is just an iterator over all the letters of the original strings. It is built with the chain.from_iterable function from itertools.

>>> chained = itertools.chain.from_iterable(sixbit)
>>> list(chained)
['0', '0', '0', '0', '1', '1', '0', '0', '0', '0', '0', '1', '0', '1', '0', '1', '0', '0', '1', '0', '0', '0', '0', '1']

[chained]*8 creates a list containing the same chained object 8 times.

* just unpacks those 8 elements into izip parameters.

izip just returns tuples, the first of which contains the first letters of each chained iterator in the parameters, the second tuple contains the second letters, etc. There are 8 chained objects, so there are 8 letters in each tuple.

Most importantly, the letters are taken from each iterator, but all 8 are in fact the same iterator object, which is consumed by each call. So the first tuple contains the first 8 letters of the chained iterator.

>>> chained = itertools.chain.from_iterable(sixbit)
>>> list(itertools.izip(*[chained]*8))
[('0', '0', '0', '0', '1', '1', '0', '0'), ('0', '0', '0', '1', '0', '1', '0', '1'), ('0', '0', '1', '0', '0', '0', '0', '1')]

At the last step, we join them in list comprehension:

>>> chained = itertools.chain.from_iterable(sixbit)
>>> [''.join(bits) for bits in itertools.izip(*[chained]*8)]
['00001100', '00010101', '00100001']
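
In Python 3, `itertools.izip` is gone, but the built-in `zip` is lazy and behaves the same way. With `sixbit` reconstructed from the `list(chained)` output above:

```python
import itertools

sixbit = ['000011', '000001', '010100', '100001']  # inferred from list(chained) above

chained = itertools.chain.from_iterable(sixbit)
# 8 references to the *same* iterator, so zip pulls 8 consecutive letters per tuple
eightbit = [''.join(bits) for bits in zip(*[chained] * 8)]
print(eightbit)  # ['00001100', '00010101', '00100001']
```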
qid & accept id: (15425455, 15425693) query: Insert tree kind of data taken from a database into a python dictionary soup:

The problem is that you can't directly see the nodes for the second level after you insert them. Try this:

\n
conx = sqlite3.connect( 'nameofdatabase.db' )\ncurs = conx.cursor()\ncurs.execute( 'SELECT COMPONENT_ID, LEVEL, COMPONENT_NAME, PARENT ' +\n              'FROM DOMAIN_HIERARCHY' )\nrows = curs.fetchall()\ncmap = {}\nhrcy = None\nfor row in rows:\n    entry = (row[2], {})\n    cmap[row[0]] = entry\n    if row[1] == 1:\n        hrcy = {entry[0]: entry[1]}\n\n# raise if hrcy is None\n\nfor row in rows:\n    item = cmap[row[0]]\n    parent = cmap.get(row[3], None)\n    if parent is not None:\n        parent[1][row[2]] = item[1]\n\nprint hrcy\n
\n

By keeping each component's map of subcomponents in cmap, I can always reach each parent's map to add the next component to it. I tried it with the following test data:

\n
rows = [(1,1,'A',0),\n        (2,2,'AA',1),\n        (3,2,'AB',1),\n        (4,3,'AAA',2),\n        (5,3,'AAB',2),\n        (6,3,'ABA',3),\n        (7,3,'ABB',3)]       \n
\n

The output was this:

\n
{'A': {'AA': {'AAA': {}, 'AAB': {}}, 'AB': {'ABA': {}, 'ABB': {}}}}\n
\n soup wrap:

The problem is that you can't directly see the nodes for the second level after you insert them. Try this:

conx = sqlite3.connect( 'nameofdatabase.db' )
curs = conx.cursor()
curs.execute( 'SELECT COMPONENT_ID, LEVEL, COMPONENT_NAME, PARENT ' +
              'FROM DOMAIN_HIERARCHY' )
rows = curs.fetchall()
cmap = {}
hrcy = None
for row in rows:
    entry = (row[2], {})
    cmap[row[0]] = entry
    if row[1] == 1:
        hrcy = {entry[0]: entry[1]}

# raise if hrcy is None

for row in rows:
    item = cmap[row[0]]
    parent = cmap.get(row[3], None)
    if parent is not None:
        parent[1][row[2]] = item[1]

print hrcy

By keeping each component's map of subcomponents in cmap, I can always reach each parent's map to add the next component to it. I tried it with the following test data:

rows = [(1,1,'A',0),
        (2,2,'AA',1),
        (3,2,'AB',1),
        (4,3,'AAA',2),
        (5,3,'AAB',2),
        (6,3,'ABA',3),
        (7,3,'ABB',3)]       

The output was this:

{'A': {'AA': {'AAA': {}, 'AAB': {}}, 'AB': {'ABA': {}, 'ABB': {}}}}
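
The same two-pass logic, wrapped in a function (name mine) so it can be exercised against the test rows without a database:

```python
def build_hierarchy(rows):
    """rows are (component_id, level, name, parent_id) tuples."""
    cmap = {}
    hrcy = None
    # first pass: create a (name, children-dict) entry per component
    for row in rows:
        entry = (row[2], {})
        cmap[row[0]] = entry
        if row[1] == 1:  # level 1 is the root
            hrcy = {entry[0]: entry[1]}
    # second pass: hang each child's dict off its parent's dict
    for row in rows:
        item = cmap[row[0]]
        parent = cmap.get(row[3])
        if parent is not None:
            parent[1][row[2]] = item[1]
    return hrcy

rows = [(1, 1, 'A', 0), (2, 2, 'AA', 1), (3, 2, 'AB', 1),
        (4, 3, 'AAA', 2), (5, 3, 'AAB', 2), (6, 3, 'ABA', 3), (7, 3, 'ABB', 3)]
print(build_hierarchy(rows))
```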
qid & accept id: (15432775, 15433131) query: lxml findall div and span tags soup:
import lxml.html\nfrom lxml.cssselect import CSSSelector\ncontent = result.read()\npage_html = lxml.html.fromstring(content)\n\nelements = page_html.xpath('//*[self::div or self::span]')\n
\n

or

\n
sd_selector = CSSSelector('span,div')\nelements = sd_selector(page_html)\n
\n soup wrap:
import lxml.html
from lxml.cssselect import CSSSelector
content = result.read()
page_html = lxml.html.fromstring(content)

elements = page_html.xpath('//*[self::div or self::span]')

or

sd_selector = CSSSelector('span,div')
elements = sd_selector(page_html)
qid & accept id: (15456166, 15456185) query: python periodic looping idiom? soup:

sleep_until(timestamp) is basically time.sleep(timestamp - time.time()).

\n

Your code is fine actually (making sure you don't pass negative times to sleep is still a good idea though):

\n
import time\n\nminute = 60\nnext_time = time.time()\nwhile True:\n    doSomeWork()\n    next_time += minute\n    sleep_time = next_time - time.time()\n    if sleep_time > 0:\n        time.sleep(sleep_time)\n
\n

 

\n

I personally would make a generator of 60-second-spaced timestamps and use it:

\n
import time\nimport itertools\n\nminute = 60\n\nfor next_time in itertools.count(time.time() + minute, minute):\n    doSomeWork()\n    sleep_time = next_time - time.time()\n    if sleep_time > 0:\n        time.sleep(sleep_time)\n
\n soup wrap:

sleep_until(timestamp) is basically time.sleep(timestamp - time.time()).

Your code is fine actually (making sure you don't pass negative times to sleep is still a good idea though):

import time

minute = 60
next_time = time.time()
while True:
    doSomeWork()
    next_time += minute
    sleep_time = next_time - time.time()
    if sleep_time > 0:
        time.sleep(sleep_time)

 

I personally would make a generator of 60-second-spaced timestamps and use it:

import time
import itertools

minute = 60

for next_time in itertools.count(time.time() + minute, minute):
    doSomeWork()
    sleep_time = next_time - time.time()
    if sleep_time > 0:
        time.sleep(sleep_time)
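
The nice property of the generator version is that the target times are fixed in advance, so a slow iteration of work doesn't push every later wake-up back. That spacing can be checked without actually sleeping:

```python
import itertools

minute = 60
start = 1000.0  # an arbitrary reference timestamp standing in for time.time()

# the first three wake-up targets the loop would aim for
targets = list(itertools.islice(itertools.count(start + minute, minute), 3))
print(targets)  # [1060.0, 1120.0, 1180.0]
```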
qid & accept id: (15459675, 15460528) query: How to get python dictionaries into a pandas time series dataframe where key is date object soup:

use DataFrame.from_dict:

\n
import pandas as pd\nimport datetime\ntimeseries = {datetime.datetime(2013, 3, 17, 18, 19): {'t2': 400, 't1': 1000},\n                 datetime.datetime(2013, 3, 17, 18, 20): {'t2': 300, 't1': 3000}\n                }\nprint pd.DataFrame.from_dict(timeseries, orient="index")\n
\n

output:

\n
                      t2    t1\n2013-03-17 18:19:00  400  1000\n2013-03-17 18:20:00  300  3000\n
\n soup wrap:

Use DataFrame.from_dict:

import pandas as pd
import datetime
timeseries = {datetime.datetime(2013, 3, 17, 18, 19): {'t2': 400, 't1': 1000},
                 datetime.datetime(2013, 3, 17, 18, 20): {'t2': 300, 't1': 3000}
                }
print pd.DataFrame.from_dict(timeseries, orient="index")

output:

                      t2    t1
2013-03-17 18:19:00  400  1000
2013-03-17 18:20:00  300  3000
qid & accept id: (15462548, 15462584) query: How do I make a function to accept an argument that is another function? soup:

You almost got it; you just need to not call bear_room when you're passing it as an argument:

\n
    elif next == 'exit':\n        exit_game(bear_room)\n
\n

Conversely, you need to call stage as a function:

\n
    elif con_ext == 'no':\n        stage()\n
\n soup wrap:

You almost got it; you just need to not call bear_room when you're passing it as an argument:

    elif next == 'exit':
        exit_game(bear_room)

Conversely, you need to call stage as a function:

    elif con_ext == 'no':
        stage()
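
The distinction in a self-contained sketch (names are illustrative): `f` without parentheses passes the function object itself, while `f()` calls it immediately and passes whatever it returns.

```python
def bear_room():
    return 'you are in the bear room'

def exit_game(next_stage):
    # next_stage is a function object; call it only when you actually need it
    return 'exiting via: ' + next_stage()

print(exit_game(bear_room))  # passes the function itself, not its result
# exit_game(bear_room()) would instead pass the *string* bear_room returns,
# and next_stage() would then fail because strings are not callable.
```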
qid & accept id: (15489749, 15489765) query: Create a list of keys given a value in a dictionary soup:

Use a list comprehension over the dict's items:

\n
[k for k, v in child_parent.items() if v == 0]\n
\n

 

\n
>>> [k for k, v in child_parent.items() if v == 0]\n [1, 2]\n\n>>> [k for k, v in child_parent.items() if v == 2]\n [3, 4]\n
\n soup wrap:

Use a list comprehension over the dict's items:

[k for k, v in child_parent.items() if v == 0]

 

>>> [k for k, v in child_parent.items() if v == 0]
 [1, 2]

>>> [k for k, v in child_parent.items() if v == 2]
 [3, 4]
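
The two outputs above imply a mapping like the one below; running the comprehension against it reproduces them (the dictionary is reconstructed for illustration):

```python
child_parent = {1: 0, 2: 0, 3: 2, 4: 2}  # child -> parent

print([k for k, v in child_parent.items() if v == 0])  # [1, 2]
print([k for k, v in child_parent.items() if v == 2])  # [3, 4]
```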
qid & accept id: (15508970, 15509225) query: Query endpoint user by email soup:

I'm assuming User is some custom model that inherits from EndpointsModel. If not, this will fail. In other words, you've done something like this:

\n
from google.appengine.ext import ndb\nfrom endpoints_proto_datastore.ndb import EndpointsModel\n\nclass User(EndpointsModel):\n    email = ndb.StringProperty()\n    ...\n
\n

There are two primary approaches to solving this problem, you could either use the email as the key for the entity or roll your own query and try to fetch two entities to see if your result is unique and exists.

\n

OPTION 1: Use email as the key

\n

Instead of doing a full-on query, you could do a simple get.

\n
from google.appengine.ext import endpoints\n\n@endpoints.api(...)\nclass SomeClass(...):\n\n    @User.method(request_fields=('email',),\n                 path='get_by_mail/{email}',\n                 http_method='GET', name='user.get_by_email')\n    def get_by_email(self, user):\n        if not user.from_datastore:\n            raise endpoints.NotFoundException('User not found.')\n        return user\n
\n

by using the email as the datastore key for each entity, as is done in the custom alias properties sample. For example:

\n
from endpoints_proto_datastore.ndb import EndpointsAliasProperty\n\nclass User(EndpointsModel):\n    # remove email here, as it will be an alias property \n    ...\n\n    def EmailSet(self, value):\n        # Validate the value any way you like\n        self.UpdateFromKey(ndb.Key(User, value))\n\n    @EndpointsAliasProperty(setter=IdSet, required=True)\n    def email(self):\n        if self.key is not None: return self.key.string_id()\n
\n

OPTION 2: Roll your own query

\n
    @User.method(request_fields=('email',),\n                 path='get_by_mail/{email}',\n                 http_method='GET', name='user.get_by_email')\n    def get_by_email(self, user):\n        query = User.query(User.email == user.email)\n        # We fetch 2 to make sure we have\n        matched_users = query.fetch(2)\n        if len(matched_users == 0):\n            raise endpoints.NotFoundException('User not found.')\n        elif len(matched_users == 2):\n            raise endpoints.BadRequestException('User not unique.')\n        else:\n            return matched_users[0]\n
\n soup wrap:

I'm assuming User is some custom model that inherits from EndpointsModel. If not, this will fail. In other words, you've done something like this:

from google.appengine.ext import ndb
from endpoints_proto_datastore.ndb import EndpointsModel

class User(EndpointsModel):
    email = ndb.StringProperty()
    ...

There are two primary approaches to solving this problem: you could either use the email as the key for the entity, or roll your own query and fetch two entities to see if your result is unique and exists.

OPTION 1: Use email as the key

Instead of doing a full-on query, you could do a simple get.

from google.appengine.ext import endpoints

@endpoints.api(...)
class SomeClass(...):

    @User.method(request_fields=('email',),
                 path='get_by_mail/{email}',
                 http_method='GET', name='user.get_by_email')
    def get_by_email(self, user):
        if not user.from_datastore:
            raise endpoints.NotFoundException('User not found.')
        return user

by using the email as the datastore key for each entity, as is done in the custom alias properties sample. For example:

from endpoints_proto_datastore.ndb import EndpointsAliasProperty

class User(EndpointsModel):
    # remove email here, as it will be an alias property 
    ...

    def EmailSet(self, value):
        # Validate the value any way you like
        self.UpdateFromKey(ndb.Key(User, value))

    @EndpointsAliasProperty(setter=EmailSet, required=True)
    def email(self):
        if self.key is not None: return self.key.string_id()

OPTION 2: Roll your own query

    @User.method(request_fields=('email',),
                 path='get_by_mail/{email}',
                 http_method='GET', name='user.get_by_email')
    def get_by_email(self, user):
        query = User.query(User.email == user.email)
        # We fetch 2 so we can tell whether the result is unique
        matched_users = query.fetch(2)
        if len(matched_users) == 0:
            raise endpoints.NotFoundException('User not found.')
        elif len(matched_users) == 2:
            raise endpoints.BadRequestException('User not unique.')
        else:
            return matched_users[0]
qid & accept id: (15510367, 15510911) query: Return All Matching Lines in a Logfile soup:

soup wrap:

You should probably use regular expressions (regex) for that. Python has the re module, which implements regex support for Python.

See this as an example for the direction to look at: stackoverflow question finding multiple matches in a string.

Excerpt from the above; the logfile looks like:

[1242248375] SERVICE ALERT: myhostname.com;DNS: Recursive;CRITICAL

regex looks like:

regexp = re.compile(r'\[(\d+)\] SERVICE NOTIFICATION: (.+)')

which goes like this:

  • r => raw string (always recommended for regexes)
  • \[ => matches the opening square bracket (which would be a special character otherwise)
  • (\d+) => matches one or more digits; \d = digit, and + means 1 or more
  • \] => followed by a closing square bracket
  • SERVICE NOTIFICATION: => matches exactly these characters in sequence.
  • (.+) => the . (dot) matches any character, and again + means 1 or more

Parentheses group the results.
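For instance, applying the SERVICE NOTIFICATION pattern above to a sample line (constructed here to match the pattern; Python 3 syntax) shows how the parentheses become groups:

```python
import re

# The example pattern from above, applied to a matching sample line.
regexp = re.compile(r'\[(\d+)\] SERVICE NOTIFICATION: (.+)')
line = '[1242248375] SERVICE NOTIFICATION: myhostname.com;DNS: Recursive;CRITICAL'

m = regexp.match(line)
if m:
    # Each parenthesized part of the pattern becomes one group.
    print(m.groups())  # ('1242248375', 'myhostname.com;DNS: Recursive;CRITICAL')
```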

I made a short regex to start with for your logfile format, assuming your log from above is saved as log.txt:

import re
regexp = re.compile(r'\[(\d{2}:\d{2}:\d{2}\.xxx\d{3})\][\s]+status[\s]+XYZ[\s]+ID:([0-9A-Zx]+)(.+)')

f = open("log.txt", "r")
for line in f.readlines():
    print line
    m = re.match(regexp, line)
    #print m
    if m:
        print m.groups()

Regexes don't look easy or straightforward at first glance, but if you search for regex or re together with python you will find helpful examples.

It outputs this for me:

[13:40:19.xxx021] status    XYZ  ID:22P00935xxx -4  3.92     quote:    0.98/   1.02  avg:   -0.98   -0.16

('13:40:19.xxx021', '22P00935xxx', ' -4  3.92     quote:    0.98/   1.02  avg:   -0.98   -0.16')
[13:40:19.xxx024] status    XYZ  ID:22C0099xxx0 -2  26.4     quote:   11.60/  11.85  avg:  -13.20    2.70

('13:40:19.xxx024', '22C0099xxx0', ' -2  26.4     quote:   11.60/  11.85  avg:  -13.20    2.70')
[13:40:19.xxx027] status    XYZ  ID:22P0099xxx0 10  -17.18   quote:    1.86/   1.90  avg:   -1.72    1.42

('13:40:19.xxx027', '22P0099xxx0', ' 10  -17.18   quote:    1.86/   1.90  avg:   -1.72    1.42')
[13:40:19.xxx029] status    XYZ  ID:22C00995xxx 4   -42.5    quote:    8.20/   8.30  avg:  -10.62   -9.70

('13:40:19.xxx029', '22C00995xxx', ' 4   -42.5    quote:    8.20/   8.30  avg:  -10.62   -9.70')
[13:40:19.xxx031] status    XYZ  ID:22P00995xxx 2   9.66     quote:    3.30/   3.40  avg:    4.83   16.26
('13:40:19.xxx031', '22P00995xxx', ' 2   9.66     quote:    3.30/   3.40  avg:    4.83   16.26')

Every second line is the output, which is a tuple containing the matched groups.

If you add this to the program above:

print "ID is : ", m.groups()[1]

the output is:

[13:40:19.xxx021] status    XYZ  ID:22P00935xxx -4  3.92     quote:    0.98/   1.02  avg:   -0.98   -0.16

ID is :  22P00935xxx

[13:40:19.xxx024] status    XYZ  ID:22C0099xxx0 -2  26.4     quote:   11.60/  11.85  avg:  -13.20    2.70

ID is :  22C0099xxx0

This matches the IDs you want to compare. Just play with it a little to get the result you really want.

The final example captures the ID, tests whether it's already there, and adds the matched lines to a dictionary that has the IDs as its keys:

import re
regexp = re.compile(r'\[(\d{2}:\d{2}:\d{2}\.xxx\d{3})\][\s]+status[\s]+XYZ[\s]+ID:([0-9A-Zx]+)(.+)')

res = {}

f = open("log.txt", "r")
for line in f.readlines():
    print line
    m = re.match(regexp, line)  
    if m:
        print m.groups()
        id = m.groups()[1]
        if id in res:
            # append this match to the existing ID
            res[id].append([m.groups()[0], m.groups()[2]])
        else:
            # new ID: start a list of matches so that append stays consistent
            res[id] = [[m.groups()[0], m.groups()[2]]]

for id in res:
    print "ID: ", id
    print res[id]

Now you can play around and fine tune it to adapt it to your needs.
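The grouping idiom in the final example can also be written with collections.defaultdict, which removes the if/else branch entirely. A sketch in Python 3 syntax, applied to one sample line:

```python
import re
from collections import defaultdict

regexp = re.compile(r'\[(\d{2}:\d{2}:\d{2}\.xxx\d{3})\][\s]+status[\s]+XYZ[\s]+ID:([0-9A-Zx]+)(.+)')

# defaultdict(list) gives every missing key an empty list automatically.
res = defaultdict(list)
sample = '[13:40:19.xxx021] status    XYZ  ID:22P00935xxx -4  3.92'
m = regexp.match(sample)
if m:
    timestamp, log_id, rest = m.groups()
    res[log_id].append([timestamp, rest])

print(dict(res))
```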

qid & accept id: (15516876, 15517167) query: How can I get a Tuple from a list in python (3.3) soup:

soup wrap:

Let list_ be the list of tuples and c_code the company code, read from input via raw_input or from some GUI via some control (if you need help with that, please tell me).

You could use either a list comprehension:

matching_results = [t for t in list_ if t[0] == c_code]

or the built-in filter function:

matching_results = filter(lambda t: t[0]==c_code, list_)

Be careful with version 2: in Python 3, filter is generator-style, i.e. it does not create a list, but you can iterate over it. To get a list in Python 3, you would have to call list(...) on this generator.
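A minimal sketch of both variants in Python 3, with an invented list of (code, name) tuples:

```python
# Hypothetical sample data: (company code, company name) tuples.
list_ = [('AAPL', 'Apple'), ('GOOG', 'Google'), ('AAPL', 'Apple Retail')]
c_code = 'AAPL'

# List comprehension: produces a list directly.
by_comprehension = [t for t in list_ if t[0] == c_code]

# filter(): returns an iterator in Python 3, so wrap it in list().
by_filter = list(filter(lambda t: t[0] == c_code, list_))

print(by_comprehension == by_filter)  # True
```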

EDIT

If you have a list of company codes, c_codes, you can do

matching_results = [t for t in list_ if t[0] in c_codes]

This should be the easiest possible way.

qid & accept id: (15547217, 15547295) query: Handling months in python datetimes soup:

soup wrap:

Use a timedelta(days=1) offset of the beginning of this month:

import datetime

def get_start_of_previous_month(dt):
    '''
    Return the datetime corresponding to the start of the month
    before the provided datetime.
    '''
    previous = dt.date().replace(day=1) - datetime.timedelta(days=1)
    return datetime.datetime.combine(previous.replace(day=1), datetime.time.min)

.replace(day=1) returns a new date that is at the start of the current month, after which subtracting a day is going to guarantee that we end up in the month before. Then we pull the same trick again to get the first day of that month.

Demo (on Python 2.4 to be sure):

>>> get_start_of_previous_month(datetime.datetime.now())
datetime.datetime(2013, 2, 1, 0, 0)
>>> get_start_of_previous_month(datetime.datetime(2013, 1, 21, 12, 23))
datetime.datetime(2012, 12, 1, 0, 0)
qid & accept id: (15580917, 15581020) query: Python: Data validation using regular expression soup:

soup wrap:

In a regex, the metacharacters ^ and $ mean "start-of-string" and "end-of-string" (respectively); so, rather than seeing what matches, and comparing it to the whole string, you can simply require that the regex match the whole string to begin with:

import re
data = "asdsaq2323-asds"
if re.compile("^[a-zA-Z0-9*]+$").match(data):
    print "match"
else:
    print "no match"

In addition, since you're only using the regex once — you compile it and immediately use it — you can use the convenience method re.match to handle that as a single step:

import re
data = "asdsaq2323-asds"
if re.match("^[a-zA-Z0-9*]+$", data):
    print "match"
else:
    print "no match"
qid & accept id: (15620039, 15620214) query: Best way to reset keys which expires in few minutes in python soup:

soup wrap:

In outline:

  1. Create an object class to represent each key;
  2. Have each instance of that object carry its timed life as an attribute;
  3. Have a callback that calls the instance at the appropriate time;
  4. The object decides to die or renew itself at the time-out.

Here is a trivial example:

import threading, time, random

class Key(object):
    results={}
    def __init__(self,refresh,name):
        self.refresh=refresh
        self.name=name
        self.t0=time.time()
        self.t=threading.Timer(refresh,self.now_what)
        self.t.start()

    def now_what(self):
        s='{}: {:6.4f}'.format(self.name,time.time()-self.t0)
        Key.results.setdefault(self.refresh,[]).append(s)
        # do the thing you want at this time ref with the Key...

    def time_left(self):
        return max(self.t0+self.refresh-time.time(),0)

keys=[Key(random.randint(2,15),'Key {}'.format(i)) for i in range(1,1001)]
t=time.time()
while any(key.time_left() for key in keys):
    if time.time()-t > 1:
        kc=filter(lambda x: x, (key.time_left() for key in keys))
        if kc:
            tmpl='{} keys; max life: {:.2f}; average life: {:.2f}'
            print tmpl.format(len(kc),max(kc),sum(kc)/len(kc))
            t=time.time()

for k in sorted(Key.results):
    print '\nKeys with {} secs life:'.format(k)
    for e in Key.results[k]:
        print '\t{}'.format(e)

Prints:

1000 keys; max life: 13.98; average life: 7.38
933 keys; max life: 12.98; average life: 6.85
870 keys; max life: 11.97; average life: 6.29
796 keys; max life: 10.97; average life: 5.80
729 keys; max life: 9.97; average life: 5.26
666 keys; max life: 8.96; average life: 4.68
594 keys; max life: 7.96; average life: 4.16
504 keys; max life: 6.96; average life: 3.77
427 keys; max life: 5.96; average life: 3.32
367 keys; max life: 4.95; average life: 2.74
304 keys; max life: 3.95; average life: 2.16
215 keys; max life: 2.95; average life: 1.76
138 keys; max life: 1.95; average life: 1.32
84 keys; max life: 0.95; average life: 0.72

Keys with 2 secs life:
    Key 26: 2.0052
    Key 27: 2.0053
    Key 41: 2.0048
    ...
Keys with 3 secs life:
    Key 4: 3.0040
    Key 31: 3.0065
    Key 32: 3.0111
    ...
Keys with 4 secs life:
...

You can see that there is some variability in accuracy, but it is within 1/100 sec for most purposes.
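The mechanism doing the timing above is threading.Timer, which runs a callback on its own thread after a delay. A minimal sketch of just that piece (Python 3 syntax; the key name is invented):

```python
import threading

fired = []

def expire(name):
    # Called on the Timer's own thread once the delay elapses.
    fired.append(name)

# threading.Timer runs the callback once after the interval;
# calling t.cancel() before then would stop it instead.
t = threading.Timer(0.1, expire, args=('key-1',))
t.start()
t.join()       # wait for the timer thread to finish
print(fired)   # ['key-1']
```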

qid & accept id: (15635341, 15637045) query: Run parts of a ipython notebook in a loop / with different input parameter soup:

soup wrap:

What I usually do in these scenarios is wrap the important cells as functions (you don't have to merge any of them) and have a certain master cell that iterates over a list of parameters and calls these functions. E.g. this is what a "master cell" looks like in one of my notebooks:

import itertools
# parameters
P_peak_all = [100, 200]
idle_ratio_all = [0., 0.3, 0.6]
# iterate through these parameters and call the notebook's logic
for P_peak, idle_ratio in itertools.product(P_peak_all, idle_ratio_all):
    print(P_peak, idle_ratio, P_peak*idle_ratio)
    print('========================')
    m_synth, m_synth_ns = build_synth_measurement(P_peak, idle_ratio)
    compare_measurements(m_synth, m_synth_ns, "Peak pauser", "No scheduler", file_note="-%d-%d" % (P_peak, int(idle_ratio*100)))
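The driver of that loop is itertools.product, which yields every combination of the parameter lists. A minimal sketch with the same parameter values (Python 3 syntax):

```python
import itertools

P_peak_all = [100, 200]
idle_ratio_all = [0.0, 0.3, 0.6]

# product() pairs every P_peak with every idle_ratio: 2 * 3 = 6 combinations.
combos = list(itertools.product(P_peak_all, idle_ratio_all))
print(len(combos))  # 6
print(combos[0])    # (100, 0.0)
```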

You can still have some data dragging throughout the notebook (i.e. calling each function at the bottom of the cell with your data) to be able to test stuff live for individual cells. For example some cell might state:

def square(x):
    y = x**2
    return y
square(x) # where x is your data running from the prior cells 

Which lets you experiment live and still call the generic functionality from the master cell.

I know it's some additional work to refactor your notebook using functions, but I find it actually increases the notebook's readability, which is useful when you come back to it after a longer period, and it makes the notebook easier to convert to a "proper" script or module if necessary.

qid & accept id: (15649257, 15658817) query: Django wildcard query soup:

soup wrap:

There is no direct wildcard parameter, so what you have is perfectly acceptable. Code readability also counts, so even though you might end up with more code, it may be more maintainable.

You could chain the queryset like this:

provider = request.POST.get('provider')

order_items = OrderItem.objects.all()
if provider is not None:
    order_items = order_items.filter(provider=provider)

Or you could set up an empty dictionary of kwargs for a filter() call like this, but it's less readable in my opinion:

provider = request.POST.get('provider')
kwargs = {}
if provider is not None:
    kwargs['provider'] = provider

order_items = OrderItem.objects.filter(**kwargs)

This could be turned into a function call like this:

def all_or_filter_args(request, item):
    """Return a dictionary of arguments for filter() if item is specified in the request."""
    value = request.POST.get(item)
    if value is None:
        return {}
    return {item: value}

and then the following one-liner used for queries:

order_items = OrderItem.objects.filter(**all_or_filter_args(request, 'provider'))

But again, I don't find this as readable as the example you've given.
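As a variant of the kwargs approach above, several optional filters can be collected in one step with a dict comprehension. This is just a sketch: the field names are invented and a plain dict stands in for request.POST:

```python
# Hypothetical POST data; in Django this would be request.POST.
post = {'provider': 'acme', 'status': None}

# Keep only the filters the client actually supplied.
kwargs = {field: value
          for field, value in post.items()
          if value is not None}

print(kwargs)  # {'provider': 'acme'}
```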

Another approach is to write a custom manager which allows you to use your own functions for filtering. With a custom manager you could implement code to allow queries like this, where you provide a function for all_or_filtered to apply the correct filtering:

order_items = OrderItem.objects.all_or_filtered('provider', request.POST.get('provider'))
qid & accept id: (15668416, 15669247) query: Implementing 2D slicing in Python soup:

soup wrap:

You pretty much have to do something like this… but at least you can remove some duplication.

First, it's probably reasonable to consider [1,] to mean "row 1", just like [1]. (numpy does this.) That means you don't need the tuple-vs.-int thing; just treat an int as a 1-element tuple. In other words:

def __getitem__(self, idx):
    if isinstance(idx, numbers.Integral):
        idx = (idx, slice(None, None, None))
    # now the rest of your code only needs to handle tuples

Second, although your sample code only handles the case of two slices, your real code has to handle two slices, or a slice and an int, or an int and a slice, or two ints, or a slice, or an int. If you can factor out the slice-handling code, you don't need to duplicate it over and over again.

One trick for handling int-vs.-slice is to treat [n] as a wrapper that does, in essence, [n:n+1][0], which lets you reduce everything even more. (It's a tiny bit trickier than this, because you have to special-case either negative numbers in general, or just -1, because obviously n[-1] != n[-1:0][0].) For 1-D arrays this may not be worth it, but for 2D arrays it probably is, because it means while you're dealing with the column, you've always got a list of rows rather than just a row.
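The trick in isolation: wrapping an integer index n as the slice n:n+1 keeps the result a list, and indexing [0] afterwards unwraps it (a sketch on a plain nested list):

```python
m = [[0.0, 0.1], [1.0, 1.1]]

row = 1
# Treating the int as the slice row:row+1 keeps the result a list of rows...
rows = m[row:row+1]
print(rows)     # [[1.0, 1.1]]
# ...and [0] then unwraps it back to the single row.
print(rows[0])  # [1.0, 1.1]
```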

On the other hand, you may want to share some code between __getitem__ and __setitem__… which makes some of these tricks either impossible or a lot harder. So, there's a tradeoff.

At any rate, here's an example that does all the simplification and pre/postprocessing I could think of (possibly more than you want) so that ultimately you're always looking up a pair of slices:

import numbers

class Matrix(object):
    def __init__(self):
        self.m = [[row + col/10. for col in range(4)] for row in range(4)]
    def __getitem__(self, idx):
        if isinstance(idx, (numbers.Integral, slice)):
            idx = (idx, slice(None, None, None))
        elif len(idx) == 1:
            idx = (idx[0], slice(None, None, None))
        rowidx, colidx = idx
        rowslice, colslice = True, True
        if isinstance(rowidx, numbers.Integral):
            rowidx, rowslice = slice(rowidx, rowidx+1), False
        if isinstance(colidx, numbers.Integral):
            colidx, colslice = slice(colidx, colidx+1), False
        ret = self.m[rowidx][colidx]
        if not colslice:
            ret = [row[0] for row in ret]
        if not rowslice:
            ret = ret[0]
        return ret

Or it might be nicer if you refactored things along the other axis: Get the row(s), and then get the column(s) within it/them:

def _getrow(self, idx):
    return self.m[idx]

def __getitem__(self, idx):
    if isinstance(idx, (numbers.Integral, slice)):
        return self._getrow(idx)
    rowidx, colidx = idx
    if isinstance(rowidx, numbers.Integral):
        return self._getrow(rowidx)[colidx]
    else:
        return [row[colidx] for row in self._getrow(rowidx)]

This looks a whole lot simpler, but I'm cheating here by forwarding the second index to the normal list, which only works because my underlying storage is a list of lists. But if you have any kind of indexable row object to defer to (and it doesn't waste unacceptable time/space to create those objects unnecessarily), you can use the same cheat.


If you're objecting to the need to type-switch on the index parameter, yes, that does seem generally unpythonic, but unfortunately it's how __getitem__ generally works. If you want to use the usual EAFP try logic, you can, but I don't think it's more readable when you have to try two different APIs (e.g., [0] for tuples, and .start for slices) in multiple places. You end up doing "duck-type-switching" up at the top, like this:

try:
    idx[0]
except AttributeError:
    idx = (idx, slice(None, None, None))

… and so on, and this is just twice as much code as normal type-switching without any of the usual benefits.

qid & accept id: (15688462, 15688485) query: How to update entire column with values in list using Sqlite3 soup:

soup wrap:

Use the .executemany() method instead:

curr.executemany('UPDATE test SET myCol= ?', myList)

However, myList must be a sequence of tuples here. If it is not, a generator expression will do to create these tuples:

curr.executemany('UPDATE test SET myCol= ?', ((val,) for val in myList))

Demonstration (with insertions):

>>> import sqlite3
>>> conn=sqlite3.connect(':memory:')
>>> conn.execute('CREATE TABLE test (myCol)')

>>> conn.commit()
>>> myList = ('foo', 'bar', 'spam')
>>> conn.executemany('INSERT into test values (?)', ((val,) for val in myList))

>>> list(conn.execute('select * from test'))
[(u'foo',), (u'bar',), (u'spam',)]

Note that an UPDATE statement needs a WHERE clause to identify which row to update; an UPDATE without a WHERE filter will update all rows.

Make sure your data has a row identifier to go with it; in this case it may be easier to use named parameters instead:

my_data = [{'id': 1, 'value': 'foo'}, {'id': 2, 'value': 'bar'}]
cursor.executemany('UPDATE test SET myCol=:value WHERE rowId=:id', my_data)
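For completeness, here is a self-contained sketch of that named-parameter form against an in-memory database (the table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE test (rowId INTEGER PRIMARY KEY, myCol TEXT)')
# Two placeholder rows; rowId autoincrements to 1 and 2
conn.executemany('INSERT INTO test (myCol) VALUES (?)', [('old',), ('old',)])

# Each dict supplies both the new value and the row identifier
my_data = [{'id': 1, 'value': 'foo'}, {'id': 2, 'value': 'bar'}]
conn.executemany('UPDATE test SET myCol = :value WHERE rowId = :id', my_data)

print(list(conn.execute('SELECT myCol FROM test ORDER BY rowId')))
# [('foo',), ('bar',)]
```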
qid & accept id: (15702753, 15702883) query: Python BeautifulSoup how to get the index or of the HTML table soup:
soup wrap:
In [78]: anchor = soup.findAll(text=re.compile("Assi"))[0]
In [77]: ' '.join(anchor.find_next('td').stripped_strings)
Out[77]: u'Ishida Co., Ltd. (Kyoto, JP )'

import bs4 as bs
import urllib2
import re

url = 'http://patft.uspto.gov//netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=2&f=G&l=50&co1=AND&d=PTXT&s1=%22X+ray%22.ABTX.&s2=detect.ABTX.&OS=ABST/%22X+ray%22+AND+ABST/detect&RS=ABST/%22X+ray%22+AND+ABST/detect'
soup = bs.BeautifulSoup(urllib2.urlopen(url).read())

anchor = soup.findAll(text=re.compile("Assi"))[0]
assignee = ' '.join(anchor.find_next('td').stripped_strings)
print(assignee)

yields

Ishida Co., Ltd. (Kyoto, JP )
qid & accept id: (15708416, 15708877) query: Django: Lookup by length of text field soup:

soup wrap:

I think regex lookup can help you:

ModelWithTextField.objects.filter(text_field__iregex=r'^.{7,}$')

or you can always perform raw SQL queries on Django model:

ModelWithTextField.objects.raw('SELECT * FROM model_with_text_field WHERE LEN_FUNC_NAME(text_field) > 7')

where LEN_FUNC_NAME is the name of the "string length" function for your DBMS. For example, in MySQL it is called LENGTH.
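The regex used in the iregex lookup simply matches any string of at least seven characters; you can sanity-check the pattern itself with the stdlib re module (note that {7,} means length >= 7, so use {8,} if you want the strict "> 7" of the raw-SQL variant):

```python
import re

# Same pattern as in the __iregex lookup above
pattern = re.compile(r'^.{7,}$')

print(bool(pattern.match('abcdefg')))    # True  (7 chars)
print(bool(pattern.match('abc')))        # False (too short)
```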

qid & accept id: (15711845, 15891203) query: How do I change the built-in button labels on a gtk.Assistant? soup:

soup wrap:

I managed to find a solution while experimenting with workarounds.

gtk.Assistant overrides the gtk.Container.get_children() method with something that returns the list of pages, but it is still in fact the parent of a gtk.HBox() which contains the buttons for 'Next', 'Apply', 'Cancel', etc.

The method gtk.Assistant.add_action_widget() adds a widget to the so-called "action area". It turns out this is the HBox containing the relevant buttons. The following function will produce a reference to the HBox:

def get_buttons_hbox(assistant):
    # temporarily add a widget to the action area and get its parent
    label = gtk.Label('')
    assistant.add_action_widget(label)
    hbox = label.get_parent()
    hbox.remove(label)
    return hbox

Then the buttons are retrieved using get_buttons_hbox(a).get_children().

for child in get_buttons_hbox(a).get_children():
    print child.get_label()

This prints:

gtk-goto-last
gtk-go-back
gtk-go-forward
gtk-apply
gtk-cancel
gtk-close

So the following code solves the problem (using get_buttons_hbox() defined above):

for child in get_buttons_hbox(a).get_children():
    label = child.get_label()
    if label == 'gtk-apply':
        child.set_label('Start')
qid & accept id: (15721679, 25564098) query: Update and render a value from Flask periodically soup:

soup wrap:

Using an Ajax request

Python

@app.route('/_stuff', methods= ['GET'])
def stuff():
    cpu=round(getCpuLoad())
    ram=round(getVmem())
    disk=round(getDisk())
    return jsonify(cpu=cpu, ram=ram, disk=disk)

Javascript

function update_values() {
            $SCRIPT_ROOT = {{ request.script_root|tojson|safe }};
            $.getJSON($SCRIPT_ROOT+"/_stuff",
                function(data) {
                    $("#cpuload").text(data.cpu+" %")
                    $("#ram").text(data.ram+" %")
                    $("#disk").text(data.disk+" %")
                });
        }

Using Websockets

project/app/views/request/websockets.py

# -*- coding: utf-8 -*-

# OS Imports
import json

# Local Imports
from app import sockets
from app.functions import get_cpu_load, get_disk_usage, get_vmem

@sockets.route('/_socket_system')
def socket_system(ws):
    """
    Returns the system informations, JSON Format
    CPU, RAM, and Disk Usage
    """
    while True:
        message = ws.receive()
        if message == "update":
            cpu = round(get_cpu_load())
            ram = round(get_vmem())
            disk = round(get_disk_usage())
            ws.send(json.dumps(dict(received=message, cpu=cpu, ram=ram, disk=disk)))
        else:
            ws.send(json.dumps(dict(received=message)))

project/app/__init__.py

# -*- coding: utf-8 -*-
from flask import Flask
from flask_sockets import Sockets


app = Flask(__name__)
sockets = Sockets(app)
app.config.from_object('config')
from app import views

Using Flask-Websockets made my life a lot easier. Here is the launcher, launchwithsockets.sh:

#!/bin/sh

gunicorn -k flask_sockets.worker app:app

Finally, here is the client code :
custom.js
The code is a bit too long, so here it is.
Note that I'm NOT using things like socket.io, that's why the code is long. This code also tries to reconnect to the server periodically, and can stop trying to reconnect on a user action. I use the Messenger lib to notify the user that something went wrong. Of course it's a bit more complicated than using socket.io but I really enjoyed coding the client side.

qid & accept id: (15782606, 17929506) query: Executing Ipython Script from System Shell soup:

soup wrap:

Not sure if I fully understand your problem, but you can create a py.ini file on Windows as described in the Customized Commands section of PEP 397 of which PyLauncher is an implementation.

[commands]
ipython=C:\Anaconda\Scripts\ipython.exe -v

Change the path to wherever your local IPython is installed. If you associate the .ipy file extension with the pylauncher executable (typically C:\Windows\py.exe; you can save the py.ini file to the same path) and use the shebang below at the top of your .py/.ipy files, they should run with ipython and the options specified in the py.ini file:

#! ipython

You can also associate the ipython.exe with .ipy files on Windows and it will run the .ipy files.

qid & accept id: (15793715, 15793754) query: Other ways to replace single character soup:

soup wrap:

I just tested some solutions to find the best performance.

The tester source code was:

import __main__
from itertools import permutations
from time import time

def replace1(txt, pos, new_char):
    return txt[:pos] + new_char + txt[pos+1:]

def replace2(txt, pos, new_char):
    return '{0}{1}{2}'.format(txt[:pos], new_char, txt[pos+1:])

def replace3(txt, pos, new_char):
    return ''.join({pos: new_char}.get(idx, c) for idx, c in enumerate(txt))

def replace4(txt, pos, new_char):
    txt = list(txt)
    txt[pos] = new_char
    return ''.join(txt)

def replace5(txt, pos, new_char):
    return '%s%s%s' % (txt[:pos], new_char, txt[pos+1:])


words = [''.join(x) for x in permutations('abcdefgij')]

for i in range(1, 6):
    func = getattr(__main__, 'replace{}'.format(i))

    start = time()
    for word in words:
        result = func(word, 2, 'X')
    print time() - start

And here are the results:

0.233116149902
0.409259080887
2.64006495476
0.612321138382
0.302225828171
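To reproduce a comparison like this on Python 3, a minimal timeit sketch of the two most common approaches (absolute numbers will differ by machine):

```python
from timeit import timeit

def replace_slice(txt, pos, new_char):
    # Slicing + concatenation (replace1 above)
    return txt[:pos] + new_char + txt[pos+1:]

def replace_list(txt, pos, new_char):
    # list conversion + join (replace4 above)
    chars = list(txt)
    chars[pos] = new_char
    return ''.join(chars)

for fn in (replace_slice, replace_list):
    t = timeit(lambda: fn('abcdefgij', 2, 'X'), number=100_000)
    print(fn.__name__, round(t, 4))
```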
qid & accept id: (15795416, 15928594) query: Disable pagination in Django tastypie? soup:

soup wrap:

To do this you need to set at least two different things.

In the site settings file, set

API_LIMIT_PER_PAGE = 0

In the resource Meta class that you want to disable pagination for, set:

class MyResource(ModelResource):
    ...
    class Meta:
        ...
        max_limit = None

Then if you navigate to the list view of the resource, the returned content should show a limit of 0.

qid & accept id: (15815999, 15816186) query: Is there a standard way to store XY data in python? soup:

soup wrap:

One nice way is with a structured array. This gives all the advantages of numpy arrays, but a convenient access structure.

All you need to do to make your numpy array a "structured" one is to give it the dtype argument. This gives each "field" a name and type. They can even have more complex shapes and hierarchies if you wish, but here's how I keep my x-y data:

In [175]: import numpy as np

In [176]: x = np.random.random(10)

In [177]: y = np.random.random(10)

In [179]: zip(x,y)
Out[179]: 
[(0.27432965895978034, 0.034808254176554643),
 (0.10231729328413885, 0.3311112896885462),
 (0.87724361175443311, 0.47852682944121905),
 (0.24291769332378499, 0.50691735432715967),
 (0.47583427680221879, 0.04048957803763753),
 (0.70710641602121627, 0.27331443495117813),
 (0.85878694702522784, 0.61993945461613498),
 (0.28840423235739054, 0.11954319357707233),
 (0.22084849730366296, 0.39880927226467255),
 (0.42915612628398903, 0.19197320645915561)]

In [180]: data = np.array( zip(x,y), dtype=[('x',float),('y',float)])

In [181]: data['x']
Out[181]: 
array([ 0.27432966,  0.10231729,  0.87724361,  0.24291769,  0.47583428,
        0.70710642,  0.85878695,  0.28840423,  0.2208485 ,  0.42915613])

In [182]: data['y']
Out[182]: 
array([ 0.03480825,  0.33111129,  0.47852683,  0.50691735,  0.04048958,
        0.27331443,  0.61993945,  0.11954319,  0.39880927,  0.19197321])

In [183]: data[0]
Out[183]: (0.27432965895978034, 0.03480825417655464)

Others will probably suggest using pandas, but if your data is relatively simple, plain numpy might be easier.

You can add hierarchy if you wish, but often it's more complicated than necessary.

For example:

In [200]: t = np.arange(10)

In [202]: dt = np.dtype([('t',int),('pos',[('x',float),('y',float)])])

In [203]: alldata = np.array(zip(t, zip(x,y)), dtype=dt)

In [204]: alldata
Out[204]: 
array([(0, (0.27432965895978034, 0.03480825417655464)),
       (1, (0.10231729328413885, 0.3311112896885462)),
       (2, (0.8772436117544331, 0.47852682944121905)),
       (3, (0.242917693323785, 0.5069173543271597)),
       (4, (0.4758342768022188, 0.04048957803763753)),
       (5, (0.7071064160212163, 0.27331443495117813)),
       (6, (0.8587869470252278, 0.619939454616135)),
       (7, (0.28840423235739054, 0.11954319357707233)),
       (8, (0.22084849730366296, 0.39880927226467255)),
       (9, (0.429156126283989, 0.1919732064591556))], 
      dtype=[('t', '<i8'), ('pos', [('x', '<f8'), ('y', '<f8')])])
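Note that the sessions above are Python 2; on Python 3, zip() returns a lazy iterator, so it must be materialized before building the array. A minimal Python 3 version of the flat x-y case (assuming numpy is installed):

```python
import numpy as np

x = np.random.random(10)
y = np.random.random(10)

# zip() is lazy in Python 3; np.array needs a concrete list of tuples
data = np.array(list(zip(x, y)), dtype=[('x', float), ('y', float)])

print(data.dtype.names)   # ('x', 'y')
print(data['x'].shape)    # (10,)
```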
qid & accept id: (15820788, 15955816) query: Using Twitter Bootstrap radio buttons with Flask soup:

soup wrap:

If you have big sets of similar controls, the best way is to use loops for them. Let's imagine we use one list to store all the button names, and another list for the buttons that are active. This gives us a controller like:

from flask import render_template

@app.route('/form/')
def hello(name=None):
    return render_template('hello.html', buttons=['A', 'B', 'C'], active_btns=['A', 'C'])

So, in template we'll have something like:

{% for button in buttons %} {% if button in active_btns %} {% else %} {% endif %} {% endfor %}

Actually, expression inside for loop can be simplified, using Jinja's If expression, and should look like:


But I don't have Jinja2 now, and could be wrong in syntax.
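To illustrate, here is a runnable sketch of that inline-if form. The label markup is my own illustrative guess (it is not from the original answer), and it assumes the jinja2 package is installed:

```python
from jinja2 import Template

# Hypothetical Bootstrap button markup; the 'active' class is
# appended via Jinja's inline if-expression
tmpl = Template(
    '{% for button in buttons %}'
    '<label class="btn btn-primary{{ " active" if button in active_btns }}">'
    '{{ button }}</label>'
    '{% endfor %}'
)
print(tmpl.render(buttons=['A', 'B', 'C'], active_btns=['A', 'C']))
```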

If I didn't get your question right, please write a comment and I'll try to update my answer :)

qid & accept id: (15869158, 16512082) query: Python Socket Listening soup:

soup wrap:

Playing around with this for a while finally got it working nice with a telnet session locally using python 2.7.

What it does is it sets up a thread that runs when the client connects listening for client stuff.

When the client sends a return ("\r\n" — you might have to change that if you're interacting with a Linux system?), the message gets printed to the server; meanwhile, any raw_input entered on the server side is sent to the client:

import socket
import threading
import sys
host = ''
port = 50000
connectionSevered=0

class client(threading.Thread):
    def __init__(self, conn):
        super(client, self).__init__()
        self.conn = conn
        self.data = ""
    def run(self):
        while True:
            self.data = self.data + self.conn.recv(1024)
            if self.data.endswith(u"\r\n"):
                print self.data
                self.data = ""

    def send_msg(self,msg):
        self.conn.send(msg)

    def close(self):
        self.conn.close()

try:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((host,port))
    s.listen(5)
except socket.error:
    print 'Failed to create socket'
    sys.exit()

print '[+] Listening for connections on port: {0}'.format(port)


conn, address = s.accept()
c = client(conn)
c.start()
print '[+] Client connected: {0}'.format(address[0])
c.send_msg(u"\r\n")
print "connectionSevered:{0}".format(connectionSevered) 
while (connectionSevered==0):
    try:
        response = raw_input()  
        c.send_msg(response + u"\r\n")
    except:
        c.close()

The above answer will not work for more than a single connection, so I have updated it by adding another thread for accepting connections. It is now possible to have more than a single user connect.

import socket
import threading
import sys
host = ''
port = 50000

class client(threading.Thread):
    def __init__(self, conn):
        super(client, self).__init__()
        self.conn = conn
        self.data = ""

    def run(self):
        while True:
            self.data = self.data + self.conn.recv(1024)
            if self.data.endswith(u"\r\n"):
                print self.data
                self.data = ""

    def send_msg(self,msg):
        self.conn.send(msg)

    def close(self):
        self.conn.close()

class connectionThread(threading.Thread):
    def __init__(self, host, port):
        super(connectionThread, self).__init__()
        try:
            self.s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            self.s.bind((host,port))
            self.s.listen(5)
        except socket.error:
            print 'Failed to create socket'
            sys.exit()
        self.clients = []

    def run(self):
        while True:
            conn, address = self.s.accept()
            c = client(conn)
            c.start()
            c.send_msg(u"\r\n")
            self.clients.append(c)
            print '[+] Client connected: {0}'.format(address[0])



def main():
    get_conns = connectionThread(host, port)
    get_conns.start()
    while True:
        try:
            response = raw_input() 
            for c in get_conns.clients:
                c.send_msg(response + u"\r\n")
        except KeyboardInterrupt:
            sys.exit()

if __name__ == '__main__':
    main()

Clients are not able to see what other clients say, messages from the server will be sent to all clients. I will leave that as an exercise for the reader.

qid & accept id: (15892598, 15892622) query: Compare values from two different dictionaries in Python? soup:
soup wrap:
for key in set(sourceDict).intersection(targetDict):
    # Now we have only keys that occur in both dicts
    if sourceDict[key] != targetDict[key]:
        diffList.append(key)

As DSM noted in his (now deleted) answer, you can do this with a list comprehension or generator:

(k for k in set(sourceDict).intersection(targetDict) if sourceDict[k] != targetDict[k])
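For example, with these two hypothetical dicts, only the shared key whose values differ is reported:

```python
sourceDict = {'a': 1, 'b': 2, 'c': 3}
targetDict = {'b': 2, 'c': 30, 'd': 4}

# Only keys present in both dicts are compared
diff = [k for k in set(sourceDict) & set(targetDict)
        if sourceDict[k] != targetDict[k]]
print(diff)   # ['c']
```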
qid & accept id: (15905215, 15905932) query: Python: Alphanumeric Serial Number with some rules soup:

soup wrap:

You can use recursion for this task:

def next_string(s):
    if len(s) == 0:
        return '1'
    head = s[0:-1]
    tail = s[-1]
    if tail == 'Z':
        return next_string(head) + '1'
    if tail == '9':
        return head+'A'
    if tail == 'H':
        return head+'J'
    if tail == 'N':
        return head+'P'
    return head + chr(ord(tail)+1)

This probably isn't the most pythonic code, but this shows how to think about it.

>>> next_string('11A')
'11B'
>>> next_string('11A')
'11B'
>>> next_string('11Z')
'121'
>>> next_string('119')
'11A'
>>> next_string('1')
'2'
>>> next_string('ZZ')
'111'
>>> next_string('ZZ1')
'ZZ2'
>>> next_string('ZZ9')
'ZZA'
>>> next_string('ZZH')
'ZZJ'
qid & accept id: (15926531, 15926765) query: Deleting specific text files soup:
soup wrap:

If the files are rather small, then the following simple solution would be adequate:

if os.path.isfile(file_path): # or some other condition
    delete = True             # Standard action: delete
    try:
        with open(file_path) as infile:
            if "dollar" in infile.read(): # don't delete if "dollar" is found
                delete = False 
    except IOError:
        print("Could not access file {}".format(file_path))
    if delete: 
        os.unlink(file_path)

If the files are very large and you don't want to load them entirely into memory (especially if you expect the search text to occur early in the file), replace the with block above with the following:

        with open(file_path) as infile:
            for line in infile:
                if "dollar" in line:
                    delete = False
                    break
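If you want this as a reusable helper, the streaming check can be factored into a small function. This is only a sketch: the name contains_text and the temp-file demo are illustrative, not from the original answer.

```python
import os
import tempfile

def contains_text(file_path, needle):
    # Stream the file line by line so large files are never fully loaded.
    try:
        with open(file_path) as infile:
            for line in infile:
                if needle in line:
                    return True
    except IOError:
        print("Could not access file {}".format(file_path))
    return False

# Hypothetical demo file:
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'w') as f:
    f.write("price: one dollar\nsecond line\n")

print(contains_text(path, "dollar"))  # True
os.unlink(path)
```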
qid & accept id: (15971308, 15971505) query: Get seconds since midnight in python soup:
soup wrap:

It is better to make a single call to a function that returns the current date/time:

from datetime import datetime

now = datetime.now()
seconds_since_midnight = (now - now.replace(hour=0, minute=0, second=0, microsecond=0)).total_seconds()

Or does

datetime.now() - datetime.now()

return zero timedelta for anyone here?
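Wrapped into a helper (a sketch; the function name is illustrative), the capture-the-time-once idea looks like this:

```python
from datetime import datetime

def seconds_since_midnight(now=None):
    # Capture the current time once, so midnight is derived from the
    # same instant and there is no race across a midnight boundary.
    if now is None:
        now = datetime.now()
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return (now - midnight).total_seconds()

print(seconds_since_midnight(datetime(2013, 4, 11, 1, 2, 3)))  # 3723.0
```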

qid & accept id: (16019624, 16019938) query: How can I cluster a list of lists in Python based on string indices? Need insight soup:
soup wrap:

Maybe using itertools.groupby?

from itertools import groupby
def key(item):
    return [int(x) for x in item[1].split()[:3]]

master_lst = [['Introduction', '0 11 0 1 0'],
              ['Floating', '0 11 33 1 0'],
              ['point', '0 11 33 1 1'],
              ['numbers', '0 11 33 1 2'],
              ['IEEE', '0 11 58 1 0'],
              ['Standard', '0 11 58 1 1'], 
              ['754', '0 11 58 1 2']]

for k,v in groupby(master_lst,key=key):
    print ' '.join(x[0] for x in v) +' ' + ' '.join(str(x) for x in k)

Results in:

Introduction 0 11 0
Floating point numbers 0 11 33
IEEE Standard 754 0 11 58
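One caveat worth knowing: groupby only merges consecutive items with equal keys, so if equal keys can be scattered through the input, sort by the key first. A small illustrative sketch (the data is made up):

```python
from itertools import groupby

data = [('a', 1), ('b', 2), ('a', 3)]

# groupby only groups *adjacent* equal keys, so sort by the key first
# when equal keys may be scattered through the input.
data.sort(key=lambda item: item[0])
grouped = {k: [v for _, v in grp]
           for k, grp in groupby(data, key=lambda item: item[0])}
print(grouped)  # {'a': [1, 3], 'b': [2]}
```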
qid & accept id: (16021184, 16129065) query: How to log e-mail details in AppEngine Admin console? soup:
soup wrap:

In the end, I used the following solution:

if msg.is_multipart():
    for part in msg.walk():
        if part.get_content_type() == 'text/plain': # ignore text/html
            charset = part.get_content_charset()
            body = part.get_payload(decode=True).decode(charset)
else:
    body = msg.get_payload(decode=True)
    body = body.decode('utf-8')

and in order to display Russian text correctly in developer's admin console (under Windows 7):

def logging_debug(what):
    ''' Function to support cp866 encoding in developers admin console
    '''
    if os.environ.get('SERVER_SOFTWARE','').startswith('Devel'):
        logging.debug(what.encode('cp866'))
    else:
        logging.debug(what)
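Outside App Engine, the same text/plain extraction can be exercised against a hand-built message with the standard email module (a sketch; the sample message is made up for illustration):

```python
import email

# Hypothetical two-part message: a text/plain part and a text/html part.
raw = (
    "Content-Type: multipart/alternative; boundary=XYZ\n\n"
    "--XYZ\n"
    "Content-Type: text/plain; charset=utf-8\n\n"
    "hello body\n"
    "--XYZ\n"
    "Content-Type: text/html; charset=utf-8\n\n"
    "<p>hello body</p>\n"
    "--XYZ--\n"
)

msg = email.message_from_string(raw)
body = None
if msg.is_multipart():
    for part in msg.walk():
        if part.get_content_type() == 'text/plain':  # ignore text/html
            charset = part.get_content_charset() or 'utf-8'
            body = part.get_payload(decode=True).decode(charset)
else:
    body = msg.get_payload(decode=True).decode('utf-8')

print(body)
```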
qid & accept id: (16056574, 16056691) query: How does python prevent a class from being subclassed? soup:
soup wrap:

The bool type is defined in C, and its tp_flags slot deliberately does not include the Py_TPFLAGS_BASETYPE flag.

C types need to mark themselves explicitly as subclassable.

To do this for custom Python classes, use a metaclass:

class Final(type):
    def __new__(cls, name, bases, classdict):
        for b in bases:
            if isinstance(b, Final):
                raise TypeError("type '{0}' is not an acceptable base type".format(b.__name__))
        return type.__new__(cls, name, bases, dict(classdict))

class Foo:
    __metaclass__ = Final

class Bar(Foo):
    pass

gives:

>>> class Bar(Foo):
...     pass
... 
Traceback (most recent call last):
  File "", line 1, in 
  File "", line 5, in __new__
TypeError: type 'Foo' is not an acceptable base type
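In Python 3 the __metaclass__ attribute is ignored; the metaclass is passed as a keyword in the class statement instead. A sketch of the same Final metaclass in Python 3 syntax:

```python
class Final(type):
    def __new__(mcs, name, bases, classdict):
        # Refuse any base class that was itself created by this metaclass.
        for b in bases:
            if isinstance(b, Final):
                raise TypeError(
                    "type '{0}' is not an acceptable base type".format(b.__name__))
        return type.__new__(mcs, name, bases, dict(classdict))

class Foo(metaclass=Final):
    pass

try:
    class Bar(Foo):
        pass
except TypeError as exc:
    print(exc)  # type 'Foo' is not an acceptable base type
```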
qid & accept id: (16057689, 16057932) query: How to rename variables in a loop in Python soup:
soup wrap:

Using a dict:

arraysDict = {}
for i in range(0,3):
    arraysDict['x{0}'.format(i)] = [1,2,3]

print arraysDict
# {'x2': [1, 2, 3], 'x0': [1, 2, 3], 'x1': [1, 2, 3]}
print arraysDict['x1']
# [1,2,3]

Using a list:

arraysList = []
for i in range(0,3):
    arraysList.append([1,2,3])

print arraysList
# [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
print arraysList[1]
# [1, 2, 3]
qid & accept id: (16070219, 16070341) query: how to interpolate points in a specific interval on a plot formed by loading a txt file in to scipy program? soup:
soup wrap:

You can create an interpolation function using scipy.interpolate.interp1d:

import numpy as np
import matplotlib.pyplot as plt  # needed for the plotting below
from scipy import interpolate

data = np.genfromtxt('data.txt')

x = data[:,0]  #first column
y = data[:,1]  #second column

f = interpolate.interp1d(x, y)

xnew = np.arange(1, 5.1, 0.1) # this could be over the entire range, depending on what your data is
ynew = f(xnew)   # use interpolation function returned by `interp1d`

fig = plt.figure()
ax1 = fig.add_subplot(111)

ax1.set_title("Plot B vs H")    
ax1.set_xlabel('B')
ax1.set_ylabel('H')

ax1.plot(x,y, c='r', label='the data')
ax1.plot(xnew, ynew, 'o', label='the interpolation')

leg = ax1.legend()
plt.show()

If you want to smooth your data, you can use UnivariateSpline; just replace the f = interpolate... line with:

f = interpolate.UnivariateSpline(x, y)

To change how much it smooths, you can fiddle with the s and k options:

f = interpolate.UnivariateSpline(x, y, k=3, s=1)

As described at the documentation
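For the plain linear case, the interpolation can be sanity-checked numerically without any plotting using numpy.interp (used here only to illustrate the idea; interp1d gives the same values for linear interpolation within the data range):

```python
import numpy as np

# Tiny made-up data set: y = 10*x, sampled at x = 1, 2, 3.
x = np.array([1.0, 2.0, 3.0])
y = np.array([10.0, 20.0, 30.0])

# Linear interpolation at points between the samples.
ynew = np.interp([1.5, 2.5], x, y)
print(ynew)  # [15. 25.]
```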

qid & accept id: (16090146, 16105354) query: Sympy library solve to an unknown variable soup:
soup wrap:

As I mentioned in a comment, this problem is numerical in nature, so it is better to solve it with numpy/scipy. Nonetheless, it is an amusing example of how to do numerics in sympy, so here is one suggested workflow.

First of all, if it were not for the relative complexity of the expressions here, scipy would definitely have been the better option over sympy. But the expression is rather complicated, so we can first simplify it in sympy and only then feed it to scipy:

>>> C_b
38.0*C0
+3.0*((0.17*C0+0.076)**2+(2.0*C0+0.0066)**2)**0.5
+3.0*((0.35*C0+0.076)**2+(2.0*C0+0.0066)**2)**0.5
+3.0*((2.0*C0+0.0066)**2+0.0058)**0.5
+9.4

>>> simplify(C_b)
38.0*C0
+3.0*(4.0*C0**2+0.027*C0+0.0058)**0.5
+3.0*(4.1*C0**2+0.053*C0+0.0058)**0.5
+3.0*(4.2*C0**2+0.08*C0+0.0058)**0.5
+9.4

Now, given that you are not interested in symbolics and that the simplification was not that good, there is little point in continuing with sympy instead of scipy, but if you insist you can:

>>> nsolve(C_b - 10.4866, C0, 1) # for numerical solution
0.00970963412692139

If you try to use solve instead of nsolve you will just waste a lot of resources in searching for a symbolic solution (that may not even exist in elementary terms) when a numeric one is instantaneous.
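The numeric idea behind nsolve can be illustrated without sympy or scipy at all, with a plain bisection root finder. This is only a sketch of the general technique, not nsolve's actual algorithm, demonstrated here on x**2 - 2 = 0 rather than the C_b expression:

```python
def bisect(f, lo, hi, tol=1e-12):
    # Simple bisection: keep halving the bracket [lo, hi] in which
    # f changes sign until it is narrower than tol.
    flo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if flo * f(mid) <= 0:
            hi = mid
        else:
            lo, flo = mid, f(mid)
    return (lo + hi) / 2.0

root = bisect(lambda x: x * x - 2.0, 0.0, 2.0)
print(root)  # ~1.41421356 (sqrt(2))
```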

qid & accept id: (16107526, 16107860) query: How to flexibly change PYTHONPATH soup:
soup wrap:

To specify the classpath in Java, I use the -cp or -classpath option to java. What is the equivalent option in Python?

Well, there's no "equivalent option" in Python as far as I'm aware, but any Unix-like shell will let you set/override it on a per-process basis, if you were to run Python like this...

$ PYTHONPATH=/put/path/here python myscript.py

...a syntax which you could also use for Java with...

$ CLASSPATH=/put/path/here java MyMainClass

The closest Windows equivalent to this would be...

> cmd /c "set PYTHONPATH=\put\path\here && python myscript.py"

...if you don't want the environment variable to be set in the calling cmd.exe.

I sometimes use PyDev in Eclipse. It can handle multiple source directories. How?

When running code, it probably does something similar by setting the variable in the execve(2) call.
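Within a single running Python process you can get the same effect as setting PYTHONPATH by prepending to sys.path before importing (a sketch; '/put/path/here' is the same placeholder used above, not a real directory):

```python
import sys

# Prepending to sys.path at runtime is equivalent to setting PYTHONPATH
# for this one process: the directory is searched first on import.
extra = '/put/path/here'
if extra not in sys.path:
    sys.path.insert(0, extra)

print(extra in sys.path)  # True
```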

qid & accept id: (16110307, 16110349) query: Splitting a list of lists and strings by a string soup:
soup wrap:

Take a look at itertools.groupby:

In [1]: from itertools import groupby

In [2]: lst = [[ 'something', ',', 'eh' ], ',', ['more'], ',', 'yet more', '|', 'even more' ]

In [3]: [list(group) for key, group in groupby(lst, lambda x: x!=',') if key]
Out[3]: [[['something', ',', 'eh']], [['more']], ['yet more', '|', 'even more']]

It basically splits the items in your list into groups based on a criterion (item != ',') and the comprehension check if key filters out the groups that are False – that is, the items that are equal to ','.

In [4]: for key, group in groupby(lst, lambda x: x!=','):
   ...:     print key, list(group)
   ...:     
True [['something', ',', 'eh']]
False [',']
True [['more']]
False [',']
True ['yet more', '|', 'even more']
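The same groupby trick generalizes to any delimiter value; a small helper function (the name split_on is illustrative):

```python
from itertools import groupby

def split_on(seq, delim):
    # Split seq into sublists at each occurrence of delim,
    # dropping the delimiter items themselves.
    return [list(group) for key, group in groupby(seq, lambda x: x != delim) if key]

print(split_on([1, 0, 2, 3, 0, 4], 0))  # [[1], [2, 3], [4]]
```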
qid & accept id: (16123732, 16123840) query: Python, remove specific columns from file soup:
soup wrap:

You have a custom ASCII-table-like format with fixed-width columns:

*********************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************
*    Row   * Instance * test_string * test_string * test_string * test_string * test_string * test_string * test_string * string__722 * string__722 * string__722 * string__722 * string__722 * string__722 * string__722 * string__720 * string__720 * string__720 * string__720 * string__720 * string__720 * string__720 * HCAL_SlowDa * HCAL_SlowDa * HCAL_SlowDa * HCAL_SlowDa * HCAL_SlowDa * HCAL_SlowDa * HCAL_SlowDa * string__718 * string__718 * string__718 * string__718 * string__718 * string__718 * string__718 * string__719 * string__719 * string__719 * string__719 * string__719 * string__719 * string__719 * string__723 * string__723 * string__723 * string__723 * string__723 * string__723 * string__723 * string__721 * string__721 * string__721 * string__721 * string__721 * string__721 * string__721 * another_str * another_str * another_str * another_str * another_str * another_str * another_str * another_str * another_str *
*********************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************
*        0 *        0 *           0 *    50331648 * test_string *           2 *           1 *          13 * 5.76460e+18 *           0 *    50331648 * string__722 *           2 *           1 *         606 * 5.83666e+18 *           0 *    50331648 * string__720 *           2 *           1 *         575 * 5.83666e+18 *           0 *    50331648 * HCAL_SlowDa *           2 *           1 *          36 * 5.76460e+18 *           0 *    50331648 * string__718 *           2 *           1 *         529 * 5.83666e+18 *           0 *    50331648 * string__719 *           2 *           1 *         529 * 5.83666e+18 *           0 *    50331648 * string__723 *           2 *           1 *         529 * 5.83666e+18 *           0 *    50331648 * string__721 *           2 *           1 *         529 * 5.83666e+18 *           0 *    50331648 *      212135 *       15080 *           1 *           1 *        3340 *        1057 * 1.399999976 *
*        0 *        1 *           0 *    50331648 *             *           2 *           1 *          13 *           0 *           0 *    50331648 *             *           2 *           1 *         606 *       53440 *           0 *    50331648 *             *           2 *           1 *         575 *       53440 *           0 *    50331648 *             *           2 *           1 *          36 *           0 *           0 *    50331648 *             *           2 *           1 *         529 *       53440 *           0 *    50331648 *             *           2 *           1 *         529 *       53440 *           0 *    50331648 *             *           2 *           1 *         529 *       53440 *           0 *    50331648 *             *           2 *           1 *         529 *       53440 *           0 *    50331648 *      212135 *             *           1 *           1 *        3340 *        1057 * 1.399999976 *
*        0 *        2 *           0 *    50331648 *             *           2 *           1 *          13 *  4294970636 *           0 *    50331648 *             *           2 *           1 *         606 * 1.09780e+16 *           0 *    50331648 *             *           2 *           1 *         575 * 1.09780e+16 *           0 *    50331648 *             *           2 *           1 *          36 * 2.70217e+16 *           0 *    50331648 *             *           2 *           1 *         529 * 1.09780e+16 *           0 *    50331648 *             *           2 *           1 *         529 * 1.09780e+16 *           0 *    50331648 *             *           2 *           1 *         529 * 1.09780e+16 *           0 *    50331648 *             *           2 *           1 *         529 * 1.09780e+16 *           0 *    50331648 *      212135 *             *           1 *           1 *        3340 *        1057 * 1.399999976 *
*        0 *        3 *           0 *    50331648 *             *           2 *           1 *          13 *   352321545 *           0 *    50331648 *             *           2 *           1 *         606 * 2.30610e+18 *           0 *    50331648 *             *           2 *           1 *         575 * 2.30610e+18 *           0 *    50331648 *             *           2 *           1 *          36 * 7.30102e+18 *           0 *    50331648 *             *           2 *           1 *         529 * 1.15294e+19 *           0 *    50331648 *             *           2 *           1 *         529 * 1.15294e+19 *           0 *    50331648 *             *           2 *           1 *         529 * 1.15294e+19 *           0 *    50331648 *             *           2 *           1 *         529 * 1.15294e+19 *           0 *    50331648 *      212135 *             *           1 *           1 *        3340 *        1057 * 1.399999976 *
*        0 *        4 *           0 *    50331648 *             *           2 *           1 *          13 *           0 *           0 *    50331648 *             *           2 *           1 *         606 * 1.15294e+19 *           0 *    50331648 *             *           2 *           1 *         575 * 1.15294e+19 *           0 *    50331648 *             *           2 *           1 *          36 * 2.82590e+16 *           0 *    50331648 *             *           2 *           1 *         529 * 1.15294e+19 *           0 *    50331648 *             *           2 *           1 *         529 * 1.15294e+19 *           0 *    50331648 *             *           2 *           1 *         529 * 1.15294e+19 *           0 *    50331648 *             *           2 *           1 *         529 * 1.15294e+19 *           0 *    50331648 *      212135 *             *           1 *           1 *        3340 *        1057 * 1.399999976 *

If we assume that none of the actual data fields contain asterisks themselves, the easiest way to read each row is to use a regular expression to split out the lines.

To output, I'd still use the csv module, because that would make future processing that much easier:

import csv
import re
from itertools import islice

row_split = re.compile(r'\s*\*\s*')

with open(someinputfile, 'rb') as infile, open(outputfile, 'wb') as outfile:
    writer = csv.writer(outfile, delimiter='\t')

    next(islice(infile, 3, 3), None) # skip the first 3 lines in the input file

    for line in infile:
        row = row_split.split(line)[1:-1]
        if not row: continue
        writer.writerow(row[8::7])

This skips empty rows and writes only every 7th column, counting from the ninth, discarding the rest.

The first row thus is:

['5.76460e+18', '5.83666e+18', '5.83666e+18', '5.76460e+18', '5.83666e+18', '5.83666e+18', '5.83666e+18', '5.83666e+18', '3340']
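The `row[8::7]` extended slice may be worth spelling out; here is a minimal, self-contained illustration of how it picks every 7th item starting from index 8:

```python
# Extended slice semantics: start at index 8, then step by 7.
row = list(range(30))      # stand-in for a parsed row of 30+ fields
picked = row[8::7]
print(picked)              # [8, 15, 22, 29]
```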
qid & accept id: (16142829, 16143455) query: numpy: slicing and vectorized looping with 1d and 2d arrays soup:

soup wrap:

The following does what you want:

A = np.array([[0., 1., 0., 2.],
             [1., 0., 3., 0.],
             [0., 0., 0., 4.],
             [2., 0., 4., 0.]]) # quadratic, not symmetric Matrix, shape (i, i)
B = np.array([2., 4., 2., 1.]) # vector shape (i)

C = A*(B[:,None]-B)

C is

array([[ 0., -2.,  0.,  2.],
       [ 2.,  0.,  6.,  0.],
       [ 0., -0.,  0.,  4.],
       [-2., -0., -4.,  0.]])

A little explanation:
B[:,None] converts B to a column vector of shape (4, 1). B[:,None] - B is then automatically broadcast to a 4x4 matrix, which you can simply multiply element-wise by A.
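A minimal sketch of just the broadcasting step, using the same B as above:

```python
import numpy as np

B = np.array([2., 4., 2., 1.])
col = B[:, None]   # shape (4, 1): B as a column vector
diff = col - B     # broadcast to shape (4, 4); diff[i, j] == B[i] - B[j]
print(diff.shape)  # (4, 4)
```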

qid & accept id: (16143648, 16143696) query: Escape string to be valid python expression soup:

soup wrap:

You can use the unicode_escape codec; this produces a bytes instance:

>>> example = 'Foo \'" \\ Bar'
>>> print(example)
Foo '" \ Bar
>>> print(example.encode('unicode_escape'))
b'Foo \'" \\\\ Bar'
>>> example.encode('unicode_escape')
b'Foo \'" \\\\ Bar'

unicode_escape explicitly produces valid Python string literals:

Produce a string that is suitable as Unicode literal in Python source code

To go back to Unicode, simply decode from ASCII:

>>> print(example.encode('unicode_escape').decode('ascii'))
Foo '" \\ Bar
>>> example.encode('unicode_escape').decode('ascii')
'Foo \'" \\\\ Bar'

Alternatively, use repr():

>>> repr(example)
'\'Foo \\\'" \\\\ Bar\''
>>> print(repr(example))
'Foo \'" \\ Bar'

Return a string containing a printable representation of an object. For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval(), otherwise the representation is a string enclosed in angle brackets that contains the name of the type of the object together with additional information often including the name and address of the object.

The output of repr() of a string can be pasted straight back into a Python interpreter without additional formatting.
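For instance, a safe round-trip can be sketched with ast.literal_eval(), which parses a string literal back without executing arbitrary code (this is an illustration, not part of the original answer):

```python
import ast

example = 'Foo \'" \\ Bar'
literal = repr(example)                       # a valid Python string literal
print(ast.literal_eval(literal) == example)   # True
```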

Note that repr() and unicode_escape only escape quotes when absolutely necessary. Only when both styles of quoting, single and double, are present does one of these get escaped:

>>> print(repr('\''))
"'"
>>> print(repr('\"'))
'"'
>>> print(repr('\'"'))
'\'"'
qid & accept id: (16155921, 16156065) query: Using fping to get ping times in Python soup:

soup wrap:

I don't know much about fping, but something like this...

import subprocess

CMD = ['fping', 'param1', 'param2']

result = subprocess.check_output(CMD)

...will run fping param1 param2, and put the output as a string into the result variable, once the fping process has terminated.

You can split the output into lines with result.splitlines().

Here's a quick one-liner example using ping to grab three ping times to localhost...

>>> [line.rpartition('=')[-1] for line in subprocess.check_output(['ping', '-c', '3', 'localhost']).splitlines()[1:-4]]
['0.028 ms', '0.023 ms', '0.025 ms']
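If you want to experiment with the parsing part without actually pinging, the same rpartition('=') trick works on captured sample output (the text below is illustrative ping output, not real measurements):

```python
sample = """PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.028 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0.023 ms"""

# Skip the header line, then take everything after the last '=' on each line.
times = [line.rpartition('=')[-1] for line in sample.splitlines()[1:]]
print(times)  # ['0.028 ms', '0.023 ms']
```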
qid & accept id: (16160474, 16160968) query: GNOME configuration database type-inference soup:

soup wrap:

Rather than invoking the command line tool, try using the gconf module included in the GNOME Python bindings:

>>> import gconf
>>> client = gconf.Client()
>>> # Get a value and introspect its type:
>>> value = client.get('/apps/gnome-terminal/profiles/Default/background_color')
>>> value.type

>>> value.get_string()
'#FFFFFFFFDDDD'

For lists, you can introspect the list value type:

>>> value = client.get('/apps/compiz-1/general/screen0/options/active_plugins')
>>> value.type

>>> value.get_list_type()

>>> value.get_list()
(, , ...)

In general though, you should know the types of the keys you're manipulating and use the appropriate type specific access methods directly (e.g. Client.get_string and Client.set_string).

qid & accept id: (16207633, 16207769) query: Represent a class as a dict or list soup:

soup wrap:

Do you mean something like this? If so, you have to define an __iter__ method that yields key-value pairs:

In [1]: class A(object):
   ...:     def __init__(self):
   ...:        self.pairs = ((1,2),(2,3))
   ...:     def __iter__(self):
   ...:         return iter(self.pairs)
   ...:     

In [2]: a = A()

In [3]: dict(a)
Out[3]: {1: 2, 2: 3}

Also, it seems that dict tries to call the .keys / __getitem__ methods before __iter__, so you can make list(instance) and dict(instance) return something completely different.

In [4]: class B(object):
    ...:     def __init__(self):
    ...:        self.d = {'key':'value'}
    ...:        self.l = [1,2,3,4]
    ...:     def keys(self):
    ...:         return self.d.keys()
    ...:     def __getitem__(self, item):
    ...:         return self.d[item]
    ...:     def __iter__(self):        
    ...:         return iter(self.l)
    ...:     

In [5]: b = B()

In [6]: list(b)
Out[6]: [1, 2, 3, 4]

In [7]: dict(b)
Out[7]: {'key': 'value'}
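A generator-based variant of the same idea (a hypothetical Point class, not from the original answer): __iter__ can yield the pairs directly instead of delegating to iter():

```python
class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __iter__(self):
        # dict() consumes these (key, value) pairs
        yield 'x', self.x
        yield 'y', self.y

p = Point(1, 2)
print(dict(p))  # {'x': 1, 'y': 2}
print(list(p))  # [('x', 1), ('y', 2)]
```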
qid & accept id: (16218482, 16218589) query: Setting path in Python soup:

soup wrap:

Something like this should work...

import os

psqldir = 'C:/Program Files/PostgreSQL/9.2/bin'
os.environ['PATH'] = '%s;%s' % (os.environ['PATH'], psqldir)
os.system('foo')

...or just call foo.exe by its full path...

os.system('C:/Program Files/PostgreSQL/9.2/bin/foo')

However, as kindall's (now-deleted) answer suggested, it's worth noting this paragraph from the os.system() documentation...

The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function. See the Replacing Older Functions with the subprocess Module section in the subprocess documentation for some helpful recipes.

qid & accept id: (16233528, 16569556) query: Average multiple vectors of points of different lengths in python soup:

soup wrap:

Unfortunately there were no takers; here is a solution that I find works OK.

I had to change the format of the data to solve this. So instead of having a list of trials with variable number of (x, y) points: [[(x, y), (x, y), ...], [(x, y), (...), ...]]

I now have 3 numpy.arrays:

sx = array([ 23, 34, 42, ..., 56, 56, 63])

sy = array([ 78, 94, 20, ..., 44, 38, 34])

st = array([1, 1, 1, ..., 293, 293, 293])

All vectors are the same length as they are essentially part of a table, where sx is the column with all the x positions, sy is all the y positions and st is the trial number (or the list ID of the x and y positions). st is basically a bunch of repeated numbers [1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,...]

(I'm actually using HDF5/PyTables to store my data, and it's a direct read from the table that contains the tracking data.)

This solution uses interp1d

from scipy.interpolate import interp1d

as well as numpy of course

import numpy as np

I admit it's a hacked solution and not really fast, but it works :) On the other hand, re-reading my own question made me think it was not a very clear exposition of my problem... sorry for that. Anyway, here is the solution.

The following function receives the three vectors I described above, plus a trialList, which is the list of trials to collapse across, and a kind, which is the type of collapsing you want to do ('mean' or 'median' for now). It returns the collapsed trajectory, i.e. the x and y positions of the mean or the median across the trialList.

def collapseTrajectories(sx, sy, st, trialList, kind='median'):
    # find the longest trial to use as template
    l = 0
    tr = []
    for t in trialList:
        if len(st[st==t]) > l:
            l = len(st[st==t])
            tr = t

    # Make all vectors the same length by interpolating the values
    xnew = np.linspace(0, 640, l)
    ynew = np.linspace(0, 480, l)
    sx_new = []
    sy_new = []

    for t in trialList:
        if len(st[st==t]) > 3:
            X = sx[st==t]
            Y = sy[st==t]
            x = np.linspace(0,640, len(X))
            y = np.linspace(0,480,len(Y))
            fx = interp1d(x, X, kind='cubic')
            fy = interp1d(y, Y, kind='cubic')
            sx_new.append(fx(xnew))
            sy_new.append(fy(ynew))

    # Collapse using the appropriate kind
    if kind == 'median':
        out_x = np.median(sx_new, axis=0)
        out_y = np.median(sy_new, axis=0)
    elif kind=='mean':
        out_x = np.mean(sx_new, axis=0)
        out_y = np.mean(sy_new, axis=0)

    return out_x, out_y
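The core trick here is the interp1d resampling; a minimal, self-contained sketch of just that step, with synthetic values rather than real tracking data:

```python
import numpy as np
from scipy.interpolate import interp1d

X = np.array([0., 10., 20., 30.])     # x positions of one short trial
x_old = np.linspace(0, 640, len(X))   # parameterize the short trial
x_new = np.linspace(0, 640, 6)        # template length of the longest trial
fx = interp1d(x_old, X, kind='cubic') # 'cubic' needs at least 4 points
X_resampled = fx(x_new)
print(len(X_resampled))  # 6
```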
qid & accept id: (16253958, 16253985) query: How can I create a list with the first column? soup:

soup wrap:

Something like this, considering the data is stored in a text file:

In [15]: with open("abc") as f:
   ....:     for line in f:
   ....:         spl=line.split()
   ....:         if '18' in spl:
   ....:             print line
   ....:             break
   ....:             
18  :   mp4 [360x640]

or if the data is stored in a string:

In [16]: strs="""Available formats:
   ....:     37  :   mp4 [1080x1920]
   ....:     46  :   webm    [1080x1920]
   ....:     22  :   mp4 [720x1280]
   ....:     45  :   webm    [720x1280]
   ....:     35  :   flv [480x854]
   ....:     44  :   webm    [480x854]
   ....:     34  :   flv [360x640]
   ....:     18  :   mp4 [360x640]
   ....:     43  :   webm    [360x640]
   ....:     5   :   flv [240x400]
   ....:     17  :   mp4 [144x176]"""
   ....:     

In [17]: for line in strs.splitlines():
   ....:     spl=line.split()
   ....:     if '18' in  spl:
   ....:         print line
   ....:         break
   ....:         
    18  :   mp4 [360x640]
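Since the question title asks for a list of the first column, note that the same split can collect every first token rather than match a single line (sketched on a shortened version of the same data):

```python
strs = """37  :   mp4 [1080x1920]
46  :   webm    [1080x1920]
18  :   mp4 [360x640]"""

# split() on whitespace, keep the first token of each line
first_col = [line.split()[0] for line in strs.splitlines()]
print(first_col)  # ['37', '46', '18']
```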
qid & accept id: (16257087, 16257122) query: Download a file and push into MySQL without timing out in Python soup:

soup wrap:

I would copy the whole file to your server and then use LOAD DATA LOCAL INFILE, since it supports CSV input:

LOAD DATA INFILE 'data.txt' INTO TABLE tbl_name
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\r\n'
  IGNORE 1 LINES;

If you don't like this solution, you could use mysql_ping() (hopefully the connector you're using supports it) to auto-reconnect.

Checks whether the connection to the server is working. If the connection has gone down and auto-reconnect is enabled an attempt to reconnect is made. If the connection is down and auto-reconnect is disabled, mysql_ping() returns an error.


And if the problem is that the file downloads fine but the MySQL connection times out because of the delay, you can run the work in two threads and synchronize them through a queue:

import queue
import threading

# Prepare queue and end signaling handler
q = queue.Queue()
done = threading.Event()

# Function that fetches items from q and puts them into db after
# certain amount is reached
def store_db():
    items=[]

    # Until we set done
    while not done.is_set():
        try:
            # We may have 500 records and thread be done... prevent deadlock
            items.append(q.get(timeout=5))
            if len(items) > 1000:
                insert_into(items)
                items = []
            q.task_done()
        # If nothing arrives within 5 seconds, queue.Empty is raised
        except queue.Empty:
            pass

    if items:
        insert_into(items)

# Fetch all data in a loop
def continuous_reading():
    # Fetch a row (left as a stub here) and hand it to the storer thread
    q.put(row)

# Start storer thread
t = threading.Thread(target=store_db)
t.daemon = True
t.start()

continuous_reading()
q.join() # Wait for all task to be processed
done.set() # Signal store_db that it can terminate
t.join() # to make sure the items buffer is stored into the db
qid & accept id: (16367823, 16367850) query: Python Sum of Squares Function soup:

soup wrap:

Convert the integer to a string, then the individual characters back to integers:

def sum_of_squares_of_digits(value):
    return sum(int(c) ** 2 for c in str(value))

This uses sum() together with a generator expression to turn all the digits back to integers, square them, and sum the results together again.

Demo:

>>> def sum_of_squares_of_digits(value):
...     return sum(int(c) ** 2 for c in str(value))
... 
>>> sum_of_squares_of_digits(987)
194
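For comparison, a purely arithmetic version using divmod avoids the string conversion entirely (an alternative sketch, not the original answer):

```python
def sum_of_squares_arith(value):
    # Peel off the last digit with divmod until nothing is left.
    total = 0
    while value:
        value, digit = divmod(value, 10)
        total += digit ** 2
    return total

print(sum_of_squares_arith(987))  # 194  (81 + 64 + 49)
```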
qid & accept id: (16391808, 16392172) query: how to properly loop through two files comparing strings in both files against each other soup:

soup wrap:

The reason it's only doing it once is that the for loop has reached the end of the file, so it stops, since there are no more lines to read.

In other words, the first time your loop runs, it steps through the entire file; then, since there are no more lines to read (it has reached the end of the file), it doesn't loop again, resulting in only one line being processed.

So one way to solve this is to "rewind" the file, you can do that with the seek method of the file object.

If your files aren't big, another approach is to read them all into a list or similar structure and then loop through it.

However, since your sentiment score is a simple lookup, the best approach is to build a dictionary of sentiment scores, then look up each word in the dictionary to calculate the overall sentiment of the tweet:

import csv
import json

scores = {}  # empty dictionary to store scores for each word

with open('sentimentfile.txt') as f:
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        scores[row[0].strip()] = int(row[1].strip()) 


with open('tweetsfile.txt') as f:
    for line in f:
        tweet = json.loads(line)
        text = tweet.get('text','').encode('utf-8')
        if text:
            total_sentiment = sum(scores.get(word,0) for word in text.split())
            print("{}: {}".format(text, total_sentiment))

The with statement automatically closes file handlers. I am using the csv module to read the file (it works for tab delimited files as well).

This line does the calculation:

total_sentiment = sum(scores.get(word,0) for word in text.split())

It is a shorter way to write this loop:

tweet_score = []
for word in text.split():
    if word in scores:
        tweet_score.append(scores[word])

total_sentiment = sum(tweet_score)

The get method of dictionaries takes a second optional argument to return a custom value when the key cannot be found; if you omit this second argument, it will return None. In my loop I am using it to return 0 if the word has no score.
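A quick illustration of get() with and without a default (hypothetical scores):

```python
scores = {'good': 3, 'bad': -3}

print(scores.get('good', 0))  # 3     (key present)
print(scores.get('meh', 0))   # 0     (missing key, explicit default)
print(scores.get('meh'))      # None  (missing key, no default)
```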

qid & accept id: (16397116, 16397874) query: How can I send an MMS via a GSM/GPRS modem connected to a linux computer? soup:

soup wrap:

Why do you want to do this? It's an overly complicated process, and there is a reason there are MMSC gateways available. You only use the GPRS part to establish a PPP connection; the rest of the stuff happens over IP.

I strongly suggest you use a gateway for this, and don't do this manually.

In order to establish the PPP connection:

  1. AT+CGDCONT? This should respond with the context you are on. This means you are ready to attach/connect.
  2. AT+CGATT=1 (attach your modem)
  3. AT+CGDATA=? (check what is the data mode)
  4. AT+CGACT=1 (activate your connection)

Now you are on PPP, and then you talk over the modem using whatever your provider is using. It could be anything from direct HTTP to MMSE protocol.

For example, here is a complete transcript over HTTP. First, we need to set up the modem and connection information. All these commands should get an OK response from the modem.

AT+CMMSINIT # Initialize the MMS method
AT+CMMSCURL="some.url.com" # the MMS center URL
AT+CMMSCID=1 # Set bearer
AT+CMMSPROTO="1.1.1.1",8080 # MMS Proxy information
AT+SAPBR=3,1,"Contype","GPRS" # How you are sending
AT+SAPBR=3,1,"APN","foobar" # Set the APN
AT+SAPBR=1,1 # Activate the bearer context

Next, we prepare the message:

> AT+CMMSEDIT=1  # Enter edit mode
OK
> AT+CMMSDOWN="PIC",54321,30000 # Download a pic that is 54321 bytes
                                # and set the latency
                                # for the download to 30000 ms
CONNECT                         # This means, ready to receive data
                                # so send your file
OK                              # Data received
> AT+CMMSRECP="123456789"       # Set the recipient
OK
> AT+CMMSVIEW                   # View your message
(your message)
OK
> AT+CMMSSEND                   # Send the message
OK                              # Message sent
> AT+CMMSEDIT=0                 # Exit edit mode, and clear the buffer
OK

This, of course, is specific to the modem that I was using. Your results may vary. I can tell you that this is an exercise in futility. Go with a proper provider if you want to actually send MMS messages.

qid & accept id: (16402525, 16402545) query: Python: Read whitespace separated strings from file similar to readline soup:

soup wrap:

You'd need to create a wrapper function; this is easy enough:

def read_by_tokens(fileobj):
    for line in fileobj:
        for token in line.split():
            yield token

Note that .readline() doesn't just read a file character by character until a newline is encountered; the file is read in blocks (a buffer) to improve performance.

The above method reads the file by lines but yields the result split on whitespace. Use it like:

with open('somefilename') as f:
    for token in read_by_tokens(f):
        print(token)

Because read_by_tokens() is a generator, you either need to loop directly over the function result, or use the next() function to get tokens one by one:

with open('somefilename') as f:
    tokenized = read_by_tokens(f)

    # read first two tokens separately
    first_token = next(tokenized)
    second_token = next(tokenized)

    for token in tokenized:
        # loops over all tokens *except the first two*
        print(token)
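If you want to try this without a file on disk, io.StringIO works as a stand-in file object (the sample text here is made up):

```python
import io

def read_by_tokens(fileobj):
    # yield whitespace-separated tokens, reading line by line
    for line in fileobj:
        for token in line.split():
            yield token

# io.StringIO acts as an in-memory file for demonstration
f = io.StringIO('alpha beta\ngamma  delta\n')
tokens = list(read_by_tokens(f))
print(tokens)  # ['alpha', 'beta', 'gamma', 'delta']
```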
qid & accept id: (16421050, 16421079) query: Find and replace a string in Python soup:

soup wrap:

You want to add "Short" to the first and last words of the string. My advice would be to split, then use indexing, then join!

In [202]: line = "Teacher   =  Small   |1-2|   Student"

In [203]: line = line.split()

In [204]: line[0] += "Short"

In [205]: line[-1] += "Short"

In [206]: line = "  ".join(line)

In [207]: line
Out[207]: 'TeacherShort  =  Small  |1-2|  StudentShort'

I think it would be useful to have this in a function:

def customize_string(string, add_on):
    # case-insensitive check, since the example string contains "Small"
    if "small" in string.lower():
        line = string.split()
        line[0] += add_on
        line[-1] += add_on
        return "  ".join(line)
    else:
        return string

Here it is in use, showing that it works:

In [219]: customize_string("Teacher   =  Small   |1-2|   Student", "Short")
Out[219]: 'TeacherShort  =  Small  |1-2|  StudentShort'
qid & accept id: (16500670, 16500755) query: Method to find substring soup:

soup wrap:

You could try something like this:

In [1]: m = 'college'

In [2]: s = 'col'

In [3]: if any(m[i:i+len(s)] == s for i in range(len(m)-len(s)+1)):
   ...:     print 'Present'
   ...: else:
   ...:     print 'Not present'
   ...:     
Present

Here, any() checks every substring of m of length len(s) and sees whether it equals s. If one does, it returns True and stops further processing (this is called 'short-circuiting', and is pretty similar to the break you have above).

Here is what the any piece would look like if we replaced it with a list comprehension and took out the equality comparison:

In [4]: [m[i:i+len(s)] for i in range(len(m)-len(s)+1)]
Out[4]: ['col', 'oll', 'lle', 'leg', 'ege']
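For completeness, the built-in way to do this check is the in operator, which performs the same short-circuiting search internally; str.find() additionally reports the position:

```python
m = 'college'
s = 'col'

present = s in m       # True: built-in substring membership test
pos = m.find('lle')    # 2: index of first occurrence
missing = m.find('xyz')  # -1 when the substring is absent
print(present, pos, missing)  # True 2 -1
```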
qid & accept id: (16515465, 16597695) query: Is it possible to map a discontiuous data on disk to an array with python? soup:

soup wrap:

I posted another answer because for the example given here numpy.memmap worked:

offset = 0
data1 = np.memmap('tmp', dtype='i', mode='r+', order='F',
                  offset=0, shape=(size1))
offset += size1*byte_size
data2 = np.memmap('tmp', dtype='i', mode='r+', order='F',
                  offset=offset, shape=(size2))
offset += size2*byte_size
data3 = np.memmap('tmp', dtype='i', mode='r+', order='F',
                  offset=offset, shape=(size3))

For int32, byte_size = 32/8 = 4; for int16, byte_size = 16/8 = 2; and so forth...
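Rather than hard-coding that arithmetic, you can ask numpy for the element size directly; np.dtype(...).itemsize gives the size in bytes of one element (a small sketch, assuming numpy is available):

```python
import numpy as np

# itemsize is the per-element size in bytes, so no manual 32/8 arithmetic
byte_size_i4 = np.dtype(np.int32).itemsize   # 4
byte_size_i2 = np.dtype(np.int16).itemsize   # 2
print(byte_size_i4, byte_size_i2)
```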

If the sizes are constant, you can load the data in a 2D array like:

shape = (total_length/size,size)
data = np.memmap('tmp', dtype='i', mode='r+', order='F', shape=shape)

You can modify the memmap objects as much as you want. It is even possible to make arrays that share the same elements; in that case, changes made in one are automatically reflected in the other.


qid & accept id: (16522362, 16522369) query: Concatenate elements of a list soup:

soup wrap:

Use str.join():

s = ''.join(l)

The string on which you call this is used as the delimiter between the strings in l:

>>> l=['a', 'b', 'c']
>>> ''.join(l)
'abc'
>>> '-'.join(l)
'a-b-c'
>>> ' - spam ham and eggs - '.join(l)
'a - spam ham and eggs - b - spam ham and eggs - c'

Using str.join() is much faster than concatenating your elements one by one, as that has to create a new string object for every concatenation. str.join() only has to create one new string object.

Note that str.join() will loop over the input sequence twice. Once to calculate how big the output string needs to be, and once again to build it. As a side-effect, that means that using a list comprehension instead of a generator expression is faster:

slower_gen_expr = ' - '.join('{}: {}'.format(key, value) for key, value in some_dict.items())
faster_list_comp = ' - '.join(['{}: {}'.format(key, value) for key, value in some_dict.items()])
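Both spellings produce the same string, which a quick check confirms (some_dict here is a made-up example; the timing difference itself can be measured with timeit):

```python
some_dict = {'spam': 1, 'ham': 2}

# iterate over items() to get key/value pairs; sorted() makes output stable
gen_result = ' - '.join('{}: {}'.format(k, v) for k, v in sorted(some_dict.items()))
list_result = ' - '.join(['{}: {}'.format(k, v) for k, v in sorted(some_dict.items())])
print(gen_result)  # ham: 2 - spam: 1
```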
qid & accept id: (16530710, 16530756) query: Python: Append a list to an existing list assigned to a key in a dictionary? soup:

soup wrap:

Check this out:

>>> tst =  {'taste': ('sweet', 'sour', 'juicy', 'melon-like')}
>>> tst.get('taste', ()) #default to () if does not exist.  
('sweet', 'sour', 'juicy', 'melon-like')
>>> key_list=['yuck!','tasty','smoothie']
>>> tst['taste'] = tst.get('taste') + tuple(key_list)
>>> tst
{'taste': ('sweet', 'sour', 'juicy', 'melon-like', 'yuck!', 'tasty', 'smoothie')}

To retrieve,

>>> tst = {'taste': ('sweet', 'sour', 'juicy', 'melon-like', 'yuck!', 'tasty', 'smoothie')}
>>> taste = tst.get('taste')
>>> taste
('sweet', 'sour', 'juicy', 'melon-like', 'yuck!', 'tasty', 'smoothie')
>>> 'sour' in taste
True
>>> 'sour1' in taste
False
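If the values were lists instead of tuples, you could append in place without rebuilding the value; dict.setdefault() covers the missing-key case too (a sketch with made-up data):

```python
tst = {'taste': ['sweet', 'sour']}

# extend the existing list, creating an empty one first if the key is absent
tst.setdefault('taste', []).extend(['tasty', 'smoothie'])
tst.setdefault('colour', []).extend(['green'])
print(tst)
```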
qid & accept id: (16595299, 16595491) query: Python script to replace #define values in C file soup:

soup wrap:

Instead of tokenizing the entire source file, you could maybe just use replace in strings, like this:

for line in f1:
    for i in range(len(KEYWORDS)):
        line = line.replace("#define " + KEYWORDS[i], "#define " + KEYWORDS[i] + " " + str(VALS[i]))
    f2.write(line)

Note that this would not work for macros that already have values: it would not replace the old value, only append the new one to it.

So the solution OP suggested was instead of replacing the string in the line, to simply rewrite the entire line like this:

for line in f1:
    for i in range(len(KEYWORDS)):
        if line.startswith("#define") and KEYWORDS[i] in line:
            line = "#define " + KEYWORDS[i] + " " + str(VALS[i])+"\n"
    f2.write(line)

Another solution would be to use a regular expression (re.sub() instead of line.replace()).
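A sketch of that regular-expression approach (the KEYWORDS/VALS names follow the question; the pattern rewrites the whole #define line, so an existing value is replaced rather than appended to):

```python
import re

KEYWORDS = ['WIDTH', 'HEIGHT']
VALS = [640, 480]

text = "#define WIDTH 320\n#define DEPTH 8\n"
for kw, val in zip(KEYWORDS, VALS):
    # (?m) makes ^/$ match per line; the whole line is rewritten,
    # so any old value is discarded
    text = re.sub(r'(?m)^#define\s+%s\b.*$' % re.escape(kw),
                  '#define %s %s' % (kw, val), text)
print(text)
```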

qid & accept id: (16615630, 16615667) query: Adding a simple value to a string soup:

soup wrap:

How about?

path2 = '"C:\\Users\\bgbesase\\Documents\\Brent\\Code\\Visual Studio' + '"'

Or, as you had it

final = path2 + w

It's also worth mentioning that you can use raw strings (r'stuff') to avoid having to escape backslashes (note that a raw string cannot end in an odd number of backslashes). Ex.

path2 = r'"C:\Users\bgbesase\Documents\Brent\Code\Visual Studio'
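A related tip: building the path with ntpath.join (the Windows flavour of os.path, usable on any host OS) avoids most manual backslash handling; the wrapping quotes can then be added at the end (paths taken from the example above):

```python
import ntpath  # Windows-style path joining, regardless of host OS

base = r'C:\Users\bgbesase\Documents\Brent\Code'
full = ntpath.join(base, 'Visual Studio')
quoted = '"%s"' % full  # wrap in quotes, e.g. for a command line
print(quoted)
```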
qid & accept id: (16659818, 16660372) query: How to read complex numbers from file with numpy? soup:

soup wrap:

Here's a more direct way than @Jeff's answer, telling loadtxt to load it straight into a complex array, using a helper function parse_pair that maps (1.2,0.16) to 1.20+0.16j:

>>> import re
>>> import numpy as np

>>> pair = re.compile(r'\(([^,\)]+),([^,\)]+)\)')
>>> def parse_pair(s):
...    return complex(*map(float, pair.match(s).groups()))

>>> s = '''1 (1.2,0.16) (2.8,1.1)
2 (2.85,6.9) (5.8,2.2)'''
>>> from cStringIO import StringIO
>>> f = StringIO(s)

>>> np.loadtxt(f, delimiter=' ', dtype=np.complex,
...            converters={1: parse_pair, 2: parse_pair})
array([[ 1.00+0.j  ,  1.20+0.16j,  2.80+1.1j ],
       [ 2.00+0.j  ,  2.85+6.9j ,  5.80+2.2j ]])

Or in pandas:

>>> import pandas as pd
>>> f.seek(0)
>>> pd.read_csv(f, delimiter=' ', index_col=0, names=['a', 'b'],
...             converters={1: parse_pair, 2: parse_pair})
             a           b
1  (1.2+0.16j)  (2.8+1.1j)
2  (2.85+6.9j)  (5.8+2.2j)
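If you'd rather avoid the regular expression, a plain-string version of parse_pair does the same job (same "(re,im)" input format assumed):

```python
def parse_pair(s):
    # '(1.2,0.16)' -> complex(1.2, 0.16)
    real, imag = s.strip('()').split(',')
    return complex(float(real), float(imag))

value = parse_pair('(1.2,0.16)')
print(value)  # (1.2+0.16j)
```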
qid & accept id: (16661101, 16661406) query: How to do operations with two vectors of different format in python soup:

soup wrap:

You could do this:

import numpy as np
from scipy.sparse import csr_matrix

x = np.arange(5)+1

y = [1, 0, 0, 1, 2]
y = csr_matrix(y)

x2 = 1.0 / np.matrix(x)

z = y.multiply(x2)

Result:

>>> z
matrix([[ 1.  ,  0.  ,  0.  ,  0.25,  0.4 ]])
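For comparison, if both vectors were dense numpy arrays, the same element-wise division is just y / x (a sketch assuming numpy; values as above):

```python
import numpy as np

x = np.arange(5) + 1          # [1 2 3 4 5]
y = np.array([1, 0, 0, 1, 2])

z = y / x                     # element-wise division -> floats
print(z)                      # [1.   0.   0.   0.25 0.4 ]
```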
qid & accept id: (16689117, 16689191) query: Compare two lists in python and print the output soup:

soup wrap:

A membership check on a set will be significantly faster than manually iterating and checking:

children = {child.get('value') for child in xml_data}
for item in main_list:
    if item[4] in children:
        print(item[4])

Here we construct the set with a simple set comprehension.

Note that it may be worth swapping what data is in the set - if main_list is longer, it will be more efficient to make the set of that data.

items = {item[4] for item in main_list}
for child in xml_data:
    value = child.get('value')
    if value in items:
        print(value)

These both also only do the processing on the data once, rather than each time a check is made.

Note that a set will not preserve duplicate values or ordering on the set side - if that is important, this isn't a valid solution. These versions only keep the order/duplicates of the data you iterate over directly. If that isn't acceptable, you can still process the data beforehand and use itertools.product() to iterate a little more quickly.

items = [item[4] for item in main_list]
children = [child.get('value') for child in xml_data]

for item, child in itertools.product(items, children):
    if item == child:
        print(item)

As Karl Knechtel points out, if you really don't care about order or duplicates at all, you can just do a set intersection:

for item in ({child.get('value') for child in xml_data} &
             {item[4] for item in main_list}):
    print(item)
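If you need to keep the order (and duplicates) of main_list, a list comprehension with a set membership test gives both speed and order (made-up sample data):

```python
children = {'b', 'a', 'z'}   # values extracted from the XML side
main_list = [
    (0, 0, 0, 0, 'b'),
    (0, 0, 0, 0, 'x'),
    (0, 0, 0, 0, 'a'),
    (0, 0, 0, 0, 'b'),
]

# preserves main_list order and duplicates; each set lookup is O(1)
matches = [item[4] for item in main_list if item[4] in children]
print(matches)  # ['b', 'a', 'b']
```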
qid & accept id: (16689457, 16692650) query: Read a number in a word from a file in python soup:

soup wrap:

Try the following:

import re

number_regex = r'#define\s+VERSION_M[AJIN]+OR\s+(\d+)'

with open("guidefs.h") as f:
    your_text = f.read()
    all_numbers = re.findall(number_regex, your_text)
    # This will return ['2', '1']

This will work for both your MAJOR and your MINOR numbers.

If you want a list of integers rather than a list of strings you can use a list comprehension by adding the following line:

all_numbers = [int(x) for x in all_numbers]
# This will return [2, 1]
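A variant that also captures which number is which, returning a dict (same #define format assumed; the sample text stands in for the file contents):

```python
import re

text = "#define VERSION_MAJOR 2\n#define VERSION_MINOR 1\n"

# two groups -> findall yields (name, digits) pairs
versions = {name: int(num) for name, num in
            re.findall(r'#define\s+VERSION_(MAJOR|MINOR)\s+(\d+)', text)}
print(versions)  # {'MAJOR': 2, 'MINOR': 1}
```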
qid & accept id: (16750376, 16750513) query: Universally create Derived class from Base in python soup:

soup wrap:

You can use inheritance:

class FileProxyGetter(ProxyGetter):
    ...
    def MakeProxy(self, *args, **kwargs):
        return Proxy.fromstring(*args, **kwargs)
    def Get(self):
        ...
           proxies.append(self.MakeProxy(l[:-1]))
        ...
    ...
class FileSecureProxyGetter(FileProxyGetter):
    def MakeProxy(self, *args, **kwargs):
        return SecureProxy.fromstring(*args, **kwargs)

but it's probably more useful in this case to use composition.

class FileProxyGetter(ProxyGetter):
    def __init__(self, proxyclass, fname = "d:\\proxies.txt"):
        self.proxyClass = proxyclass
        self.fileName = fname
    ...
    def Get(self):
        ...
            proxies.append(self.proxyClass.fromstring(l[:-1]))
        ...
    ...

# use this as such
FileProxyGetter(Proxy, "proxies.txt")
FileProxyGetter(SecureProxy, "secure_proxies.txt")
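A minimal runnable sketch of that composition idea (the Proxy classes and the in-memory "file" here are stand-ins for the question's real ones):

```python
class Proxy:
    @classmethod
    def fromstring(cls, s):
        # build an instance of whichever class this is called on
        obj = cls()
        obj.addr = s
        return obj

class SecureProxy(Proxy):
    pass

class FileProxyGetter:
    # the class to instantiate is injected, not hard-coded
    def __init__(self, proxyclass, lines):
        self.proxyClass = proxyclass
        self.lines = lines

    def get(self):
        return [self.proxyClass.fromstring(line) for line in self.lines]

proxies = FileProxyGetter(SecureProxy, ['1.2.3.4:80', '5.6.7.8:3128']).get()
print(type(proxies[0]).__name__)  # SecureProxy
```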

EDIT: A dirty trick in python to switch the type of an object:

>>> class A(object):
...     def foo(self):
...         print 'hello A'
... 
>>> class B(object):
...     def foo(self):
...         print 'hello B'
... 
>>> a = A()
>>> a.foo()
hello A
>>> a.__class__
<class '__main__.A'>
>>> a.__class__ = B
>>> a.foo()
hello B

Another dirty trick for two objects of different types to share the same state:

>>> class B(object):
...     def rename(self, name):
...         self.name = name
... 
>>> class A(object):
...     def say(self):
...         print 'Hello', self.name
... 
>>> a, b = A(), B()
>>> a.__dict__ = b.__dict__
>>> b.rename('john')
>>> a.say()
Hello john
>>> a.rename('mary')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'A' object has no attribute 'rename'
>>> b.say()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'B' object has no attribute 'say'

However, these tricks, while possible in Python, I would not call them pythonic nor a good OO design.

Another possibility in Python 3.x and up, which removed "unbound methods" in favour of plain functions:

>>> class A(object):
...     def say(self):
...         print('Hello', self.name)
... 
>>> class B(object):
...     def rename(self, name):
...         self.name = name + name
... 
>>> a = A()
>>> B.rename(a, 'josh')
>>> a.say()
Hello joshjosh
qid & accept id: (16773583, 16773765) query: Python: Extracting specific data with html parser soup:

soup wrap:

Looks like you forgot to set self.inLink = False in handle_starttag by default:

from HTMLParser import HTMLParser


class AllLanguages(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.inLink = False
        self.dataArray = []
        self.countLanguages = 0
        self.lasttag = None
        self.lastname = None
        self.lastvalue = None

    def handle_starttag(self, tag, attrs):
        self.inLink = False
        if tag == 'a':
            for name, value in attrs:
                if name == 'class' and value == 'Vocabulary':
                    self.countLanguages += 1
                    self.inLink = True
                    self.lasttag = tag

    def handle_endtag(self, tag):
        if tag == "a":
            self.inLink = False

    def handle_data(self, data):
        if self.lasttag == 'a' and self.inLink and data.strip():
            print data


parser = AllLanguages()
parser.feed("""
<html>
<head><title>Test</title></head>
<body>
<a class="Vocabulary" href="#">Swahili</a>
<a href="#">Thilo Schadeberg</a>
<a class="Vocabulary" href="#">English</a>
<a class="Vocabulary" href="#">Russian</a>
</body>
</html>
""")

prints:

Swahili
English
Russian
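For reference, the same approach ported to Python 3, where the module is html.parser (the sample markup below is invented to exercise the Vocabulary class):

```python
from html.parser import HTMLParser

class AllLanguages(HTMLParser):
    def __init__(self):
        super().__init__()
        self.inLink = False
        self.languages = []

    def handle_starttag(self, tag, attrs):
        # only anchors with class="Vocabulary" count
        self.inLink = (tag == 'a' and ('class', 'Vocabulary') in attrs)

    def handle_endtag(self, tag):
        if tag == 'a':
            self.inLink = False

    def handle_data(self, data):
        if self.inLink and data.strip():
            self.languages.append(data.strip())

parser = AllLanguages()
parser.feed('<a class="Vocabulary">Swahili</a>'
            '<a>Thilo Schadeberg</a>'
            '<a class="Vocabulary">English</a>')
print(parser.languages)  # ['Swahili', 'English']
```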


Hope that helps.

qid & accept id: (16782291, 16782570) query: How do I convert data from a list of lists to a readable table (or group of columns)? soup:

soup wrap:

A bit of string formatting might help:

>>> data = [['Knight', '500', '500', '0', '0'],
            ['Mage', '0', '0', '500', '500'],
            ['Mage', '0', '0', '500', '500'],
            ['Mage', '0', '0', '500', '500'],
            ['Mage', '0', '0', '500', '500']]

>>> frmt = '{:10s}' + 4 * '{:>12s}'
>>> for line in data:
        print(frmt.format(*line))

results in:

Knight             500         500           0           0
Mage                 0           0         500         500
Mage                 0           0         500         500
Mage                 0           0         500         500
Mage                 0           0         500         500
qid & accept id: (16862690, 16863218) query: How to dynamically create classes inside a module-level initialize() method in Python soup:

soup wrap:

Well, you'll have to find some way to pass the engine variable to your custom_db_api module. This might be marginally cleaner...

Base = declarative_base()

class Something(Base):
    pass

def initialize(engine):
    Something.__table__ = Table('something', Base.metadata, autoload_with=engine)

...or if you can infer the correct engine initialization parameter from some 'global', like sys.argv, you could use something like this...

import sys

Base = declarative_base()
if len(sys.argv) > 1 and sys.argv[1] == '--use-alt-db':
    engine = create_engine('mysql://user:pass@alt_host/db_name')
else:
    engine = create_engine('mysql://user:pass@main_host/db_name')

table = Table('something', Base.metadata, autoload_with=engine)

class Something(Base):
    __table__ = table

It kinda depends on how you intend to tell the program which DB to use.
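If the 'global' really is the command line, argparse is a tidier way to read the flag than inspecting sys.argv by hand (flag name taken from the snippet above):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--use-alt-db', action='store_true')

# parse_args() would normally read sys.argv; pass a list to demonstrate
args = parser.parse_args(['--use-alt-db'])
print(args.use_alt_db)  # True
```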

qid & accept id: (16876342, 16878411) query: Python string extraction from Subprocess soup:

soup wrap:

If you're on linux, you may try this to get the MAC address:

iface = 'wlan0'
mac_addr = open('/sys/class/net/%s/address' % iface).read().rstrip()

For general string extraction, you may use the re module:

import subprocess, re

RE_MAC = re.compile(r'\bHWaddr\s+(((?(2):|)[\dA-Fa-f]{2}){6})\b')
match = RE_MAC.search(subprocess.check_output(["ifconfig", "wlan0"]))
if match:
    mac_addr = match.group(1)

Note that my version of ifconfig (net-tools 1.60) uses ether rather than HWaddr, illustrating one problem of parsing the output of such programs.
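If parsing ifconfig proves too brittle, one best-effort stdlib alternative is uuid.getnode(), which returns a hardware address as a 48-bit integer (note it falls back to a random number when no interface can be found, so treat this as a sketch rather than a guarantee):

```python
# Best-effort MAC lookup without shelling out: uuid.getnode() returns a
# 48-bit integer (or a random value if no hardware address can be found).
import re
import uuid

hex_str = '%012x' % uuid.getnode()              # zero-padded 12 hex digits
mac_addr = ':'.join(re.findall('..', hex_str))  # group into colon-separated pairs
print(mac_addr)
```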

qid & accept id: (16897609, 16897735) query: Wxpython closing windows soup:

soup wrap:

Yes. You just need to save a reference to the 2nd frame. Something like this should suffice:

self.secondFrame = MySecondFrame()

Then in the first frame's close method, you can just do something like this:

self.secondFrame.Close()

However, I should note that creating a frame without the usual toolbar goes against most OS GUI guidelines and users will likely be irritated by that design decision.

EDIT: Yes, you can catch the event that occurs when the user presses the "X" button on the window via wx.EVT_CLOSE. When you do that, you need to call the main frame's Destroy() method instead of its Close method or you'll end up in an infinite loop since calling Close() fires EVT_CLOSE. You can still use Close() for the second frame though.

qid & accept id: (16965782, 16965825) query: Dictionary from variables that may not be initialized soup:

soup wrap:

I'm a little hazy on what you actually want here, but maybe something like:

d = {k:v for k,v in locals().items() if v is not None and not k.startswith('__')}

Example:

>>> x = 1
>>> y = 3
>>> z = None
>>> d = {k:v for k,v in locals().items() if v is not None and not k.startswith('__')}
>>> d
{'y': 3, 'x': 1}
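The same filter works inside a function, where locals() holds only that function's names (so there are no '__'-prefixed module entries to skip); a small sketch:

```python
def collect():
    x = 1
    y = 3
    z = None
    # locals() is evaluated once, in collect()'s scope, before the
    # comprehension runs, so it sees exactly x, y and z.
    return {k: v for k, v in locals().items() if v is not None}

print(collect())  # -> {'x': 1, 'y': 3}
```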
qid & accept id: (16980554, 16980901) query: Developing Python modules - adding them to the Path soup:

soup wrap:

Normally when you are developing a single application your directory structure will be similar to

src/
   |-myapp/
          |-pkg_a/
                 |-__init__.py
                 |-foo.py
          |-pkg_b/
                 |-__init__.py
                 |-bar.py
   |-myapp.py

This lets your whole project be reused as a package by others. In myapp.py you will typically have a short main function.

You can import other modules of your application easily. For example, in pkg_b/bar.py you might have

import myapp.pkg_a.foo

I think it's the preferred way of organising your imports.

You can do relative imports if you really want; they are described in PEP-328.

from ..pkg_a import foo

but personally I think, they are a bit ugly and difficult to maintain (that's arguable, of course).

Of course, if one of your modules needs a module from another application it's a completely different story, since this application is an external dependency and you'll have to handle it.
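To see the absolute-import form in action, here is a rough sketch that recreates the src/myapp layout above in a temporary directory and imports across packages once src/ is on sys.path (paths and names mirror the example):

```python
# Sketch: build the src/myapp layout on disk, then resolve the absolute
# import myapp.pkg_a.foo from inside myapp.pkg_b.bar.
import os
import sys
import tempfile

src = tempfile.mkdtemp()
for pkg in ('myapp', os.path.join('myapp', 'pkg_a'), os.path.join('myapp', 'pkg_b')):
    os.makedirs(os.path.join(src, pkg))
    open(os.path.join(src, pkg, '__init__.py'), 'w').close()

with open(os.path.join(src, 'myapp', 'pkg_a', 'foo.py'), 'w') as f:
    f.write('VALUE = 42\n')
with open(os.path.join(src, 'myapp', 'pkg_b', 'bar.py'), 'w') as f:
    f.write('from myapp.pkg_a import foo\n')   # absolute import, as above

sys.path.insert(0, src)   # what "adding to the Path" amounts to
from myapp.pkg_b import bar
print(bar.foo.VALUE)  # -> 42
```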

qid & accept id: (16983863, 16988531) query: How to pass additional parameters (besides of arguments) to a function in Python soup:

soup wrap:

If I've understood your question correctly, there are quite a number of ways to do what you want and avoid using global variables. Here they are.

Given:

x0 = 1
def fun2(f1, x):
    return f1(x)

All of these techniques accomplish your goal:

#### #0 -- function attributes
def fun1(x):
    return x + fun1.c

fun1.c = 1;  y = fun2(fun1, x0);   print(y)   # --> 2
fun1.c = 2;  y = fun2(fun1, x0);   print(y)   # --> 3

#### #1 -- closure
def fun1(c):
    def wrapper(x):
        return x + c
    return wrapper

y = fun2(fun1(c=1), x0);   print(y)   # --> 2
y = fun2(fun1(c=2), x0);   print(y)   # --> 3

#### #2 -- functools.partial object
from functools import partial

def fun1(x, c):
    return x + c

y = fun2(partial(fun1, c=1), x0);   print(y)   # --> 2
y = fun2(partial(fun1, c=2), x0);   print(y)   # --> 3

#### #3 -- function object (functor)
class Fun1(object):
    def __init__(self, c):
        self.c = c
    def __call__(self, x):
        return x + self.c

y = fun2(Fun1(c=1), x0);   print(y)   # --> 2
y = fun2(Fun1(c=2), x0);   print(y)   # --> 3

#### #4 -- function decorator
def fun1(x, c):
    return x + c

def decorate(c):
    def wrapper(f):
        def wrapped(x):
            return f(x, c)
        return wrapped
    return wrapper

y = fun2(decorate(c=1)(fun1), x0);   print(y)   # --> 2
y = fun2(decorate(c=2)(fun1), x0);   print(y)   # --> 3

Note that writing c= arguments wasn't always strictly required in the calls -- I just put it in all of the usage examples for consistency and because it makes it clearer how it's being passed.
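One further option, not in the list above: an inline lambda, which is handy when the extra parameter is only needed at a single call site.

```python
#### #5 -- inline lambda
x0 = 1

def fun2(f1, x):
    return f1(x)

def fun1(x, c):
    return x + c

y = fun2(lambda x: fun1(x, c=1), x0);   print(y)   # --> 2
y = fun2(lambda x: fun1(x, c=2), x0);   print(y)   # --> 3
```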

qid & accept id: (17013381, 17013459) query: List of (date, day_of_week) tuples soup:

soup wrap:

The datetime object itself has a .weekday() method. You can add these in a separate pass:

dateList = [(d, d.weekday()) for d in dateList]

For your example code, that gives:

[(datetime.datetime(2013, 2, 16, 0, 0), 5), (datetime.datetime(2013, 2, 17, 0, 0), 6), (datetime.datetime(2013, 2, 18, 0, 0), 0), (datetime.datetime(2013, 2, 19, 0, 0), 1), (datetime.datetime(2013, 2, 20, 0, 0), 2), (datetime.datetime(2013, 2, 21, 0, 0), 3), (datetime.datetime(2013, 2, 22, 0, 0), 4), (datetime.datetime(2013, 2, 23, 0, 0), 5), (datetime.datetime(2013, 2, 24, 0, 0), 6), (datetime.datetime(2013, 2, 25, 0, 0), 0), (datetime.datetime(2013, 2, 26, 0, 0), 1), (datetime.datetime(2013, 2, 27, 0, 0), 2), (datetime.datetime(2013, 2, 28, 0, 0), 3), (datetime.datetime(2013, 3, 1, 0, 0), 4), (datetime.datetime(2013, 3, 2, 0, 0), 5), (datetime.datetime(2013, 3, 3, 0, 0), 6), (datetime.datetime(2013, 3, 4, 0, 0), 0), (datetime.datetime(2013, 3, 5, 0, 0), 1), (datetime.datetime(2013, 3, 6, 0, 0), 2), (datetime.datetime(2013, 3, 7, 0, 0), 3), (datetime.datetime(2013, 3, 8, 0, 0), 4), (datetime.datetime(2013, 3, 9, 0, 0), 5), (datetime.datetime(2013, 3, 10, 0, 0), 6), (datetime.datetime(2013, 3, 11, 0, 0), 0), (datetime.datetime(2013, 3, 12, 0, 0), 1), (datetime.datetime(2013, 3, 13, 0, 0), 2), (datetime.datetime(2013, 3, 14, 0, 0), 3), (datetime.datetime(2013, 3, 15, 0, 0), 4), (datetime.datetime(2013, 3, 16, 0, 0), 5), (datetime.datetime(2013, 3, 17, 0, 0), 6), (datetime.datetime(2013, 3, 18, 0, 0), 0), (datetime.datetime(2013, 3, 19, 0, 0), 1), (datetime.datetime(2013, 3, 20, 0, 0), 2), (datetime.datetime(2013, 3, 21, 0, 0), 3), (datetime.datetime(2013, 3, 22, 0, 0), 4), (datetime.datetime(2013, 3, 23, 0, 0), 5), (datetime.datetime(2013, 3, 24, 0, 0), 6), (datetime.datetime(2013, 3, 25, 0, 0), 0), (datetime.datetime(2013, 3, 26, 0, 0), 1), (datetime.datetime(2013, 3, 27, 0, 0), 2), (datetime.datetime(2013, 3, 28, 0, 0), 3), (datetime.datetime(2013, 3, 29, 0, 0), 4), (datetime.datetime(2013, 3, 30, 0, 0), 5), (datetime.datetime(2013, 3, 31, 0, 0), 6), (datetime.datetime(2013, 4, 1, 0, 0), 0), (datetime.datetime(2013, 4, 2, 0, 0), 1), (datetime.datetime(2013, 4, 3, 
0, 0), 2), (datetime.datetime(2013, 4, 4, 0, 0), 3), (datetime.datetime(2013, 4, 5, 0, 0), 4), (datetime.datetime(2013, 4, 6, 0, 0), 5), (datetime.datetime(2013, 4, 7, 0, 0), 6), (datetime.datetime(2013, 4, 8, 0, 0), 0), (datetime.datetime(2013, 4, 9, 0, 0), 1), (datetime.datetime(2013, 4, 10, 0, 0), 2), (datetime.datetime(2013, 4, 11, 0, 0), 3), (datetime.datetime(2013, 4, 12, 0, 0), 4), (datetime.datetime(2013, 4, 13, 0, 0), 5), (datetime.datetime(2013, 4, 14, 0, 0), 6), (datetime.datetime(2013, 4, 15, 0, 0), 0), (datetime.datetime(2013, 4, 16, 0, 0), 1), (datetime.datetime(2013, 4, 17, 0, 0), 2), (datetime.datetime(2013, 4, 18, 0, 0), 3), (datetime.datetime(2013, 4, 19, 0, 0), 4), (datetime.datetime(2013, 4, 20, 0, 0), 5), (datetime.datetime(2013, 4, 21, 0, 0), 6), (datetime.datetime(2013, 4, 22, 0, 0), 0), (datetime.datetime(2013, 4, 23, 0, 0), 1), (datetime.datetime(2013, 4, 24, 0, 0), 2), (datetime.datetime(2013, 4, 25, 0, 0), 3), (datetime.datetime(2013, 4, 26, 0, 0), 4), (datetime.datetime(2013, 4, 27, 0, 0), 5), (datetime.datetime(2013, 4, 28, 0, 0), 6), (datetime.datetime(2013, 4, 29, 0, 0), 0), (datetime.datetime(2013, 4, 30, 0, 0), 1), (datetime.datetime(2013, 5, 1, 0, 0), 2), (datetime.datetime(2013, 5, 2, 0, 0), 3), (datetime.datetime(2013, 5, 3, 0, 0), 4), (datetime.datetime(2013, 5, 4, 0, 0), 5), (datetime.datetime(2013, 5, 5, 0, 0), 6), (datetime.datetime(2013, 5, 6, 0, 0), 0), (datetime.datetime(2013, 5, 7, 0, 0), 1), (datetime.datetime(2013, 5, 8, 0, 0), 2), (datetime.datetime(2013, 5, 9, 0, 0), 3), (datetime.datetime(2013, 5, 10, 0, 0), 4), (datetime.datetime(2013, 5, 11, 0, 0), 5), (datetime.datetime(2013, 5, 12, 0, 0), 6), (datetime.datetime(2013, 5, 13, 0, 0), 0), (datetime.datetime(2013, 5, 14, 0, 0), 1), (datetime.datetime(2013, 5, 15, 0, 0), 2), (datetime.datetime(2013, 5, 16, 0, 0), 3), (datetime.datetime(2013, 5, 17, 0, 0), 4), (datetime.datetime(2013, 5, 18, 0, 0), 5), (datetime.datetime(2013, 5, 19, 0, 0), 6), (datetime.datetime(2013, 
5, 20, 0, 0), 0), (datetime.datetime(2013, 5, 21, 0, 0), 1), (datetime.datetime(2013, 5, 22, 0, 0), 2), (datetime.datetime(2013, 5, 23, 0, 0), 3), (datetime.datetime(2013, 5, 24, 0, 0), 4), (datetime.datetime(2013, 5, 25, 0, 0), 5), (datetime.datetime(2013, 5, 26, 0, 0), 6), (datetime.datetime(2013, 5, 27, 0, 0), 0), (datetime.datetime(2013, 5, 28, 0, 0), 1), (datetime.datetime(2013, 5, 29, 0, 0), 2), (datetime.datetime(2013, 5, 30, 0, 0), 3), (datetime.datetime(2013, 5, 31, 0, 0), 4), (datetime.datetime(2013, 6, 1, 0, 0), 5), (datetime.datetime(2013, 6, 2, 0, 0), 6), (datetime.datetime(2013, 6, 3, 0, 0), 0), (datetime.datetime(2013, 6, 4, 0, 0), 1), (datetime.datetime(2013, 6, 5, 0, 0), 2), (datetime.datetime(2013, 6, 6, 0, 0), 3), (datetime.datetime(2013, 6, 7, 0, 0), 4), (datetime.datetime(2013, 6, 8, 0, 0), 5), (datetime.datetime(2013, 6, 9, 0, 0), 6), (datetime.datetime(2013, 6, 10, 0, 0), 0), (datetime.datetime(2013, 6, 11, 0, 0), 1), (datetime.datetime(2013, 6, 12, 0, 0), 2), (datetime.datetime(2013, 6, 13, 0, 0), 3), (datetime.datetime(2013, 6, 14, 0, 0), 4), (datetime.datetime(2013, 6, 15, 0, 0), 5), (datetime.datetime(2013, 6, 16, 0, 0), 6), (datetime.datetime(2013, 6, 17, 0, 0), 0), (datetime.datetime(2013, 6, 18, 0, 0), 1), (datetime.datetime(2013, 6, 19, 0, 0), 2), (datetime.datetime(2013, 6, 20, 0, 0), 3), (datetime.datetime(2013, 6, 21, 0, 0), 4), (datetime.datetime(2013, 6, 22, 0, 0), 5), (datetime.datetime(2013, 6, 23, 0, 0), 6), (datetime.datetime(2013, 6, 24, 0, 0), 0), (datetime.datetime(2013, 6, 25, 0, 0), 1), (datetime.datetime(2013, 6, 26, 0, 0), 2), (datetime.datetime(2013, 6, 27, 0, 0), 3), (datetime.datetime(2013, 6, 28, 0, 0), 4), (datetime.datetime(2013, 6, 29, 0, 0), 5), (datetime.datetime(2013, 6, 30, 0, 0), 6), (datetime.datetime(2013, 7, 1, 0, 0), 0), (datetime.datetime(2013, 7, 2, 0, 0), 1), (datetime.datetime(2013, 7, 3, 0, 0), 2), (datetime.datetime(2013, 7, 4, 0, 0), 3), (datetime.datetime(2013, 7, 5, 0, 0), 4), 
(datetime.datetime(2013, 7, 6, 0, 0), 5), (datetime.datetime(2013, 7, 7, 0, 0), 6), (datetime.datetime(2013, 7, 8, 0, 0), 0), (datetime.datetime(2013, 7, 9, 0, 0), 1), (datetime.datetime(2013, 7, 10, 0, 0), 2), (datetime.datetime(2013, 7, 11, 0, 0), 3), (datetime.datetime(2013, 7, 12, 0, 0), 4), (datetime.datetime(2013, 7, 13, 0, 0), 5), (datetime.datetime(2013, 7, 14, 0, 0), 6), (datetime.datetime(2013, 7, 15, 0, 0), 0), (datetime.datetime(2013, 7, 16, 0, 0), 1), (datetime.datetime(2013, 7, 17, 0, 0), 2), (datetime.datetime(2013, 7, 18, 0, 0), 3), (datetime.datetime(2013, 7, 19, 0, 0), 4), (datetime.datetime(2013, 7, 20, 0, 0), 5), (datetime.datetime(2013, 7, 21, 0, 0), 6), (datetime.datetime(2013, 7, 22, 0, 0), 0), (datetime.datetime(2013, 7, 23, 0, 0), 1), (datetime.datetime(2013, 7, 24, 0, 0), 2), (datetime.datetime(2013, 7, 25, 0, 0), 3), (datetime.datetime(2013, 7, 26, 0, 0), 4), (datetime.datetime(2013, 7, 27, 0, 0), 5), (datetime.datetime(2013, 7, 28, 0, 0), 6), (datetime.datetime(2013, 7, 29, 0, 0), 0), (datetime.datetime(2013, 7, 30, 0, 0), 1), (datetime.datetime(2013, 7, 31, 0, 0), 2), (datetime.datetime(2013, 8, 1, 0, 0), 3), (datetime.datetime(2013, 8, 2, 0, 0), 4), (datetime.datetime(2013, 8, 3, 0, 0), 5), (datetime.datetime(2013, 8, 4, 0, 0), 6), (datetime.datetime(2013, 8, 5, 0, 0), 0), (datetime.datetime(2013, 8, 6, 0, 0), 1), (datetime.datetime(2013, 8, 7, 0, 0), 2), (datetime.datetime(2013, 8, 8, 0, 0), 3), (datetime.datetime(2013, 8, 9, 0, 0), 4), (datetime.datetime(2013, 8, 10, 0, 0), 5)]

You can combine it with your existing list comprehension by creating an extra nested loop with one element:

dateList = [(d, d.weekday()) for x in range(0,delta) for d in [base + timedelta(days=x)]]
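If the nested one-element loop feels too clever, a plain loop does the same thing; this sketch assumes the question's base/delta names, with a small delta for brevity:

```python
from datetime import datetime, timedelta

base = datetime(2013, 2, 16)   # a Saturday, so weekday() == 5
delta = 3
dateList = []
for x in range(delta):
    d = base + timedelta(days=x)
    dateList.append((d, d.weekday()))

print(dateList)
```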
qid & accept id: (17023994, 17024209) query: How to associate some value in model with ForeignKey? soup:
soup wrap:
class Product(models.Model):
    currencies = models.ManyToManyField('Currency', through='Pricing', blank=True, null=True)

class Currency(models.Model):
    name = models.CharField()
    sign = models.CharField()

class Pricing(models.Model):
    product = models.ForeignKey(Product)
    currency = models.ForeignKey(Currency)
    price = models.FloatField()

and then you can use something like

product = Product.objects.get(name='Cactus')
price = product.pricing_set.get(currency__name='USD')
qid & accept id: (17058504, 17058536) query: Accept newline character in python soup:

soup wrap:

EDIT: This answer assumes that you are using Python 3. If you are using Python 2.x, use the one provided by Anthon.

Do something like this:

text = ''
while True: # change this condition.
    text += input('''Enter the paragraph :''')+'\n' #UPDATED. Appended a \n character.

For example, if you want to end the input sequence with a blank line (an extra newline character), the code would be:

text = ''
while True:
    dummy = input('''Enter the paragraph :''')+'\n'
    if dummy=='\n':
        break
    text += dummy
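To make the termination condition easy to exercise, the input function can be injected instead of called directly; the names below are illustrative:

```python
def read_paragraph(readline=input):
    """Collect lines until a blank line, mirroring the loop above."""
    text = ''
    while True:
        line = readline('Enter the paragraph :') + '\n'
        if line == '\n':          # a blank line ends the paragraph
            break
        text += line
    return text

# Simulated session: two lines of text, then a blank line to finish.
lines = iter(['first line', 'second line', ''])
print(read_paragraph(lambda prompt: next(lines)))
```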
qid & accept id: (17065086, 17065634) query: How to get the caller class name inside a function of another class in python? soup:

soup wrap:

Well, after some digging at the prompt, here's what I get:

stack = inspect.stack()
the_class = stack[1][0].f_locals["self"].__class__
the_method = stack[1][0].f_code.co_name

print("I was called by {}.{}()".format(str(the_class), the_method))
# => I was called by A.a()

When invoked:

➤ python test.py
A.a()
B.b()
  I was called by __main__.A.a()

given the file test.py:

import inspect

class A:
  def a(self):
    print("A.a()")
    B().b()

class B:
  def b(self):
    print("B.b()")
    stack = inspect.stack()
    the_class = stack[1][0].f_locals["self"].__class__
    the_method = stack[1][0].f_code.co_name
    print("  I was called by {}.{}()".format(str(the_class), the_method))

A().a()

Not sure how it will behave when called from something other than an object.

qid & accept id: (17065977, 17066014) query: Inversing a twodimensional array in python soup:

soup wrap:

Do you want something like:

tempArray = [list(reversed(x)) for x in reversed(self.topArea)]

If everything is lists, you could also do:

tempArray = [x[::-1] for x in reversed(self.topArea)]

for a possible speed boost.
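A quick sanity check of both forms on a 2x2 grid (reversing both axes amounts to a 180-degree rotation):

```python
grid = [[1, 2],
        [3, 4]]

# Reverse the row order, then reverse each row.
rotated = [list(reversed(row)) for row in reversed(grid)]
print(rotated)   # -> [[4, 3], [2, 1]]

# The slicing form produces the same result.
assert rotated == [row[::-1] for row in reversed(grid)]
```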

qid & accept id: (17077403, 17077950) query: Making id case-insensitive but case-preserving in endpoints-proto-datastore soup:

soup wrap:

Your setter is being called, and it is causing the behavior you don't want. Tearing it down:

def IdSet(self, value):
    if not isinstance(value, basestring):
        raise TypeError('ID must be a string.')
    self.caseful_id = value
    self.UpdateFromKey(ndb.Key(self.__class__, value.lower()))

since you call

self.caseful_id = value

before UpdateFromKey, the caseful_id will always be the one from the most recent request.

Bear in mind that UpdateFromKey tries to retrieve an entity by that key and then patches in any missing data from the entity stored in the datastore (and also sets from_datastore to True).

Since you set the caseful_id field before UpdateFromKey, there is no missing data. Instead, you could do this to set the value if not already set (and so this wouldn't be relevant for your 'GET' method):

def IdSet(self, value):
    if not isinstance(value, basestring):
        raise TypeError('ID must be a string.')
    self.UpdateFromKey(ndb.Key(self.__class__, value.lower()))
    if self.caseful_id is None:
        self.caseful_id = value
qid & accept id: (17086278, 17086340) query: how to iterate over all files in path? soup:

soup wrap:

Two simple ways might be:

import os
path = "c:\\Python27\\test"

for name in os.listdir(path):
    if name.endswith('.txt'):
        fpath = os.path.join(path, name)
        with open(fpath) as fin:
            print fpath, 'opened'

or

import glob
import os
path = "c:\\Python27\\test"

for fpath in glob.glob(os.path.join(path, '*.txt')):
    with open(fpath) as fin:
        print fpath, 'opened'

The reason is that open() must get a real file name; the * wildcard is glob syntax, which must be expanded separately.
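On Python 3 the same two approaches collapse into pathlib, which joins paths and expands the * pattern for you. A self-contained sketch using a throwaway directory:

```python
import shutil
import tempfile
from pathlib import Path

# Build a throwaway directory with a couple of files to scan.
tmp = Path(tempfile.mkdtemp())
(tmp / 'a.txt').write_text('hello')
(tmp / 'b.log').write_text('ignored')

# Path.glob expands the * pattern and yields full paths directly.
found = sorted(p.name for p in tmp.glob('*.txt'))
print(found)

shutil.rmtree(tmp)  # clean up the temporary directory
```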

qid & accept id: (17103701, 17103769) query: Finding common elements from two lists of lists soup:

soup wrap:

Convert the innermost lists of b into a set (s), and then iterate over a to check whether any item of a exists in s.

tot_items_b = sum(1 for x in b for y in x) #total items in b

Sets provide an O(1) lookup, so the overall complexity is going to be:

O(max(len(a), tot_items_b))

def func(a, b):
   #sets can't contain mutable items, so convert lists to tuple while storing

   s = set(tuple(y) for x in b for y in x)
   #s is set([(41, 2, 34), (98, 23, 56), (42, 25, 64),...])

   return any(tuple(item) in s for item in a)

Demo:

>>> a = [[1, 2, 3], [4, 5, 6], [4, 2, 3]]
>>> b = [[[11, 22, 3], [12, 34, 6], [41, 2, 34], [198, 213, 536], [1198, 1123, 1156]], [[11, 22, 3], [42, 25, 64], [43, 45, 23]], [[3, 532, 23], [4, 5, 6], [98, 23, 56], [918, 231, 526]]]
>>> func(a,b)
True

Help on any:

>>> print any.__doc__
any(iterable) -> bool

Return True if bool(x) is True for any x in the iterable.
If the iterable is empty, return False.

Use set intersection to get all the common elements:

>>> s_b = set(tuple(y) for x in b for y in x)
>>> s_a = set(tuple(x) for x in a)
>>> s_a & s_b
set([(4, 5, 6)])
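The same intersection in Python 3 set-comprehension syntax (b shortened here for brevity):

```python
# Flatten b's innermost lists to hashable tuples, then intersect with a's rows.
a = [[1, 2, 3], [4, 5, 6], [4, 2, 3]]
b = [[[11, 22, 3], [12, 34, 6]], [[4, 5, 6], [42, 25, 64]]]

s_b = {tuple(y) for x in b for y in x}
s_a = {tuple(x) for x in a}
common = s_a & s_b
print(common)          # the rows both structures share
print(bool(common))    # truthiness answers the "any common element?" question
```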
qid & accept id: (17122268, 17122544) query: (python) How to create static text in curses soup:

soup wrap:

Here is an example that shows some static text in red (that always stays on top):

import sys
import curses


curses.initscr()

if not curses.has_colors():
    curses.endwin()
    print "no colors"
    sys.exit()
else:
    curses.start_color()

curses.noecho()    # don't echo the keys on the screen
curses.cbreak()    # don't wait enter for input
curses.curs_set(0) # don't show cursor.

RED_TEXT = 1
curses.init_pair(RED_TEXT, curses.COLOR_RED, curses.COLOR_BLACK)

window = curses.newwin(20, 20, 0, 0)
window.box()
staticwin = curses.newwin(5, 10, 1, 1)
staticwin.box()

staticwin.addstr(1, 1, "test", curses.color_pair(RED_TEXT))

cur_x = 10
cur_y = 10
while True:
    window.addch(cur_y, cur_x, '@')
    window.refresh()
    staticwin.box()
    staticwin.refresh()
    inchar = window.getch()
    window.addch(cur_y, cur_x, ' ')
    # W,A,S,D used to move around the @
    if inchar == ord('w'):
        cur_y -= 1
    elif inchar == ord('a'):
        cur_x -= 1
    elif inchar == ord('d'):
        cur_x += 1
    elif inchar == ord('s'):
        cur_y += 1
    elif inchar == ord('q'):
        break
curses.endwin()

A screenshot of the result:

[screenshot image]

Remember that windows on top must be refresh()ed last; otherwise the windows that should sit below are drawn over them.

If you want to change the static text do:

staticwin.clear()   #clean the window
staticwin.addstr(1, 1, "insert-text-here", curses.color_pair(RED_TEXT))
staticwin.box()     #re-draw the box
staticwin.refresh()

Where the 1, 1 means to start writing from the second character of the second line (remember: coordinates start at 0). This is needed since the window's box is drawn on the first line and first column.

qid & accept id: (17144809, 17145197) query: Translating regex match groups soup:

soup wrap:

I'd use finditer() with a wrapper generator:

import re
from functools import partial

def _hexrepl(match):
    return chr(int(match.group(1), 16))
unescape = partial(re.compile(r'#([0-9A-F]{2})').sub, _hexrepl)

def pdfnames(inputtext):
    for match in Name.finditer(inputtext):
        yield unescape(match.group(0))

Demo:

>>> for name in pdfnames(names):
...     print name
... 
/Adobe Green
/PANTONE 5757 CV
/paired()parentheses
/The_Key_of_F#_Minor
/AB
/Name1
/ASomewhatLongerName
/A;Name_With-Various***Characters?
/1.2
/$$
/@pattern
/.notdef

There is no more clever way that I know of; the re engine cannot otherwise combine substitution and matching.
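A self-contained Python 3 sketch of the finditer-plus-sub pattern. Note that the Name regex below is a simplified guess ('/' followed by non-whitespace), not necessarily the asker's actual pattern:

```python
import re
from functools import partial

# Simplified stand-in for the asker's Name pattern.
Name = re.compile(r'/[^\s]+')

def _hexrepl(match):
    # Turn '#41' into 'A': the two hex digits are a character code.
    return chr(int(match.group(1), 16))

unescape = partial(re.compile(r'#([0-9A-F]{2})').sub, _hexrepl)

def pdfnames(text):
    # Match each name first, then run the substitution on the matched text.
    for match in Name.finditer(text):
        yield unescape(match.group(0))

names = list(pdfnames('/Adobe#20Green /Name1'))
print(names)
```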

qid & accept id: (17163234, 17164833) query: How to find shortest path for raw data soup:

soup wrap:

This is a classic Breadth First Search problem, where you have an undirected, unweighted graph and you want to find the shortest path between 2 vertices.

Some helpful links on Breadth First Search:

Some edge cases that you have to take note of:

  • No path between the source and destination vertices
  • Source and destination are the same vertex

I'll suppose that your edge list is a dictionary of lists or a list of lists, eg.

[[4191, 949], [3002, 4028, 957], [2494, 959, 3011], [4243, 965], [1478], ...]

Or

{ 0: [4191, 949],
  1: [3002, 4028, 957],
  2: [2494, 959, 3011],
  3: [4243, 965],
  4: [1478], ...}

I've written some code to show how the breadth first search works:

import sys
import Queue

def get_shortest_path(par, src, dest):
    '''
    Returns the shortest path as a list of integers
    par - parent information
    src - source vertex
    dest - destination vertex
    '''
    if dest == src:
        return [src]
    else:
        ret = get_shortest_path(par, src, par[dest])
        ret.append(dest)
        return ret

def bfs(edgeList, src, dest):
    '''
    Breadth first search routine. Returns (distance, shortestPath) pair from src to dest. Returns (-1, []) if there is no path from src to dest
    edgeList - adjacency list of graph. Either list of lists or dict of lists
    src - source vertex
    dest - destination vertex
    '''
    vis = set() # stores the vertices that have been visited
    par = {} # stores parent information. vertex -> parent vertex in BFS tree
    distDict = {} # stores distance of visited vertices from the source. This is the number of edges between the source vertex and the given vertex
    q = Queue.Queue()
    q.put((src, 0)) # enqueue (source, distance) pair
    par[src] = -1 # source has no parent
    vis.add(src) # minor technicality, will explain later
    while not q.empty():
        (v,dist) = q.get() # grab vertex in queue
        distDict[v] = dist # update the distance
        if v == dest:
            break # reached destination, done
        nextDist = dist+1
        for nextV in edgeList[v]:
            # visit vertices adjacent to the current vertex
            if nextV not in vis:
                # not yet visited
                par[nextV] = v # update parent of nextV to v
                q.put((nextV, nextDist)) # add into queue
                vis.add(nextV) # mark as visited
    # obtained shortest path now
    if dest in distDict:
        return (distDict[dest], get_shortest_path(par, src, dest))
    else:
        return (-1, []) # no shortest path

# example run, feel free to remove this
if __name__ == '__main__':
    edgeList = {
        0: [6,],
        1: [2, 7],
        2: [1, 3, 6],
        3: [2, 4, 5],
        4: [3, 8],
        5: [3, 7],
        6: [0, 2],
        7: [1, 5],
        8: [4],
    }
    while True:
        src = int(sys.stdin.readline())
        dest = int(sys.stdin.readline())
        (dist, shortest_path) = bfs(edgeList, src, dest)
        print 'dist =', dist
        print 'shortest_path =', shortest_path
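On Python 3 the same search is usually written with collections.deque (Queue became the queue module there, and a plain deque avoids its locking overhead). A sketch of the same BFS:

```python
from collections import deque

def bfs(edge_list, src, dest):
    """Return (distance, path) from src to dest, or (-1, []) if unreachable."""
    parent = {src: None}      # vertex -> its parent in the BFS tree
    dist = {src: 0}           # vertex -> edge count from the source
    q = deque([src])
    while q:
        v = q.popleft()
        if v == dest:
            break             # reached the destination, done
        for nxt in edge_list[v]:
            if nxt not in parent:      # not yet visited
                parent[nxt] = v
                dist[nxt] = dist[v] + 1
                q.append(nxt)
    if dest not in dist:
        return -1, []
    path = []                 # walk parent pointers back to the source
    v = dest
    while v is not None:
        path.append(v)
        v = parent[v]
    return dist[dest], path[::-1]

edge_list = {0: [6], 1: [2, 7], 2: [1, 3, 6], 3: [2, 4, 5],
             4: [3, 8], 5: [3, 7], 6: [0, 2], 7: [1, 5], 8: [4]}
print(bfs(edge_list, 0, 8))
```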
qid & accept id: (17167297, 17167437) query: Convert this python dictionary into JSON format? soup:

soup wrap:

You need to first create a structure with the correct format:

import json

dict_ = {"20090209.02s1.1_sequence.txt": [645045714, 3559.6422951221466, 206045184], "20090209.02s1.2_sequence.txt": [645045714, 3543.8322949409485, 234618880]}
values = [{"file_name": k, "file_information": v} for k, v in dict_.items()]
json.dumps(values, indent=4)

Note that the desired JSON output does not look valid to me. Here's the output for this code:

[
    {
        "file_name": "20090209.02s1.1_sequence.txt", 
        "file_information": [
            645045714, 
            3559.6422951221466, 
            206045184
        ]
    }, 
    {
        "file_name": "20090209.02s1.2_sequence.txt", 
        "file_information": [
            645045714, 
            3543.8322949409485, 
            234618880
        ]
    }
]
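The same reshaping works unchanged on Python 3; a sketch with shorter, made-up file names, plus a round-trip check with json.loads:

```python
import json

# Made-up, shorter file names in the same shape as the question's dict.
data = {"a.txt": [645045714, 3559.64, 206045184],
        "b.txt": [645045714, 3543.83, 234618880]}

# Reshape the mapping into a list of objects before serializing.
values = [{"file_name": k, "file_information": v} for k, v in data.items()]
text = json.dumps(values, indent=4)
print(text)
```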
qid & accept id: (17176270, 17176348) query: Finding All Defined Functions in Python Environment soup:

soup wrap:

You can use the inspect module:

import inspect
import sys


def test():
    pass

functions = [name for name, obj in inspect.getmembers(sys.modules[__name__], inspect.isfunction)]
print functions

prints:

['test']
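The same idea in Python 3, with an extra __module__ check so functions merely imported from other modules are filtered out:

```python
import inspect
import sys

def alpha():
    pass

def beta():
    pass

# Grab every function object in this module; the __module__ check drops
# anything that was imported from elsewhere rather than defined here.
module = sys.modules[__name__]
functions = [name for name, obj in inspect.getmembers(module, inspect.isfunction)
             if obj.__module__ == __name__]
print(functions)
```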
qid & accept id: (17203403, 17205759) query: student t confidence interval in python soup:

soup wrap:

I guess you could use scipy.stats.t and its interval method:

In [1]: from scipy.stats import t
In [2]: t.interval(0.95, 10, loc=1, scale=2)  # 95% confidence interval
Out[2]: (-3.4562777039298762, 5.4562777039298762)
In [3]: t.interval(0.99, 10, loc=1, scale=2)  # 99% confidence interval
Out[3]: (-5.338545334351676, 7.338545334351676)

Sure, you can make your own function if you like. Let's make it look like in Mathematica:

from scipy.stats import t


def StudentTCI(loc, scale, df, alpha=0.95):
    return t.interval(alpha, df, loc, scale)

print StudentTCI(1, 2, 10)
print StudentTCI(1, 2, 10, 0.99)

Result:

(-3.4562777039298762, 5.4562777039298762)
(-5.338545334351676, 7.338545334351676)
qid & accept id: (17211188, 17211208) query: How to create a timer on python soup:

soup wrap:

You record the start time, then later on calculate the difference between that start time and the current time.

Due to platform differences, for precision you want to use the timeit.default_timer callable:

from timeit import default_timer

start = default_timer()

# do stuff

duration = default_timer() - start

This gives you a wall-clock time duration in seconds as a floating point value.

Demo:

>>> from timeit import default_timer
>>> start = default_timer()
>>> # Martijn reads another post somewhere
... 
>>> print default_timer() - start
19.1996181011
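On Python 3, timeit.default_timer is simply time.perf_counter, so you can call it directly; a sketch with sleep standing in for the timed work:

```python
import time

start = time.perf_counter()

time.sleep(0.05)   # stand-in for the work being timed

duration = time.perf_counter() - start
print(duration)    # wall-clock seconds as a float
```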
qid & accept id: (17243403, 17243475) query: Replace character in line inside a file soup:

soup wrap:

You can use the fileinput module if you're trying to modify the same file:

>>> strs = "sample4:15"

Take advantage of sequence unpacking to store the results in variables after splitting.

>>> sample, value = strs.split(':')
>>> sample
'sample4'
>>> value
'15'

Code:

import fileinput
for line in fileinput.input(filename, inplace = True):
    sample, value = line.split(':')
    value = int(value)     #convert value to int for calculation purpose
    if some_condition: 
           # do some calculations on sample and value
           # modify sample, value if required 

    #now the write the data(either modified or still the old one) to back to file
    print "{}:{}".format(sample, value)
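A self-contained Python 3 sketch of the in-place rewrite, using a temporary file and a made-up "double the value" calculation:

```python
import fileinput
import os
import tempfile

# Create a throwaway file to edit in place.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, 'w') as f:
    f.write('sample4:15\n')

# With inplace=True, whatever the loop prints replaces the file's contents.
with fileinput.input(files=[path], inplace=True) as src:
    for line in src:
        sample, value = line.strip().split(':')
        value = int(value) * 2          # made-up calculation on the value
        print('{}:{}'.format(sample, value))

with open(path) as f:
    result = f.read()
os.remove(path)
print(result)
```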
qid & accept id: (17250660, 17250702) query: How to parse XML file from European Central Bank with Python soup:

soup wrap:

You have a namespaced XML file. ElementTree is not too smart about namespaces. You need to give the .find(), findall() and iterfind() methods an explicit namespace dictionary. This is not documented very well:

namespaces = {'ex': 'http://www.ecb.int/vocabulary/2002-08-01/eurofxref'} # add more as needed

for cube in root.findall('.//ex:Cube[@currency]', namespaces=namespaces):
    print(cube.attrib['currency'], cube.attrib['rate'])

This uses a simple XPath query; './/' means find any child tag, ex:Cube limits the search to the tags in the namespace labeled with the ex prefix (from the namespaces mapping) and [@currency] limits the search to elements that have a currency attribute.

Demo:

>>> import requests
>>> r = requests.get('http://www.ecb.int/stats/eurofxref/eurofxref-daily.xml', stream=True)
>>> from xml.etree import ElementTree as ET
>>> tree = ET.parse(r.raw)
>>> root = tree.getroot()
>>> namespaces = {'ex': 'http://www.ecb.int/vocabulary/2002-08-01/eurofxref'}
>>> for cube in root.findall('.//ex:Cube[@currency]', namespaces=namespaces):
...     print(cube.attrib['currency'], cube.attrib['rate'])
... 
USD 1.3180
JPY 128.66
BGN 1.9558
CZK 25.825
DKK 7.4582
GBP 0.85330
HUF 298.87
LTL 3.4528
LVL 0.7016
PLN 4.3289
RON 4.5350
SEK 8.6927
CHF 1.2257
NOK 7.9090
HRK 7.4905
RUB 43.2260
TRY 2.5515
AUD 1.4296
BRL 2.9737
CAD 1.3705
CNY 8.0832
HKD 10.2239
IDR 13088.24
ILS 4.7891
INR 78.1200
KRW 1521.52
MXN 17.5558
MYR 4.2222
NZD 1.7004
PHP 57.707
SGD 1.6790
THB 41.003
ZAR 13.4906

You can use this information to search for the specific rate too; either build a dictionary, or search the XML document directly for matching currencies:

currency = input('What currency are you looking for? ')
match = root.find('.//ex:Cube[@currency="{}"]'.format(currency.upper()), namespaces=namespaces)
if match is not None:
    print('The rate for {} is {}'.format(currency, match.attrib['rate']))
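A self-contained sketch of the namespace handling, parsing an inline document shaped like the ECB feed instead of a live download:

```python
from xml.etree import ElementTree as ET

# A tiny namespaced document in the same shape as the ECB feed.
xml = '''<root xmlns="http://www.ecb.int/vocabulary/2002-08-01/eurofxref">
  <Cube><Cube time="2013-06-18">
    <Cube currency="USD" rate="1.3180"/>
    <Cube currency="JPY" rate="128.66"/>
  </Cube></Cube>
</root>'''

root = ET.fromstring(xml)
# The ex: prefix used in the query maps to the full namespace URI.
namespaces = {'ex': 'http://www.ecb.int/vocabulary/2002-08-01/eurofxref'}

rates = {c.attrib['currency']: c.attrib['rate']
         for c in root.findall('.//ex:Cube[@currency]', namespaces=namespaces)}
print(rates)
```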
qid & accept id: (17252056, 17252210) query: Tkinter nested mainloop soup:

soup wrap:

This should be an example of a nested mainloop in Tkinter:

import Tkinter

def main():
    print 'main'
    t.mainloop()
    print 'end main'

t = Tkinter.Tk()
b = Tkinter.Button(t, command = main)
b.pack()
t.mainloop()

Whenever you hit the button a new mainloop is executed.

main
main
main
main
main
# now close the window
end main
end main
end main
end main
end main
qid & accept id: (17254599, 17254627) query: Can I group / aggregate elements in a list (or dict) comprehension? soup:

soup wrap:

You can use collections.defaultdict:

>>> from collections import defaultdict
>>> lis = [                            
    (1, "red"),
    (1, "red,green"),
    (1, "green,blue"),
    (2, "green"),
    (2, "yellow,blue"),
]
>>> dic = defaultdict(set)       #sets only contain unique items
for k, v in lis:
    dic[k].update(v.split(','))

>>> dic
defaultdict(<type 'set'>,
{1: set(['blue', 'green', 'red']),
 2: set(['blue', 'green', 'yellow'])})

Now iterate over dic:

>>> dic2 = defaultdict(list)
for k,v in dic.iteritems():
    for val in v:
        dic2[val].append(k)
...         
>>> dic2
defaultdict(<type 'list'>,
{'blue': [1, 2],
 'green': [1, 2],
 'yellow': [2],
 'red': [1]})
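The same grouping in Python 3 (iteritems() became items(); the final output is made deterministic with sorted()):

```python
from collections import defaultdict

lis = [(1, "red"), (1, "red,green"), (1, "green,blue"),
       (2, "green"), (2, "yellow,blue")]

# Group: key -> set of colours (sets drop duplicates automatically).
dic = defaultdict(set)
for k, v in lis:
    dic[k].update(v.split(','))

# Invert: colour -> list of keys it appeared under.
dic2 = defaultdict(list)
for k, colours in dic.items():
    for colour in colours:
        dic2[colour].append(k)

result = {colour: sorted(ks) for colour, ks in dic2.items()}
print(result)
```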
qid & accept id: (17260358, 17260447) query: How to perform re substitutions on <p> tags within a specific class? soup:

soup wrap:

p.string.strip() will remove leading, trailing spaces.

p.string.replaceWith(NEW_STRING) will replace the text of p tag to NEW_STRING.

from bs4 import BeautifulSoup

with open('file.html', 'r') as f:
    html_file_as_string = f.read()
soup = BeautifulSoup(html_file_as_string, "lxml")
for div in soup.find_all('div', {'class': 'my_class'}):
    for p in div.find('p'):
        p.string.replace_with(p.string.strip())
with open('file', 'w') as f:
    f.write(soup.renderContents())

BTW, re.sub(..) returns the substituted string. It does not modify the original string.

>>> import re
>>> text = '   hello'
>>> re.sub('\s+', '', text)
'hello'
>>> text
'   hello'

EDIT

Code edited to match edited question:

from bs4 import BeautifulSoup

with open('file.html', 'r') as f:
    html_file_as_string = f.read()
soup = BeautifulSoup(html_file_as_string, "lxml")
for div in soup.find_all('div', {'class': 'my_class'}):
    for p in div.findAll('p'):
        new = BeautifulSoup(u'\n'.join(u'<p>{}</p>'.format(line.strip()) for line in p.text.splitlines() if line), 'html.parser')
        p.replace_with(new)
with open('file', 'w') as f:
    f.write(soup.renderContents())
qid & accept id: (17270318, 17270611) query: Iterator for each item in a 2D Python list and its immediate m by n neighbourhood soup:
soup wrap:
board = [
    [1,0,1,0,1],
    [1,0,1,0,1],
    [1,0,1,0,1],
    [1,0,1,0,1],
    [1,0,1,0,1]
]

def clamp(minV,maxV,x):
    if x < minV:
        return minV 
    elif x > maxV:
        return maxV
    else:
        return x

def getNeighbour(grid,startx,starty,radius):
    width = len(grid[starty])
    height = len(grid)
    neighbourhood = []
    for y in range(clamp(0,height,starty-radius),clamp(0,height,starty+radius)+1):
        row = []
        for x in range(clamp(0,width,startx-radius),clamp(0,width,startx+radius)+1):
            if x != startx or (x==startx and  y != starty):
                row.append(grid[y][x])
        neighbourhood.append(row)
    return neighbourhood

Examples:

>>> pprint(getNeighbour(board, 0, 0, 1))
[0]
[1, 0] (expected)
>>> pprint(getNeighbour(board, 2, 2, 1))
[0, 1, 0]
[0, 0]
[0, 1, 0] (expected)
>>> 

Addressing the performance aspect with a list like:

board = [[1,0]*2000]*1000

The run time is essentially the same as if the board were 10x10.
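The same clamping can be written with max/min, and the neighbourhood flattened into a single generator (the function name is my own):

```python
def neighbours(grid, x, y, radius=1):
    # Yield each cell value within `radius` of (x, y), skipping the centre cell.
    height, width = len(grid), len(grid[0])
    for ny in range(max(0, y - radius), min(height - 1, y + radius) + 1):
        for nx in range(max(0, x - radius), min(width - 1, x + radius) + 1):
            if (nx, ny) != (x, y):
                yield grid[ny][nx]

board = [[1, 0, 1, 0, 1] for _ in range(5)]
print(list(neighbours(board, 0, 0)))  # [0, 1, 0]
```

Because only the clamped ranges are iterated, the cost depends on the radius, not on the board size, which matches the observation above.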

qid & accept id: (17288571, 17288817) query: Use Python zip to save data in separate columns from a binary file soup:

The itertools module has an islice() function which may help you:

\n
>>> s = "abcdefghijklmnopqrstuvwxyz"\n>>> import itertools\n>>> for val in itertools.islice(s, 0, None, 8):\n...   print val\n...\na\ni\nq\ny\n>>> for val in itertools.islice(s, 1, None, 8):\n...   print val\n...\nb\nj\nr\nz\n>>> for val in itertools.islice(s, 2, None, 8):\n...   print val\n...\nc\nk\ns\n
\n

So for your problem, you might do:

\n
import itertools\na = [item for item in itertools.islice(e, 0, None, 8)]\nb = [item for item in itertools.islice(e, 1, None, 8)]\nc = [item for item in itertools.islice(e, 2, None, 8)]\n
\n

and so on. Or, better yet:

\n
columns = []\nfor n in range(8):\n    columns.append([item for item in itertools.islice(e, n, None, 8)])\n
\n

Hope this helps!

\n

P.S. Here's the documentation for islice. There are plenty of other useful tools in the itertools module: take a look!

\n soup wrap:

The itertools module has an islice() function which may help you:

>>> s = "abcdefghijklmnopqrstuvwxyz"
>>> import itertools
>>> for val in itertools.islice(s, 0, None, 8):
...   print val
...
a
i
q
y
>>> for val in itertools.islice(s, 1, None, 8):
...   print val
...
b
j
r
z
>>> for val in itertools.islice(s, 2, None, 8):
...   print val
...
c
k
s

So for your problem, you might do:

import itertools
a = [item for item in itertools.islice(e, 0, None, 8)]
b = [item for item in itertools.islice(e, 1, None, 8)]
c = [item for item in itertools.islice(e, 2, None, 8)]

and so on. Or, better yet:

columns = []
for n in range(8):
    columns.append([item for item in itertools.islice(e, n, None, 8)])

Hope this helps!

P.S. Here's the documentation for islice. There are plenty of other useful tools in the itertools module: take a look!
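If e is a concrete sequence (a list or string) rather than a one-shot iterator, plain extended slicing gives the same columns without itertools:

```python
s = "abcdefghijklmnopqrstuvwxyz"
a = list(s[0::8])  # items 0, 8, 16, 24
b = list(s[1::8])  # items 1, 9, 17, 25
print(a)  # ['a', 'i', 'q', 'y']
print(b)  # ['b', 'j', 'r', 'z']
```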

qid & accept id: (17307474, 17307521) query: how to update global variable in python soup:

Use global statement. But there's no need of global for mutable objects, if you're modifying them in-place.

\n

You can use modules like pickle to store your list in a file. You can load the list when you want to use it and store it back after doing your modifications.

\n
lis = ["link1", "link2",...]\n\ndef update():\n  global lis\n  #do something\n  return lis\n
\n

Pickle example:

\n
import pickle\ndef update():\n  lis = pickle.load( open( "lis.pkl", "rb" ) ) # Load the list\n  #do something with lis                     #modify it \n  pickle.dump( lis, open( "lis.pkl", "wb" ) )  #save it again\n
\n

For better performance you can also use the cPickle module.

\n

More examples

\n soup wrap:

Use the global statement. But there's no need for global with mutable objects if you're modifying them in-place.

You can use modules like pickle to store your list in a file. You can load the list when you want to use it and store it back after doing your modifications.

lis = ["link1", "link2",...]

def update():
  global lis
  #do something
  return lis
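A minimal runnable illustration of both points (a rebound global versus an in-place mutation):

```python
counter = 0  # module-level name

def bump():
    global counter        # rebinding a module-level name needs the statement
    counter += 1

def grow(lis):
    lis.append('link3')   # in-place mutation: no `global` required

links = ['link1', 'link2']
bump()
grow(links)
print(counter, links)  # 1 ['link1', 'link2', 'link3']
```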

Pickle example:

import pickle
def update():
  with open("lis.pkl", "rb") as f:   # `with` closes the file handle
    lis = pickle.load(f)             # load the list
  #do something with lis             # modify it
  with open("lis.pkl", "wb") as f:
    pickle.dump(lis, f)              # save it again

For better performance you can also use the cPickle module.

More examples

qid & accept id: (17334702, 17335025) query: inserting a new entry into adjacency list soup:

First, I'd suggest defaultdict so that referencing an index that doesn't exist will initialize it to an empty list.

\n
from collections import defaultdict\n\ndict1 = defaultdict(list)\ndict1['x1'] = ['y1','y2']\ndict1['x2'] = ['y2','y3','y4']\ndict2 = defaultdict(list)\ndict2['y1'] = ['x1']\ndict2['y2'] = ['x1','x2']\ndict2['y3'] = ['x2']\n
\n

Then when 'x3':[y2,y4] comes in:

\n
dict1['x3'] = set(dict1['x3']+['y2','y4'])\nfor y in dict1['x3']:\n    dict2[y] = set(dict2[y]+['x3'])\n
\n

using set to eliminate duplicate values. Obviously some of the above values would be a little more dynamic than hard coded values.

\n

Note: This isn't faster, in fact it's probably slower, but the defaultdict is a better way to avoid the KeyError and you definitely don't want to introduce duplicate values into your adj list as this will hurt the performance, or maybe even correctness, of whatever algorithm you apply to this graph.

\n soup wrap:

First, I'd suggest defaultdict so that referencing a key that doesn't exist will initialize it to an empty list.

from collections import defaultdict

dict1 = defaultdict(list)
dict1['x1'] = ['y1','y2']
dict1['x2'] = ['y2','y3','y4']
dict2 = defaultdict(list)
dict2['y1'] = ['x1']
dict2['y2'] = ['x1','x2']
dict2['y3'] = ['x2']

Then when 'x3': ['y2','y4'] comes in:

dict1['x3'] = set(dict1['x3'] + ['y2', 'y4'])
for y in dict1['x3']:
    dict2[y] = set(dict2[y] + ['x3'])

using set to eliminate duplicate values. Obviously some of the above values would be a little more dynamic than hard coded values.

Note: This isn't faster, in fact it's probably slower, but the defaultdict is a better way to avoid the KeyError and you definitely don't want to introduce duplicate values into your adj list as this will hurt the performance, or maybe even correctness, of whatever algorithm you apply to this graph.
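A variant that sidesteps the list/set juggling entirely is defaultdict(set); the names fwd and rev are my own:

```python
from collections import defaultdict

fwd = defaultdict(set)   # x -> set of ys; a set makes duplicates impossible
rev = defaultdict(set)   # y -> set of xs (the reverse index)

def add_entry(x, ys):
    for y in ys:
        fwd[x].add(y)
        rev[y].add(x)    # keep the reverse index in sync

add_entry('x3', ['y2', 'y4'])
add_entry('x3', ['y2'])          # repeated insert is a no-op
print(sorted(fwd['x3']))  # ['y2', 'y4']
print(sorted(rev['y2']))  # ['x3']
```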

qid & accept id: (17365289, 17365399) query: How to send audio wav file generated at the server to client browser? soup:

You can create an in-memory file with StringIO:

\n
from cStringIO import StringIO\nfrom flask import make_response\n\nfrom somewhere import generate_wav_file  # TODO your code here\n\n@app.route('/path')\ndef view_method():\n\n    buf = StringIO()\n\n    # generate_wav_file should take a file as parameter and write a wav in it\n    generate_wav_file(buf) \n\n    response = make_response(buf.getvalue())\n    buf.close()\n    response.headers['Content-Type'] = 'audio/wav'\n    response.headers['Content-Disposition'] = 'attachment; filename=sound.wav'\n    return response\n
\n

If you have file on disk:

\n
from flask import send_file\n\n@app.route('/path')\ndef view_method():\n     path_to_file = "/test.wav"\n\n     return send_file(\n         path_to_file, \n         mimetype="audio/wav", \n         as_attachment=True, \n         attachment_filename="test.wav")\n
\n soup wrap:

You can create an in-memory file with StringIO (on Python 3, use io.BytesIO for binary data such as WAV):

from cStringIO import StringIO
from flask import make_response

from somewhere import generate_wav_file  # TODO your code here

@app.route('/path')
def view_method():

    buf = StringIO()

    # generate_wav_file should take a file as parameter and write a wav in it
    generate_wav_file(buf) 

    response = make_response(buf.getvalue())
    buf.close()
    response.headers['Content-Type'] = 'audio/wav'
    response.headers['Content-Disposition'] = 'attachment; filename=sound.wav'
    return response
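generate_wav_file is a placeholder; purely as an illustration, such a function could be sketched with the stdlib wave module, which accepts any file-like object (the tone parameters and function body here are my own assumptions, not Flask API):

```python
import io
import math
import struct
import wave

def generate_wav_file(buf):
    # Write one second of a 440 Hz mono 16-bit tone into the file-like `buf`.
    rate = 8000
    with wave.open(buf, 'wb') as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b''.join(
            struct.pack('<h', int(32767 * math.sin(2 * math.pi * 440 * i / rate)))
            for i in range(rate)))

buf = io.BytesIO()
generate_wav_file(buf)
print(buf.getvalue()[:4])  # b'RIFF'
```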

If you have file on disk:

from flask import send_file

@app.route('/path')
def view_method():
     path_to_file = "/test.wav"

     return send_file(
         path_to_file, 
         mimetype="audio/wav", 
         as_attachment=True, 
         attachment_filename="test.wav")
qid & accept id: (17368930, 17369007) query: Inspecting data descriptor attributes in python soup:

To get the descriptor itself, you can look into class __dict__:

\n
MyClass.__dict__['x']\n
\n

But the better way is to modify the getter:

\n
def __get__(self, obj, objtype):\n    print 'Retrieving', self.name\n    if obj is None:  # accessed as class attribute\n        return self  # return the descriptor itself\n    else:  # accessed as instance attribute\n        return self.val  # return a value\n
\n

Which gives:

\n
Retrieving var "x"\n('__weakref__', )\n('x', <__main__.RevealAccess object at 0x7f32ef989890>)\n
\n soup wrap:

To get the descriptor itself, you can look in the class's __dict__:

MyClass.__dict__['x']

But the better way is to modify the getter:

def __get__(self, obj, objtype):
    print 'Retrieving', self.name
    if obj is None:  # accessed as class attribute
        return self  # return the descriptor itself
    else:  # accessed as instance attribute
        return self.val  # return a value

Which gives:

Retrieving var "x"
('__weakref__', )
('x', <__main__.RevealAccess object at 0x7f32ef989890>)
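Putting both pieces together, a minimal self-contained sketch (loosely modelled on the docs' RevealAccess example):

```python
class RevealAccess:
    # Minimal data descriptor.
    def __init__(self, initval, name):
        self.val = initval
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:      # accessed as a class attribute
            return self      # return the descriptor itself
        return self.val      # accessed as an instance attribute

    def __set__(self, obj, val):
        self.val = val

class MyClass:
    x = RevealAccess(10, 'var "x"')

assert isinstance(MyClass.__dict__['x'], RevealAccess)  # raw descriptor
assert MyClass().x == 10                                # instance -> value
assert MyClass.x is MyClass.__dict__['x']               # class -> descriptor
```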
qid & accept id: (17374553, 17375569) query: How to create a new list or new line after a certain number of iterations soup:

You could do this:

\n
import csv\nfrom itertools import izip_longest\n\nwith open('/tmp/line.csv','r') as fin:\n    cr=csv.reader(fin)\n    n=10\n    data=izip_longest(*[iter(list(cr)[0])]*n,fillvalue='')\n    print '\n'.join(', '.join(t) for t in data)\n
\n

With your data, prints:

\n
CLB, HNRG, LPI, MTDR, MVO, NRGY, PSE, PVR, RRC, WES\nACMP, ATLS, ATW, BP, BWP, COG, DGAS, DNR, EPB, EPL\nEXLP, NOV, OIS, PNRG, SEP, APL, ARP, CVX, DMLP, DRQ\nDWSN, EC, ECA, FTI, GLOG, IMO, LINE, NFX, OILT, PNG\nQRE, RGP, RRMS, SDRL, SNP, TLP, VNR, XOM, XTXI, AHGP\n
\n

Edit

\n

With the clarification (Py 3)

\n

I would write your program thissa way:

\n
import csv\nfrom itertools import zip_longest\n\nn=10\nwith open('/tmp/rawdata.txt','r') as fin, open('/tmp/out.csv','w') as fout:\n    reader=csv.reader(fin)\n    writer=csv.writer(fout) \n    source=(e for line in reader for e in line)             \n    for t in zip_longest(*[source]*n):\n        writer.writerow(list(e for e in t if e))\n
\n

Changes:

\n
    \n
  1. Output is to a file;
  2. \n
  3. Source of elements is a generator;
  4. \n
  5. No matter how many lines or comma separated elements per line, the source is treated item by item (subject to csv/element considerations);
  6. \n
  7. No matter what n is, the output is n elements long until there is the last bit < n
  8. \n
\n soup wrap:

You could do this:

import csv
from itertools import izip_longest

with open('/tmp/line.csv','r') as fin:
    cr=csv.reader(fin)
    n=10
    data=izip_longest(*[iter(list(cr)[0])]*n,fillvalue='')
    print '\n'.join(', '.join(t) for t in data)

With your data, prints:

CLB, HNRG, LPI, MTDR, MVO, NRGY, PSE, PVR, RRC, WES
ACMP, ATLS, ATW, BP, BWP, COG, DGAS, DNR, EPB, EPL
EXLP, NOV, OIS, PNRG, SEP, APL, ARP, CVX, DMLP, DRQ
DWSN, EC, ECA, FTI, GLOG, IMO, LINE, NFX, OILT, PNG
QRE, RGP, RRMS, SDRL, SNP, TLP, VNR, XOM, XTXI, AHGP

Edit

With the clarification (Py 3)

I would write your program thissa way:

import csv
from itertools import zip_longest

n=10
with open('/tmp/rawdata.txt','r') as fin, open('/tmp/out.csv','w') as fout:
    reader=csv.reader(fin)
    writer=csv.writer(fout) 
    source=(e for line in reader for e in line)             
    for t in zip_longest(*[source]*n):
        writer.writerow(list(e for e in t if e))

Changes:

  1. Output is to a file;
  2. Source of elements is a generator;
  3. No matter how many lines or comma separated elements per line, the source is treated item by item (subject to csv/element considerations);
  4. No matter what n is, each output row is n elements long, except possibly the last, which holds the remaining < n items.
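The zip_longest(*[source]*n) trick above is the classic "grouper" recipe from the itertools docs; isolated, it looks like this:

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    # n references to ONE iterator: zip_longest pulls n items per output tuple.
    return zip_longest(*[iter(iterable)] * n, fillvalue=fillvalue)

rows = list(grouper('ABCDEFG', 3, fillvalue=''))
print(rows)  # [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', '', '')]
```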
qid & accept id: (17387219, 17387992) query: Sorting numpy matrix for a given column soup:

If you have a np.matrix, called m:

\n
col = 1\nm[np.array(m[:,col].argsort(axis=0).tolist()).ravel()]\n
\n

If you have a np.ndarray, called a:

\n
col = 1\na[a[:,col].argsort(axis=0)]\n
\n

If you have a structured array with named columns:

\n
def mysort(data, col_name, key=None):\n    d = data.copy()\n    cols = [i[0] for i in eval(str(d.dtype))]\n    if key:\n        argsort = np.array([key(i) for i in d[col_name]]).argsort()\n    else:\n        argsort = d[col_name].argsort()\n    for col in cols:\n        d[col] = d[col][argsort]\n    return d\n
\n

For your specific case you need the following key function:

\n
def key(x):\n    x = ''.join([i for i in x if i.isdigit() or i=='_'])\n    return '{1:{f}{a}10}_{2:{f}{a}10}_{3:{f}{a}10}'.format(*x.split('_'), f='0', a='>')\n\nd = mysort(data, 'MyColumn', key)\n
\n soup wrap:

If you have a np.matrix, called m:

col = 1
m[np.array(m[:,col].argsort(axis=0).tolist()).ravel()]

If you have a np.ndarray, called a:

col = 1
a[a[:,col].argsort(axis=0)]
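As a quick sanity check of the ndarray case, here is a toy array sorted by its first column (the column choice is arbitrary):

```python
import numpy as np

a = np.array([[3, 9],
              [1, 5],
              [2, 7]])
col = 0
sorted_a = a[a[:, col].argsort()]  # reorder rows by the argsort of one column
print(sorted_a[:, 0])  # [1 2 3]
```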

If you have a structured array with named columns:

def mysort(data, col_name, key=None):
    d = data.copy()
    cols = list(d.dtype.names)  # field names of the structured array
    if key:
        argsort = np.array([key(i) for i in d[col_name]]).argsort()
    else:
        argsort = d[col_name].argsort()
    for col in cols:
        d[col] = d[col][argsort]
    return d

For your specific case you need the following key function:

def key(x):
    x = ''.join([i for i in x if i.isdigit() or i=='_'])
    return '{1:{f}{a}10}_{2:{f}{a}10}_{3:{f}{a}10}'.format(*x.split('_'), f='0', a='>')

d = mysort(data, 'MyColumn', key)
qid & accept id: (17426202, 17426234) query: Python function that takes an input and spits out a month and how many days it has soup:

Use the calendar module; the calendar.monthrange() function returns a (weekday, number_of_days) tuple:

\n
>>> import calendar\n>>> print calendar.monthrange(2012, 2)[1]\n29\n
\n

Note that you have to include the year; in a leap-year, February has 29 days, after all.

\n

You can get just the current year with the datetime module:

\n
import datetime\nyear = datetime.date.today().year\n
\n

Now you only have to ask for a month number:

\n
import datetime\nimport calendar\n\ndef main():\n    year = datetime.date.today().year\n    userin = int(raw_input("Enter a month as number: "))  # Python 3: `int(input(...))` \n    print '{}, {}'.format(calendar.month_abbr[userin], calendar.monthrange(year, userin)[1])\n
\n

This prints the abbreviated month and the number of days:

\n
Enter a month as number: 2\nFeb, 28\n
\n soup wrap:

Use the calendar module; the calendar.monthrange() function returns a (weekday, number_of_days) tuple:

>>> import calendar
>>> print calendar.monthrange(2012, 2)[1]
29

Note that you have to include the year; in a leap year, February has 29 days, after all.

You can get just the current year with the datetime module:

import datetime
year = datetime.date.today().year

Now you only have to ask for a month number:

import datetime
import calendar

def main():
    year = datetime.date.today().year
    userin = int(raw_input("Enter a month as number: "))  # Python 3: `int(input(...))` 
    print '{}, {}'.format(calendar.month_abbr[userin], calendar.monthrange(year, userin)[1])

This prints the abbreviated month and the number of days:

Enter a month as number: 2
Feb, 28
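A tiny helper (the name days_in_month is my own) makes the leap-year behaviour easy to check:

```python
import calendar

def days_in_month(year, month):
    # monthrange returns (weekday of the 1st, number of days in the month)
    return calendar.monthrange(year, month)[1]

print(days_in_month(2012, 2))  # 29  (leap year)
print(days_in_month(2013, 2))  # 28
```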
qid & accept id: (17462994, 17463020) query: Python getting a string (key + value) from Python Dictionary soup:

A list of key-value strs,

\n
>>> d = {'key1': 'value1', 'key2': 'value2'}\n>>> ['{}_{}'.format(k,v) for k,v in d.iteritems()]\n['key2_value2', 'key1_value1']\n
\n

Or if you want a single string of all key-value pairs,

\n
>>> ', '.join(['{}_{}'.format(k,v) for k,v in d.iteritems()])\n'key2_value2, key1_value1'\n
\n

EDIT:

\n

Maybe you are looking for something like this,

\n
def checkCommonNodes(id, rs):\n    id_key, id_value = id.split('_')\n    for r in rs:\n        try:\n            if r[id_key] == id_value:\n                print "".join('{}_{}'.format(k,v) for k,v in r.iteritems())\n        except KeyError:\n            continue\n
\n

You may also be wanting to break after printing - hard to know exactly what this is for.

\n soup wrap:

A list of key-value strings:

>>> d = {'key1': 'value1', 'key2': 'value2'}
>>> ['{}_{}'.format(k,v) for k,v in d.iteritems()]
['key2_value2', 'key1_value1']

Or if you want a single string of all key-value pairs,

>>> ', '.join(['{}_{}'.format(k,v) for k,v in d.iteritems()])
'key2_value2, key1_value1'
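On Python 3, iteritems() is gone; items() does the same job, and since 3.7 dicts preserve insertion order:

```python
d = {'key1': 'value1', 'key2': 'value2'}
pairs = ['{}_{}'.format(k, v) for k, v in d.items()]
print(', '.join(pairs))  # key1_value1, key2_value2
```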

EDIT:

Maybe you are looking for something like this,

def checkCommonNodes(id, rs):
    id_key, id_value = id.split('_')
    for r in rs:
        try:
            if r[id_key] == id_value:
                print "".join('{}_{}'.format(k,v) for k,v in r.iteritems())
        except KeyError:
            continue

You may also want to break after printing; it's hard to know exactly what this is for.

qid & accept id: (17471682, 17473460) query: Remove a level from a pandas MultiIndex soup:

This could be an enhancement to droplevel, maybe by passing uniquify=True

\n
In [77]: MultiIndex.from_tuples(index_3levels.droplevel('l3').unique())\nOut[77]: \nMultiIndex\n[(0, 100), (1, 101)]\n
\n

Here's another way to do this

\n

First create some data

\n
In [226]: def f(i):\n            return [(i,100,1000),(i,100,1001),(i,100,1002),(i+1,101,1001)]\n\nIn [227]: l = []\n\nIn [228]: for i in range(1000000):\n             l.extend(f(i))\n\nIn [229]: index_3levels=pd.MultiIndex.from_tuples(l,names=["l1","l2","l3"])\n\nIn [230]: len(index_3levels)\nOut[230]: 4000000\n
\n

The method shown above

\n
In [238]: %timeit MultiIndex.from_tuples(index_3levels.droplevel(level='l3').unique())\n1 loops, best of 3: 2.26 s per loop\n
\n

Let's split the index apart to 2 components, l1, and l2 and uniquify, much\nfaster to unique these as these are Int64Index

\n
In [249]: l2 = index_3levels.droplevel(level='l3').droplevel(level='l1').unique()\n\nIn [250]: %timeit index_3levels.droplevel(level='l3').droplevel(level='l1').unique()\n10 loops, best of 3: 35.3 ms per loop\n\nIn [251]: l1 = index_3levels.droplevel(level='l3').droplevel(level='l2').unique()\n\nIn [252]: %timeit index_3levels.droplevel(level='l3').droplevel(level='l2').unique()\n10 loops, best of 3: 52.2 ms per loop\n\nIn [253]: len(l1)\nOut[253]: 1000001\n\nIn [254]: len(l2)\nOut[254]: 2\n
\n

Reassemble

\n
In [255]: %timeit MultiIndex.from_arrays([ np.repeat(l1,len(l2)), np.repeat(l2,len(l1)) ])\n10 loops, best of 3: 183 ms per loop\n
\n

Total time about 270ms, pretty good speedup. Note that I think the ordering may be different, but I think some combination of np.repeate/np.tile will work

\n soup wrap:

This could be an enhancement to droplevel, maybe by passing uniquify=True

In [77]: MultiIndex.from_tuples(index_3levels.droplevel('l3').unique())
Out[77]: 
MultiIndex
[(0, 100), (1, 101)]
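On recent pandas versions, calling .unique() on a MultiIndex already returns a MultiIndex, so the from_tuples wrapper above is no longer needed; a small sketch with toy data:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [(0, 100, 1000), (0, 100, 1001), (1, 101, 1001)],
    names=['l1', 'l2', 'l3'])
deduped = idx.droplevel('l3').unique()  # drop l3, then dedupe the (l1, l2) pairs
print(list(deduped))  # [(0, 100), (1, 101)]
```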

Here's another way to do this

First create some data

In [226]: def f(i):
            return [(i,100,1000),(i,100,1001),(i,100,1002),(i+1,101,1001)]

In [227]: l = []

In [228]: for i in range(1000000):
             l.extend(f(i))

In [229]: index_3levels=pd.MultiIndex.from_tuples(l,names=["l1","l2","l3"])

In [230]: len(index_3levels)
Out[230]: 4000000

The method shown above

In [238]: %timeit MultiIndex.from_tuples(index_3levels.droplevel(level='l3').unique())
1 loops, best of 3: 2.26 s per loop

Let's split the index apart into 2 components, l1 and l2, and uniquify each; this is much faster since these are Int64Index objects

In [249]: l2 = index_3levels.droplevel(level='l3').droplevel(level='l1').unique()

In [250]: %timeit index_3levels.droplevel(level='l3').droplevel(level='l1').unique()
10 loops, best of 3: 35.3 ms per loop

In [251]: l1 = index_3levels.droplevel(level='l3').droplevel(level='l2').unique()

In [252]: %timeit index_3levels.droplevel(level='l3').droplevel(level='l2').unique()
10 loops, best of 3: 52.2 ms per loop

In [253]: len(l1)
Out[253]: 1000001

In [254]: len(l2)
Out[254]: 2

Reassemble

In [255]: %timeit MultiIndex.from_arrays([ np.repeat(l1,len(l2)), np.repeat(l2,len(l1)) ])
10 loops, best of 3: 183 ms per loop

Total time about 270ms, a pretty good speedup. Note that I think the ordering may be different, but some combination of np.repeat/np.tile should work

qid & accept id: (17478779, 17478866) query: Make scatter plot from set of points in tuples soup:

You can do:

\n
x,y = zip(*s)\nplt.scatter(x,y)\n
\n

Or even in an "one-liner":

\n
plt.scatter(*zip(*s))\n
\n

zip() can be used to pack and unpack arrays and when you call using method(*list_or_tuple), each element in the list or tuple is passed as an argument.

\n soup wrap:

You can do:

x,y = zip(*s)
plt.scatter(x,y)

Or even as a "one-liner":

plt.scatter(*zip(*s))

zip() can be used both to pack and to unpack sequences; when you call method(*list_or_tuple), each element of the list or tuple is passed as a separate argument.
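The packing/unpacking symmetry is easy to see without matplotlib:

```python
s = [(1, 2), (3, 4), (5, 6)]
x, y = zip(*s)   # unpack: transpose the pairs into two coordinate tuples
print(x, y)      # (1, 3, 5) (2, 4, 6)
print(list(zip(x, y)))  # packs them back: [(1, 2), (3, 4), (5, 6)]
```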

qid & accept id: (17486578, 17616626) query: How can you bundle all your python code into a single zip file? soup:

You can automate most of the work with regular python tools. Let's start with clean virtualenv.

\n
[zart@feena ~]$ mkdir ziplib-demo\n[zart@feena ~]$ cd ziplib-demo\n[zart@feena ziplib-demo]$ virtualenv .\nNew python executable in ./bin/python\nInstalling setuptools.............done.\nInstalling pip...............done.\n
\n

Now let's install set of packages that will go into zipped library. The trick is to force installing them into specific directory.

\n

(Note: don't use --egg option either on command-line or in pip.conf/pip.ini because it will break file layout making it non-importable in zip)

\n
[zart@feena ziplib-demo]$ bin/pip install --install-option --install-lib=$PWD/unpacked waitress\nDownloading/unpacking waitress\n  Downloading waitress-0.8.5.tar.gz (112kB): 112kB downloaded\n  Running setup.py egg_info for package waitress\n\nRequirement already satisfied (use --upgrade to upgrade): setuptools in ./lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg (from waitress)\nInstalling collected packages: waitress\n  Running setup.py install for waitress\n\n    Installing waitress-serve script to /home/zart/ziplib-demo/bin\nSuccessfully installed waitress\nCleaning up...\n
\n

Update: pip now has -t switch, that does the same thing as --install-option --install-lib=.

\n

Now let's pack all of them into one zip

\n
[zart@feena ziplib-demo]$ cd unpacked\n[zart@feena unpacked]$ ls\nwaitress  waitress-0.8.5-py2.7.egg-info\n[zart@feena unpacked]$ zip -r9 ../library.zip *\n  adding: waitress/ (stored 0%)\n  adding: waitress/receiver.py (deflated 71%)\n  adding: waitress/server.pyc (deflated 64%)\n  adding: waitress/utilities.py (deflated 62%)\n  adding: waitress/trigger.pyc (deflated 63%)\n  adding: waitress/trigger.py (deflated 61%)\n  adding: waitress/receiver.pyc (deflated 60%)\n  adding: waitress/adjustments.pyc (deflated 51%)\n  adding: waitress/compat.pyc (deflated 56%)\n  adding: waitress/adjustments.py (deflated 60%)\n  adding: waitress/server.py (deflated 68%)\n  adding: waitress/channel.py (deflated 72%)\n  adding: waitress/task.pyc (deflated 57%)\n  adding: waitress/tests/ (stored 0%)\n  adding: waitress/tests/test_regression.py (deflated 63%)\n  adding: waitress/tests/test_functional.py (deflated 88%)\n  adding: waitress/tests/test_parser.pyc (deflated 76%)\n  adding: waitress/tests/test_trigger.pyc (deflated 73%)\n  adding: waitress/tests/test_init.py (deflated 72%)\n  adding: waitress/tests/test_utilities.pyc (deflated 78%)\n  adding: waitress/tests/test_buffers.pyc (deflated 79%)\n  adding: waitress/tests/test_trigger.py (deflated 82%)\n  adding: waitress/tests/test_buffers.py (deflated 86%)\n  adding: waitress/tests/test_runner.py (deflated 75%)\n  adding: waitress/tests/test_init.pyc (deflated 69%)\n  adding: waitress/tests/__init__.pyc (deflated 21%)\n  adding: waitress/tests/support.pyc (deflated 48%)\n  adding: waitress/tests/test_utilities.py (deflated 73%)\n  adding: waitress/tests/test_channel.py (deflated 87%)\n  adding: waitress/tests/test_task.py (deflated 87%)\n  adding: waitress/tests/test_functional.pyc (deflated 82%)\n  adding: waitress/tests/__init__.py (deflated 5%)\n  adding: waitress/tests/test_compat.pyc (deflated 53%)\n  adding: waitress/tests/test_receiver.pyc (deflated 79%)\n  adding: waitress/tests/test_adjustments.py (deflated 78%)\n  
adding: waitress/tests/test_adjustments.pyc (deflated 74%)\n  adding: waitress/tests/test_server.pyc (deflated 73%)\n  adding: waitress/tests/fixtureapps/ (stored 0%)\n  adding: waitress/tests/fixtureapps/filewrapper.pyc (deflated 59%)\n  adding: waitress/tests/fixtureapps/getline.py (deflated 37%)\n  adding: waitress/tests/fixtureapps/nocl.py (deflated 47%)\n  adding: waitress/tests/fixtureapps/sleepy.pyc (deflated 44%)\n  adding: waitress/tests/fixtureapps/echo.py (deflated 40%)\n  adding: waitress/tests/fixtureapps/error.py (deflated 52%)\n  adding: waitress/tests/fixtureapps/nocl.pyc (deflated 48%)\n  adding: waitress/tests/fixtureapps/getline.pyc (deflated 32%)\n  adding: waitress/tests/fixtureapps/writecb.pyc (deflated 42%)\n  adding: waitress/tests/fixtureapps/toolarge.py (deflated 37%)\n  adding: waitress/tests/fixtureapps/__init__.pyc (deflated 20%)\n  adding: waitress/tests/fixtureapps/writecb.py (deflated 50%)\n  adding: waitress/tests/fixtureapps/badcl.pyc (deflated 44%)\n  adding: waitress/tests/fixtureapps/runner.pyc (deflated 58%)\n  adding: waitress/tests/fixtureapps/__init__.py (stored 0%)\n  adding: waitress/tests/fixtureapps/filewrapper.py (deflated 74%)\n  adding: waitress/tests/fixtureapps/runner.py (deflated 41%)\n  adding: waitress/tests/fixtureapps/echo.pyc (deflated 42%)\n  adding: waitress/tests/fixtureapps/groundhog1.jpg (deflated 24%)\n  adding: waitress/tests/fixtureapps/error.pyc (deflated 48%)\n  adding: waitress/tests/fixtureapps/sleepy.py (deflated 42%)\n  adding: waitress/tests/fixtureapps/toolarge.pyc (deflated 43%)\n  adding: waitress/tests/fixtureapps/badcl.py (deflated 45%)\n  adding: waitress/tests/support.py (deflated 52%)\n  adding: waitress/tests/test_task.pyc (deflated 78%)\n  adding: waitress/tests/test_channel.pyc (deflated 78%)\n  adding: waitress/tests/test_regression.pyc (deflated 68%)\n  adding: waitress/tests/test_parser.py (deflated 80%)\n  adding: waitress/tests/test_server.py (deflated 78%)\n  adding: 
waitress/tests/test_receiver.py (deflated 87%)\n  adding: waitress/tests/test_compat.py (deflated 51%)\n  adding: waitress/tests/test_runner.pyc (deflated 72%)\n  adding: waitress/__init__.pyc (deflated 50%)\n  adding: waitress/channel.pyc (deflated 58%)\n  adding: waitress/runner.pyc (deflated 54%)\n  adding: waitress/buffers.py (deflated 74%)\n  adding: waitress/__init__.py (deflated 61%)\n  adding: waitress/runner.py (deflated 58%)\n  adding: waitress/parser.py (deflated 69%)\n  adding: waitress/compat.py (deflated 69%)\n  adding: waitress/buffers.pyc (deflated 69%)\n  adding: waitress/utilities.pyc (deflated 60%)\n  adding: waitress/parser.pyc (deflated 53%)\n  adding: waitress/task.py (deflated 72%)\n  adding: waitress-0.8.5-py2.7.egg-info/ (stored 0%)\n  adding: waitress-0.8.5-py2.7.egg-info/dependency_links.txt (stored 0%)\n  adding: waitress-0.8.5-py2.7.egg-info/installed-files.txt (deflated 83%)\n  adding: waitress-0.8.5-py2.7.egg-info/top_level.txt (stored 0%)\n  adding: waitress-0.8.5-py2.7.egg-info/PKG-INFO (deflated 65%)\n  adding: waitress-0.8.5-py2.7.egg-info/not-zip-safe (stored 0%)\n  adding: waitress-0.8.5-py2.7.egg-info/SOURCES.txt (deflated 71%)\n  adding: waitress-0.8.5-py2.7.egg-info/entry_points.txt (deflated 33%)\n  adding: waitress-0.8.5-py2.7.egg-info/requires.txt (deflated 5%)\n[zart@feena unpacked]$ cd ..\n
\n

Note that those files should be at top of zip, you can't just zip -r9 library.zip unpacked

\n

Checking the result:

\n
[zart@feena ziplib-demo]$ PYTHONPATH=library.zip python\nPython 2.7.1 (r271:86832, Apr 12 2011, 16:15:16)\n[GCC 4.6.0 20110331 (Red Hat 4.6.0-2)] on linux2\nType "help", "copyright", "credits" or "license" for more information.\n>>> import waitress\n>>> waitress\n\n>>>\n>>> from wsgiref.simple_server import demo_app\n>>> waitress.serve(demo_app)\nserving on http://0.0.0.0:8080\n^C>>>\n
\n

Update: since python 3.5 there is also zipapp module which can help with bundling the whole package into .pyz file. For more complex needs pyinstaller, py2exe or py2app might better fit the bill.

\n soup wrap:

You can automate most of the work with regular python tools. Let's start with a clean virtualenv.

[zart@feena ~]$ mkdir ziplib-demo
[zart@feena ~]$ cd ziplib-demo
[zart@feena ziplib-demo]$ virtualenv .
New python executable in ./bin/python
Installing setuptools.............done.
Installing pip...............done.

Now let's install the set of packages that will go into the zipped library. The trick is to force installing them into a specific directory.

(Note: don't use --egg option either on command-line or in pip.conf/pip.ini because it will break file layout making it non-importable in zip)

[zart@feena ziplib-demo]$ bin/pip install --install-option --install-lib=$PWD/unpacked waitress
Downloading/unpacking waitress
  Downloading waitress-0.8.5.tar.gz (112kB): 112kB downloaded
  Running setup.py egg_info for package waitress

Requirement already satisfied (use --upgrade to upgrade): setuptools in ./lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg (from waitress)
Installing collected packages: waitress
  Running setup.py install for waitress

    Installing waitress-serve script to /home/zart/ziplib-demo/bin
Successfully installed waitress
Cleaning up...

Update: pip now has a -t switch, which does the same thing as --install-option --install-lib=.
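The end result this recipe aims at — importing straight from a zip on sys.path — can be sketched self-contained with the stdlib zipfile module; demopkg is a made-up package name:

```python
import os
import sys
import tempfile
import zipfile

tmp = tempfile.mkdtemp()
lib = os.path.join(tmp, 'library.zip')
with zipfile.ZipFile(lib, 'w') as z:
    # The package directory must sit at the TOP of the archive for imports to work.
    z.writestr('demopkg/__init__.py', 'ANSWER = 42\n')

sys.path.insert(0, lib)        # same effect as PYTHONPATH=library.zip
import demopkg
print(demopkg.ANSWER)  # 42
```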

Now let's pack all of them into one zip

[zart@feena ziplib-demo]$ cd unpacked
[zart@feena unpacked]$ ls
waitress  waitress-0.8.5-py2.7.egg-info
[zart@feena unpacked]$ zip -r9 ../library.zip *
  adding: waitress/ (stored 0%)
  adding: waitress/receiver.py (deflated 71%)
  adding: waitress/server.pyc (deflated 64%)
  adding: waitress/utilities.py (deflated 62%)
  adding: waitress/trigger.pyc (deflated 63%)
  adding: waitress/trigger.py (deflated 61%)
  adding: waitress/receiver.pyc (deflated 60%)
  adding: waitress/adjustments.pyc (deflated 51%)
  adding: waitress/compat.pyc (deflated 56%)
  adding: waitress/adjustments.py (deflated 60%)
  adding: waitress/server.py (deflated 68%)
  adding: waitress/channel.py (deflated 72%)
  adding: waitress/task.pyc (deflated 57%)
  adding: waitress/tests/ (stored 0%)
  adding: waitress/tests/test_regression.py (deflated 63%)
  adding: waitress/tests/test_functional.py (deflated 88%)
  adding: waitress/tests/test_parser.pyc (deflated 76%)
  adding: waitress/tests/test_trigger.pyc (deflated 73%)
  adding: waitress/tests/test_init.py (deflated 72%)
  adding: waitress/tests/test_utilities.pyc (deflated 78%)
  adding: waitress/tests/test_buffers.pyc (deflated 79%)
  adding: waitress/tests/test_trigger.py (deflated 82%)
  adding: waitress/tests/test_buffers.py (deflated 86%)
  adding: waitress/tests/test_runner.py (deflated 75%)
  adding: waitress/tests/test_init.pyc (deflated 69%)
  adding: waitress/tests/__init__.pyc (deflated 21%)
  adding: waitress/tests/support.pyc (deflated 48%)
  adding: waitress/tests/test_utilities.py (deflated 73%)
  adding: waitress/tests/test_channel.py (deflated 87%)
  adding: waitress/tests/test_task.py (deflated 87%)
  adding: waitress/tests/test_functional.pyc (deflated 82%)
  adding: waitress/tests/__init__.py (deflated 5%)
  adding: waitress/tests/test_compat.pyc (deflated 53%)
  adding: waitress/tests/test_receiver.pyc (deflated 79%)
  adding: waitress/tests/test_adjustments.py (deflated 78%)
  adding: waitress/tests/test_adjustments.pyc (deflated 74%)
  adding: waitress/tests/test_server.pyc (deflated 73%)
  adding: waitress/tests/fixtureapps/ (stored 0%)
  adding: waitress/tests/fixtureapps/filewrapper.pyc (deflated 59%)
  adding: waitress/tests/fixtureapps/getline.py (deflated 37%)
  adding: waitress/tests/fixtureapps/nocl.py (deflated 47%)
  adding: waitress/tests/fixtureapps/sleepy.pyc (deflated 44%)
  adding: waitress/tests/fixtureapps/echo.py (deflated 40%)
  adding: waitress/tests/fixtureapps/error.py (deflated 52%)
  adding: waitress/tests/fixtureapps/nocl.pyc (deflated 48%)
  adding: waitress/tests/fixtureapps/getline.pyc (deflated 32%)
  adding: waitress/tests/fixtureapps/writecb.pyc (deflated 42%)
  adding: waitress/tests/fixtureapps/toolarge.py (deflated 37%)
  adding: waitress/tests/fixtureapps/__init__.pyc (deflated 20%)
  adding: waitress/tests/fixtureapps/writecb.py (deflated 50%)
  adding: waitress/tests/fixtureapps/badcl.pyc (deflated 44%)
  adding: waitress/tests/fixtureapps/runner.pyc (deflated 58%)
  adding: waitress/tests/fixtureapps/__init__.py (stored 0%)
  adding: waitress/tests/fixtureapps/filewrapper.py (deflated 74%)
  adding: waitress/tests/fixtureapps/runner.py (deflated 41%)
  adding: waitress/tests/fixtureapps/echo.pyc (deflated 42%)
  adding: waitress/tests/fixtureapps/groundhog1.jpg (deflated 24%)
  adding: waitress/tests/fixtureapps/error.pyc (deflated 48%)
  adding: waitress/tests/fixtureapps/sleepy.py (deflated 42%)
  adding: waitress/tests/fixtureapps/toolarge.pyc (deflated 43%)
  adding: waitress/tests/fixtureapps/badcl.py (deflated 45%)
  adding: waitress/tests/support.py (deflated 52%)
  adding: waitress/tests/test_task.pyc (deflated 78%)
  adding: waitress/tests/test_channel.pyc (deflated 78%)
  adding: waitress/tests/test_regression.pyc (deflated 68%)
  adding: waitress/tests/test_parser.py (deflated 80%)
  adding: waitress/tests/test_server.py (deflated 78%)
  adding: waitress/tests/test_receiver.py (deflated 87%)
  adding: waitress/tests/test_compat.py (deflated 51%)
  adding: waitress/tests/test_runner.pyc (deflated 72%)
  adding: waitress/__init__.pyc (deflated 50%)
  adding: waitress/channel.pyc (deflated 58%)
  adding: waitress/runner.pyc (deflated 54%)
  adding: waitress/buffers.py (deflated 74%)
  adding: waitress/__init__.py (deflated 61%)
  adding: waitress/runner.py (deflated 58%)
  adding: waitress/parser.py (deflated 69%)
  adding: waitress/compat.py (deflated 69%)
  adding: waitress/buffers.pyc (deflated 69%)
  adding: waitress/utilities.pyc (deflated 60%)
  adding: waitress/parser.pyc (deflated 53%)
  adding: waitress/task.py (deflated 72%)
  adding: waitress-0.8.5-py2.7.egg-info/ (stored 0%)
  adding: waitress-0.8.5-py2.7.egg-info/dependency_links.txt (stored 0%)
  adding: waitress-0.8.5-py2.7.egg-info/installed-files.txt (deflated 83%)
  adding: waitress-0.8.5-py2.7.egg-info/top_level.txt (stored 0%)
  adding: waitress-0.8.5-py2.7.egg-info/PKG-INFO (deflated 65%)
  adding: waitress-0.8.5-py2.7.egg-info/not-zip-safe (stored 0%)
  adding: waitress-0.8.5-py2.7.egg-info/SOURCES.txt (deflated 71%)
  adding: waitress-0.8.5-py2.7.egg-info/entry_points.txt (deflated 33%)
  adding: waitress-0.8.5-py2.7.egg-info/requires.txt (deflated 5%)
[zart@feena unpacked]$ cd ..

Note that those files should be at the top of the zip; you can't just run zip -r9 library.zip unpacked.
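A rough stdlib equivalent of creating the archive from inside the unpacked tree (directory names here are demo stand-ins): the arcname strips the leading directory so packages sit at the zip root.

```python
import os
import zipfile

# Demo setup: a stand-in for the unpacked site-packages copy
os.makedirs('unpacked/waitress', exist_ok=True)
open('unpacked/waitress/__init__.py', 'w').close()

# Equivalent of running "zip -r9 library.zip ." from *inside* unpacked/:
# os.path.relpath drops the "unpacked/" prefix from each archive entry
with zipfile.ZipFile('library.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
    for root, dirs, files in os.walk('unpacked'):
        for name in files:
            full = os.path.join(root, name)
            zf.write(full, os.path.relpath(full, 'unpacked'))

print(zipfile.ZipFile('library.zip').namelist())  # ['waitress/__init__.py']
```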

Checking the result:

[zart@feena ziplib-demo]$ PYTHONPATH=library.zip python
Python 2.7.1 (r271:86832, Apr 12 2011, 16:15:16)
[GCC 4.6.0 20110331 (Red Hat 4.6.0-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import waitress
>>> waitress

>>>
>>> from wsgiref.simple_server import demo_app
>>> waitress.serve(demo_app)
serving on http://0.0.0.0:8080
^C>>>

Update: since Python 3.5 there is also the zipapp module, which can help with bundling the whole package into a .pyz file. For more complex needs, PyInstaller, py2exe or py2app might better fit the bill.
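For example, a minimal zipapp sketch (the module name and entry point are made up for the demo):

```python
import os
import tempfile
import zipapp

# Build a throwaway source tree with a callable entry point
src = tempfile.mkdtemp()
with open(os.path.join(src, 'app.py'), 'w') as f:
    f.write("def main():\n    print('hello from the bundle')\n")

# Bundle it into a single runnable .pyz archive
target = os.path.join(tempfile.mkdtemp(), 'demo.pyz')
zipapp.create_archive(src, target, main='app:main')

print(os.path.exists(target))  # True; run it with: python demo.pyz
```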

qid & accept id: (17504995, 17505027) query: Python iterate list of dicts and create a new one soup:

soup wrap:

Just use a loop:

for entry in inputlist:
    entry['r'] = min(mxr, calculateRange(x, entry['x'], y, entry['y']))

Dictionaries are mutable; adding a key is reflected in all references to the dictionary.

Demo:

>>> import math
>>> def calculateRange (x1, x2, y1, y2):
...   squareNumber = math.sqrt(math.pow ((x1-x2),2) + math.pow((y1-y2),2))
...   return round(squareNumber, 1)
...
>>> x = 2
>>> y = 3
>>> mxr = 30
>>> inputlist = [
...    {'town':'A', 'x':12, 'y':13},
...    {'town':'B', 'x':100, 'y':43},
...    {'town':'C', 'x':19, 'y':5}
... ]
>>> for entry in inputlist:
...     entry['r'] = min(mxr, calculateRange(x, entry['x'], y, entry['y']))
... 
>>> inputlist
[{'town': 'A', 'x': 12, 'r': 14.1, 'y': 13}, {'town': 'B', 'x': 100, 'r': 30, 'y': 43}, {'town': 'C', 'x': 19, 'r': 17.1, 'y': 5}]
qid & accept id: (17507325, 17507463) query: Accessing an additional profile from templates soup:

soup wrap:

You would simply do:

{{ request.user.userreferralprofile.y }}

Or if you have specified a related_name,

class UserReferralProfile(models.Model):
    x = models.OneToOneField(User, related_name='referal')
    y = models.CharField(max_length=100)  # CharField requires max_length

In the template,

{{ request.user.referal.y }}
qid & accept id: (17507841, 17507857) query: How to find combinations of a list in a Dictionary? soup:

soup wrap:

Use itertools.permutations() to generate all permutations of a list:

from itertools import permutations

if any(tuple(perm) in yourdictionary for perm in permutations(yourlist)):
    # match found

but you really want to rethink your data structure. If you made your keys frozenset() objects instead, you would simply test for:

if frozenset(yourlist) in yourdictionary:
    # match found

which would be a lot faster.

Demos:

>>> from itertools import permutations
>>> yourdictionary = {(1,3,2):'text',(3,1,2):'text'}
>>> yourlist = [1, 2, 3]
>>> print any(tuple(perm) in yourdictionary for perm in permutations(yourlist))
True
>>> yourdictionary = {frozenset([1, 2, 3]): 'text', frozenset([4, 5, 6]): 'othertext'}
>>> frozenset(yourlist) in yourdictionary
True
>>> frozenset([2, 3]) in yourdictionary
False
qid & accept id: (17534484, 17534712) query: Generate random string from regex character set soup:

soup wrap:

Paul McGuire, author of Pyparsing, has written an inverse regex parser, with which you could do this:

import invRegex
print(''.join(invRegex.invert('[a-z]')))
# abcdefghijklmnopqrstuvwxyz

If you do not want to install Pyparsing, there is also a regex inverter that uses only modules from the standard library with which you could write:

import inverse_regex
print(''.join(inverse_regex.ipermute('[a-z]')))
# abcdefghijklmnopqrstuvwxyz

Note: neither module can invert all regex patterns.


And there are differences between the two modules:

import invRegex
import inverse_regex
print(repr(''.join(invRegex.invert('.'))))
print(repr(''.join(inverse_regex.ipermute('.'))))

yields

'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'

Here is another difference, this time pyparsing enumerates a larger set of matches:

x = list(invRegex.invert('[a-z][0-9]?.'))
y = list(inverse_regex.ipermute('[a-z][0-9]?.'))
print(len(x))
# 26884
print(len(y))
# 1100

qid & accept id: (17555470, 17555491) query: Check if all elements in nested iterables evaluate to False soup:

soup wrap:

Use nested any() calls:

if not any(any(inner) for inner in x):

any() returns False only if all of the elements in the iterable passed to it are falsy. not any() is thus True only if all elements are falsy:

>>> x = [(None, None, None), (None, None, None), (None, None, None)]
>>> not any(any(inner) for inner in x)
True
>>> x = [(None, None, None), (None, None, None), (None, None, 1)]
>>> not any(any(inner) for inner in x)
False
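An equivalent spelling flattens the nesting with itertools.chain.from_iterable instead of nesting two any() calls (just an alternative, not from the original answer):

```python
from itertools import chain

x = [(None, None, None), (None, None, None), (None, None, None)]
print(not any(chain.from_iterable(x)))  # True: every element is falsy

x = [(None, None, None), (None, None, 1)]
print(not any(chain.from_iterable(x)))  # False: one truthy element
```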
qid & accept id: (17559933, 17578405) query: Complete a task during certain time frames within a python script soup:

soup wrap:

There's no way Python can "ignore the while loops" (unless you have some other condition outside of them). But if the while loop tells Python to loop 0 times, it will do exactly that.


First, you've got this loop:

while currenttime > '23:30:00' and currenttime < '23:40:00':

… followed by this one:

while currenttime > '23:40:00' and currenttime < '23:50:00':

Think about what happens at 23:40:00. It's neither before nor after 23:40:00, so you'll skip the second loop before you even get into it.

While we're at it, two side notes on these lines:

  • You can just write '23:40:00' < currenttime < '23:50:00' in Python.
  • You can use datetime or time objects instead of strings to compare times.
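The two bullets combined look something like this (a minimal sketch; the window bounds are the ones from the question):

```python
from datetime import datetime, time

now = datetime.now().time()

# Chained comparison on real time objects instead of strings
in_window = time(23, 40) < now < time(23, 50)
print(in_window)
```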

Next, you repeatedly do this:

currenttime = strftime('%H:%M:%S')
print ("""23:40:00 to 23:50:00 | %s""" % (currenttime))
sleep(1)

This means that currenttime is never actually the current time. It's usually a second ago. Think about what that does to your loop conditions at the edges.

As a side note, sleep(1) isn't guaranteed to sleep exactly one second. If your computer is busy, or decides it wants to go into low-power mode, it can go significantly longer than a second. If interrupts are flying, it can go shorter than a second. Even in the best case, it'll often be off by half a clock tick in one direction or the other. So, if you need this to fire exactly 600 times, it's generally not going to do that.


Meanwhile, you've got this:

global clearscreen

Obviously there's no way anyone can ever change this in your current code, so presumably in your real code you're attempting to change it from another thread. But you don't have a lock on it. So, it's perfectly possible that you will not see a change immediately, or even ever.


Writing schedulers is a lot harder than it looks. That's why you're usually better off using an existing one. Options include:

  • The stdlib sched module.
  • The stdlib threading.Timer.
  • The stdlib signal.setitimer. Only on platforms with real signals (meaning not Windows), and possibly not appropriate on some platforms if you're using threads or fork.
  • Various third-party modules on PyPI/recipes on ActiveState that give you a better Timer (e.g., using a single timer thread with a queue of upcoming jobs, rather than a thread for each job).
  • An event loop framework that handles timers—although this is probably overkill if scheduling is all you need, if you have some other reason to use Twisted or wx or PyGame or gevent, let it do the scheduling.
  • An external timer that runs your script: cron on any Unix, launchd on Mac, Scheduled Tasks on Windows, etc. Probably not appropriate for running a task every second, but from your comments, it sounds like your real need is "to be able to call a function every 3 mins +- 60 seconds."
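As an illustration of the threading.Timer option, a repeating job can re-arm the timer in its own callback (a sketch; the 180-second interval matches the every-3-minutes requirement):

```python
import threading

def job():
    print('doing work')
    # Re-arm so the job repeats every 3 minutes; daemon=True lets the
    # program exit without waiting for the pending timer
    t = threading.Timer(180, job)
    t.daemon = True
    t.start()

job()  # runs once now, then every 180 seconds
```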

Since you specifically asked about sched… There are two approaches:

  1. Build the whole day's schedule at once and call enterabs repeatedly to fill it with the day's tasks, plus one more task that runs at midnight tomorrow and does the same thing.
  2. Write a function that figures out, based on the current time, which task to schedule next and when, and does so. Call that function after the actual work.

Here's what the first one looks like:

import sched
import datetime
import time

s = sched.scheduler(time.time, time.sleep)

def dotoday():
    # first() and second() are the question's task functions
    now = datetime.datetime.now()

    # Schedule "first" every 3 minutes from 22:00 to 22:57
    stime = max(now, now.replace(hour=22, minute=0, second=0, microsecond=0))
    while stime <= now.replace(hour=22, minute=57, second=0, microsecond=0):
        s.enterabs(time.mktime(stime.timetuple()), 1, first, ())
        stime += datetime.timedelta(seconds=180)

    # Schedule "second" every 3 minutes from 23:00 to 23:57
    stime = now.replace(hour=23, minute=0, second=0, microsecond=0)
    while stime <= now.replace(hour=23, minute=57, second=0, microsecond=0):
        s.enterabs(time.mktime(stime.timetuple()), 1, second, ())
        stime += datetime.timedelta(seconds=180)

    # Schedule "dotoday" to run again at midnight tomorrow
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    tomorrow = midnight + datetime.timedelta(days=1)
    s.enterabs(time.mktime(tomorrow.timetuple()), 1, dotoday, ())

dotoday()
s.run()

I made this a bit more complicated than necessary, so that you can start it at 23:31:17 and it will run the first batch of tasks at 23:31:17, 23:34:17, etc. instead of waiting until 23:33:00, 23:36:00, etc.

qid & accept id: (17561901, 17562000) query: Pick values only below a certain threshold soup:

soup wrap:

The following solution assumes that the data was read in as a list of tuples.

Ex:

[(1,5.2),
(2,1.43),
(3,3.54),
(4,887),
(5,0.35)]

would be the list for the sample data in the problem.

def cutoff(threshold, data):
    sortedData = sorted(data, key=lambda x: x[1])
    # list comprehension instead of filter() so len() also works on Python 3
    finalList = [x for x in sortedData if x[1] < threshold]
    return finalList if len(finalList) > 2 else 'No values found'

The first line of the function sorts the list by the value in the second position of each tuple.

The second line of the function then filters that resulting list so that only the elements in which the values are below the threshold remain.

The third line then returns the resulting sorted list if it contains more than two elements, and 'No values found' otherwise, which should accomplish what you're trying to do, less the file input.
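Putting it together with the sample data (the function repeated here so the snippet runs standalone, with filter() swapped for a list comprehension so it also works on Python 3):

```python
def cutoff(threshold, data):
    # sort by value, then keep only entries below the threshold
    final = [x for x in sorted(data, key=lambda t: t[1]) if x[1] < threshold]
    return final if len(final) > 2 else 'No values found'

data = [(1, 5.2), (2, 1.43), (3, 3.54), (4, 887), (5, 0.35)]
print(cutoff(4.0, data))   # [(5, 0.35), (2, 1.43), (3, 3.54)]
print(cutoff(0.1, data))   # 'No values found'
```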

qid & accept id: (17598881, 17599081) query: Using df.apply() with a Pandas MuliIndex / carrying out operations on hierarchical index rows? soup:

soup wrap:

Create the data

In [12]: df = DataFrame(randn(10,4),columns=list('ABCD'))

In [13]: df['year'] = 2003

In [14]: df['id'] = [12,34,12,34,72,0,38,53,70,70]
In [16]: df.loc[:5,'year'] = 2004

In [17]: df
Out[17]: 
          A         B         C         D  year  id
0 -1.917262  0.228599 -0.463695  0.776567  2004  12
1  2.064658 -0.716104 -1.399685  0.402077  2004  34
2 -1.282627  0.338368  0.757658 -0.114086  2004  12
3  1.190319 -1.592282  0.942431 -0.778128  2004  34
4  1.928094  0.532387 -0.352155 -0.039304  2004  72
5  0.535093 -1.655569 -0.309651  0.438992  2004   0
6  0.332428 -0.427696 -1.324072  2.158907  2003  38
7 -1.343306 -0.288373  0.544344 -1.361189  2003  53
8  0.959273 -0.420134  0.691108 -0.469833  2003  70
9  0.692352  0.101226 -0.161140 -0.100968  2003  70

Groupby year and id, then mean

In [21]: df.groupby(['year','id']).mean()
Out[21]: 
                A         B         C         D
year id                                        
2003 38  0.332428 -0.427696 -1.324072  2.158907
     53 -1.343306 -0.288373  0.544344 -1.361189
     70  0.825812 -0.159454  0.264984 -0.285401
2004 0   0.535093 -1.655569 -0.309651  0.438992
     12 -1.599944  0.283483  0.146981  0.331241
     34  1.627488 -1.154193 -0.228627 -0.188025
     72  1.928094  0.532387 -0.352155 -0.039304

By year mean

In [24]: df.groupby(['year']).mean()
Out[24]: 
             A         B         C         D         id
year                                                   
2003  0.160187 -0.258744 -0.062440  0.056729  57.750000
2004  0.419713 -0.477434 -0.137516  0.114353  27.333333

By id

In [25]: df.groupby(['id']).mean()
Out[25]: 
           A         B         C         D  year
id                                              
0   0.535093 -1.655569 -0.309651  0.438992  2004
12 -1.599944  0.283483  0.146981  0.331241  2004
34  1.627488 -1.154193 -0.228627 -0.188025  2004
38  0.332428 -0.427696 -1.324072  2.158907  2003
53 -1.343306 -0.288373  0.544344 -1.361189  2003
70  0.825812 -0.159454  0.264984 -0.285401  2003
72  1.928094  0.532387 -0.352155 -0.039304  2004
qid & accept id: (17636790, 17640380) query: How to set a date restriction for returned events in Google Calendar and put them in order - Python soup:

soup wrap:

Python datetime objects are very easy to work with. For instance, you could collect your event and datetime information into a list of tuples, like this:

all_events = []
for event in events:  # 'events' being your fetched list of event dicts
    all_events.append((dateutil.parser.parse(event['start']['date']), event['summary']))

And then you can filter and sort those tuples, using the fact that you can add/subtract datetimes and get timedelta objects. You can simply compare to a timedelta object which represents a difference of 30 days:

from datetime import datetime, timedelta

max_td = timedelta(days=30)
now = datetime.now()

# Remove events that are too far into the future
filtered_events = filter(lambda e: e[0] - now <= max_td, all_events)

# Sort events in ascending order of start time
filtered_events.sort()
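A self-contained sketch of the same filter-and-sort with made-up events (note that on Python 3 filter() returns an iterator with no .sort(), so a list comprehension plus sorted() is the portable spelling):

```python
from datetime import datetime, timedelta

now = datetime.now()

# Hypothetical parsed events: (start_datetime, summary) tuples
all_events = [
    (now + timedelta(days=5), 'soon'),
    (now + timedelta(days=45), 'too far'),
    (now + timedelta(days=1), 'tomorrow'),
]

max_td = timedelta(days=30)

# Keep events within 30 days, sorted by start time (works on Python 2 and 3)
filtered = sorted(e for e in all_events if e[0] - now <= max_td)
print([summary for _, summary in filtered])  # ['tomorrow', 'soon']
```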
qid & accept id: (17637244, 17639411) query: voronoi and lloyd relaxation using python/scipy soup:

soup wrap:

You have a region you know, a point you don't, and you know that vor.point_region[point] == region. For a single region, you can figure out the corresponding point as:

point = np.argwhere(vor.point_region == region)

You can also create a region_point indexing array to figure out multiple points from an array of regions as:

region_point = np.argsort(vor.point_region)
points = region_point[regions-1]
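A toy illustration of the inverse-permutation idea with a made-up point_region array (a real one comes from scipy.spatial.Voronoi; the -1 offset above is specific to how the answer's regions array was built, so it is omitted here):

```python
import numpy as np

# Hypothetical mapping: point i belongs to region point_region[i]
point_region = np.array([2, 0, 1, 3])

# Single region -> point, via a boolean search
print(np.argwhere(point_region == 1).ravel())  # [2]

# Many regions at once: argsort inverts the permutation, giving a
# lookup table where region_point[r] is the point owning region r
region_point = np.argsort(point_region)
print(region_point[np.array([0, 1, 2, 3])])    # [1 2 0 3]
```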
qid & accept id: (17641195, 17730447) query: fabric cleanup operation with atexit soup:

soup wrap:

I had the same problem. The following code is not ideal, but it is the implementation I currently use.

fabfile.py

from functools import wraps
from fabric.network import needs_host
from fabric.api import run, env

def runs_final(func):
    @wraps(func)
    def decorated(*args, **kwargs):
        if env.host_string == env.all_hosts[-1]:
            return func(*args, **kwargs)
        else:
            return None
    return decorated

@needs_host
def hello():
    run('hostname')
    atexit()

@runs_final
def atexit():
    print ('this is at exit command.')

Result:

fabric$ fab hello -H web01,web02
>[web01] Executing task 'hello'
>[web01] run: hostname
>[web01] out: web01
>[web01] out: 
>[web02] Executing task 'hello'
>[web02] run: hostname
>[web02] out: web02
>[web02] out: 
>
>this is at exit command.
>
>Done.
qid & accept id: (17659626, 17660067) query: Joining fields values soup:

soup wrap:

To make the classes modify their class attributes at class-definition time (without boilerplate code inside each class definition), you'll need a class decorator or a metaclass.

If you use a class decorator, then you'll have to decorate each class individually.

If you use a metaclass, class A's metaclass will be inherited by class B and class C, so you will only have to modify class A:

class MetaA(type):
    def __init__(cls, name, bases, clsdict):
        super(MetaA, cls).__init__(name, bases, clsdict)
        for base in bases:
            if hasattr(base, 'array'):
                cls.array = base.array + cls.array
                break

class A(object):
    __metaclass__ = MetaA
    array = [1]

    def __init__(self):
        pass

class B(A):
    array = [2, 3]

    def __init__(self):
        super(B, self).__init__()

class C(B):
    array = [4]

    def __init__(self):
        super(C, self).__init__()

yields

print(A.array)
# [1]

print(B.array)
# [1, 2, 3]

print(C.array)
# [1, 2, 3, 4]
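The example above uses the Python 2 `__metaclass__` attribute. In Python 3 the same idea is expressed with the `metaclass` keyword argument instead; a minimal sketch:

```python
# Python 3 version of the same technique: the metaclass is passed as a
# keyword argument in the class header instead of a __metaclass__ attribute.
class MetaA(type):
    def __init__(cls, name, bases, clsdict):
        super().__init__(name, bases, clsdict)
        for base in bases:
            if hasattr(base, 'array'):
                # Prepend the first base's array to this class's own array.
                cls.array = base.array + cls.array
                break

class A(metaclass=MetaA):
    array = [1]

class B(A):
    array = [2, 3]

class C(B):
    array = [4]

print(C.array)  # -> [1, 2, 3, 4]
```

As in the Python 2 version, subclasses inherit the metaclass automatically, so only `A` mentions it.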
qid & accept id: (17687453, 17687543) query: python construct a dictionary {0: [0, 0, 0], 1: [0, 0, 1], 2: [0, 0, 2], 3: [0, 0, 3], ...,999: [9, 9, 9]} soup:
soup wrap:
alphabet =  range(10)
base = 10
dict((x*base**2+y*base+z,(x,y,z)) for x in alphabet 
                                  for y in alphabet 
                                  for z in alphabet )

is what you want ... I think

alphabet =  range(2)
base = 2
dict((x*base**2+y*base+z,(x,y,z)) for x in alphabet 
                                  for y in alphabet 
                                  for z in alphabet )

generates

{0: (0, 0, 0), 1: (0, 0, 1), 2: (0, 1, 0), 3: (0, 1, 1), 4: (1, 0, 0), 5: (1, 0, 1), 6: (1, 1, 0), 7: (1, 1, 1)}
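An equivalent sketch using itertools.product: it yields the digit tuples in exactly ascending order, so each tuple's position in the stream is already its base-10 value and enumerate() can build the same mapping:

```python
from itertools import product

alphabet = range(10)

# product(alphabet, repeat=3) yields (0,0,0), (0,0,1), ..., (9,9,9) in
# order, so enumerate() pairs each tuple with exactly x*100 + y*10 + z.
d = dict(enumerate(product(alphabet, repeat=3)))

print(d[123])  # -> (1, 2, 3)
print(d[999])  # -> (9, 9, 9)
```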
qid & accept id: (17700964, 17722525) query: Getting World Coordinates with mouse in pygame soup:

soup wrap:

The camera calculates the screen coordinate by a given world coordinate.

Since the mouse position is already a screen coordinate, if you want to get the tile under the mouse, you have to subtract the offset, not add it.

You can add the following method to the Camera class:

def reverse(self, pos):
    """Gets the world coordinates by screen coordinates"""
    return (pos[0] - self.state.left, pos[1] - self.state.top)

and use it like this:

    mouse_pos = camera.reverse(pygame.mouse.get_pos())
    if hit_block:
        replace_block(mouse_pos)
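To see why reverse is the exact inverse of the camera's offset, here is a minimal stand-in for the Camera class (the `State` stub and `apply` method are hypothetical, assuming the camera stores its offset as `state.left`/`state.top` as the snippet above implies):

```python
class State:
    """Hypothetical stand-in for the camera's offset rect."""
    def __init__(self, left, top):
        self.left, self.top = left, top

class Camera:
    def __init__(self, left, top):
        self.state = State(left, top)

    def apply(self, pos):
        """World coordinates -> screen coordinates (adds the offset)."""
        return (pos[0] + self.state.left, pos[1] + self.state.top)

    def reverse(self, pos):
        """Screen coordinates -> world coordinates (subtracts the offset)."""
        return (pos[0] - self.state.left, pos[1] - self.state.top)

camera = Camera(-120, -40)
world = (300, 200)
screen = camera.apply(world)
print(camera.reverse(screen))  # -> (300, 200): the round trip is exact
```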
qid & accept id: (17711212, 17711267) query: Naming each item in a list which is a value of a dictionary soup:

soup wrap:

If you're just trying to print that:

for k, v in dct.iteritems():
    print repr(k)+ ":(" + ", ".join("Country{}:{}".format(i,c) for i,c in enumerate(v, start=1)) + ")"

Output:

'Europe':(Country1:Germany, Country2:France, Country3:Italy)
'Asia':(Country1:India, Country2:China, Country3:Malaysia)

Note: I'm abusing repr() to get the quotes in there. You could just as well do "'" + str(k) + "'".

The reason why your code doesn't work is your use of : outside of a dictionary initialization or comprehension. That is, you can do d = {'a':'b'} but you can't do print 'a':'b'. Also, you shouldn't use dict as a variable name, because it shadows the built-in dict type.

My solution will work for tuples which have more (or even less) than 3 elements in them, too.

qid & accept id: (17734781, 17736780) query: Possibility of writing dictionary items in columns soup:
soup wrap:
import csv

mydict = {('c4:7d:4f:53:24:be', 'ac:81:12:62:91:df'):
          [5.998999999999998, 0.0013169999, 4.0000000000000972],
          ('a8:5b:4f:2e:fe:09', 'de:62:ef:4e:21:de'):
          [7.89899999, 0.15647999999675390, 8.764380000972, 9.200000000]}

with open('dict.csv', 'wb') as file:
    writer = csv.writer(file, delimiter='\t')
    writer.writerow(mydict.keys())
    for row in zip(*mydict.values()):
        writer.writerow(list(row))

Output file dict.csv:

('c4:7d:4f:53:24:be', 'ac:81:12:62:91:df')  ('a8:5b:4f:2e:fe:09', 'de:62:ef:4e:21:de')
5.998999999999998   7.89899999
0.0013169999    0.1564799999967539
4.000000000000097   8.764380000972
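Note that zip() truncates to the shortest value list, which is why the trailing 9.200000000 from the second column never reaches the file. If the columns can have unequal lengths, itertools.zip_longest pads the short ones instead. A sketch (writing to an io.StringIO buffer here so it runs anywhere; with a real file on Python 3, open it with open('dict.csv', 'w', newline='')):

```python
import csv
import io
from itertools import zip_longest

# Hypothetical data with unequal column lengths.
mydict = {'col_a': [1.0, 2.0, 3.0],
          'col_b': [4.0, 5.0, 6.0, 7.0]}  # one extra value

out = io.StringIO()  # stand-in for open('dict.csv', 'w', newline='')
writer = csv.writer(out, delimiter='\t')
writer.writerow(mydict.keys())
for row in zip_longest(*mydict.values(), fillvalue=''):
    writer.writerow(row)  # the 4th data row is ('', 7.0), nothing dropped

print(out.getvalue())
```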
qid & accept id: (17762515, 17762787) query: Best way to make a counter based on time soup:

soup wrap:

If you never access the variable, then even if it increased over time it wouldn't be very useful. So what you really want is something you can access in some way which will give you an answer as if it were increasing over time. But to do that, all you need to do is subtract the current time from when you started.

For example, you could use something like this:

import time

def make_ticker():
    start = time.time()
    def elapsed():
        now = time.time()
        return now-start
    return elapsed

And then after starting it off, all you need to do is write a() instead of a:

>>> a = make_ticker()
>>> a()
3.3126659393310547
>>> a()
5.144495010375977
>>> a()
7.766999006271362

Similarly, if you want to count the number of 2-second periods that have elapsed:

def make_ticker(seconds):
    start = time.time()
    def elapsed():
        now = time.time()
        return (now-start)//seconds
    return elapsed

>>> a = make_ticker(2)
>>> a()
0.0
>>> a()
1.0
>>> a()
1.0
>>> a()
2.0

(You can trivially modify it to start at 1.)

qid & accept id: (17774547, 17774582) query: Django how to update more than a row field at once soup:

soup wrap:

Sure!

t.value1 = 1
t.value2 = 2
t.save()

Alternatively,

TheForm.objects.filter(id=1).update(value=1, value2=2)

(And you could use **kwargs here)

qid & accept id: (17778394, 17778786) query: List Highest Correlation Pairs from a Large Correlation Matrix in Pandas? soup:

soup wrap:

You can use DataFrame.values to get a NumPy array of the data and then use NumPy functions such as argsort() to get the most correlated pairs.

But if you want to do this in pandas, you can unstack and order the DataFrame:

import pandas as pd
import numpy as np

shape = (50, 4460)

data = np.random.normal(size=shape)

data[:, 1000] += data[:, 2000]

df = pd.DataFrame(data)

c = df.corr().abs()

s = c.unstack()
so = s.order(kind="quicksort")

print so[-4470:-4460]

Here is the output:

2192  1522    0.636198
1522  2192    0.636198
3677  2027    0.641817
2027  3677    0.641817
242   130     0.646760
130   242     0.646760
1171  2733    0.670048
2733  1171    0.670048
1000  2000    0.742340
2000  1000    0.742340
dtype: float64
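Series.order was deprecated and later removed in favour of sort_values, so on a current pandas the same pipeline looks like the sketch below (run on a small random frame so it finishes quickly; the planted pair is an assumption for the demo):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = rng.normal(size=(50, 10))
data[:, 1] += data[:, 2]          # plant one strongly correlated pair

df = pd.DataFrame(data)
s = df.corr().abs().unstack()
so = s.sort_values(ascending=False)   # replaces the removed Series.order

# Drop the trivial self-correlations (always 1.0) on the diagonal.
top = so[so.index.get_level_values(0) != so.index.get_level_values(1)]
print(top.index[0])  # the planted pair, in one order or the other
```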
qid & accept id: (17795362, 17797658) query: MongoDB data Posting soup:

soup wrap:

Just take a list (of dicts):

def insert_names(self, names):
    # `names` is a list of dicts

    inserted_ids = []

    for name in names:
        name['created'] = datetime.datetime.now()
        _id = self.myorders.insert(name)
        inserted_ids.append(_id)

    return inserted_ids

Call insert_names with a list that you generate from your HTML form.

names = [
    {'name': name0, 'qty': qty0, ...}
    {'name': name1, 'qty': qty1, ...}
    {'name': name2, 'qty': qty2, ...}
    ...
]
insert_names(names)
qid & accept id: (17799504, 17799592) query: How to check if elements of a list are in a string soup:

soup wrap:

The builtin any() function can help you here:

black_list = ["ab:", "cd:", "ef:", "gh:"]

for line in some_file:
    if ":" in line and not any(x in line for x in black_list):
        pass

It's also possible to get the same effect with all():

for line in some_file:
    if ":" in line and all(x not in line for x in black_list):
        pass

... but I think the first is closer to English, so easier to follow.
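A self-contained version of the same filter, with a hypothetical blacklist and in-memory sample lines standing in for the file:

```python
black_list = ["ab:", "cd:", "ef:", "gh:"]
lines = ["key: value", "ab: skip me", "no colon here", "xy: keep me"]

# Keep lines that contain ':' but match nothing from the blacklist.
kept = [line for line in lines
        if ":" in line and not any(x in line for x in black_list)]

# The all() formulation is logically identical (De Morgan's laws).
kept_all = [line for line in lines
            if ":" in line and all(x not in line for x in black_list)]

print(kept)  # -> ['key: value', 'xy: keep me']
```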

qid & accept id: (17809274, 17810466) query: Combine multiple heatmaps in matplotlib soup:

soup wrap:

There are a few options to present 2 datasets together:

Option 1 - draw a heatmap of the difference of the 2 datasets (or their ratio, whatever is more appropriate in your case)

pcolor(D2-D1)

and then present several of these comparison figures.

Option 2 - present 1 dataset as pcolor, and another as contour:

pcolor(D1)
contour(D2)

If you really need to show N>2 datasets together, I would go with contour or contourf:

contourf(D1,cmap='Blues')
contourf(D2,cmap='Reds', alpha=0.66)
contourf(D3,cmap='Reds', alpha=0.33)

example output of 3 contourf commands

or

contour(D1,cmap='Blues')
contour(D2,cmap='Reds')
contour(D3,cmap='Reds')

example output of 3 contour commands

Unfortunately, similar alpha tricks do not work well with pcolor.

qid & accept id: (17811168, 17811816) query: Fast way to find index of array in array of arrays soup:

soup wrap:

This is Jaime's idea, I just love it:

import numpy as np

def asvoid(arr):
    """View the array as dtype np.void (bytes)
    This collapses ND-arrays to 1D-arrays, so you can perform 1D operations on them.
    https://stackoverflow.com/a/16216866/190597 (Jaime)"""    
    arr = np.ascontiguousarray(arr)
    return arr.view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1])))

def find_index(arr, x):
    arr_as1d = asvoid(arr)
    x = asvoid(x)
    return np.nonzero(arr_as1d == x)[0]


arr = np.array([[  1,  15,   0,   0],
                [ 30,  10,   0,   0],
                [ 30,  20,   0,   0],
                [1, 2, 3, 4],
                [104, 139, 146,  75],
                [  9,  11, 146,  74],
                [  9, 138, 146,  75]], dtype='uint8')

arr = np.tile(arr,(1221488,1))
x = np.array([1,2,3,4], dtype='uint8')

print(find_index(arr, x))

yields

[      3      10      17 ..., 8550398 8550405 8550412]

The idea is to view each row of the array as a string. For example,

In [15]: x
Out[15]: 
array([^A^B^C^D], 
      dtype='|V4')

The strings look like garbage, but they are really just the underlying data in each row viewed as bytes. You can then compare arr_as1d == x to find which rows equal x.


There is another way to do it:

def find_index2(arr, x):
    return np.where((arr == x).all(axis=1))[0]

but it turns out to be not as fast:

In [34]: %timeit find_index(arr, x)
1 loops, best of 3: 209 ms per loop

In [35]: %timeit find_index2(arr, x)
1 loops, best of 3: 370 ms per loop
qid & accept id: (17866724, 17866768) query: Python, logging print statements while having them print to stdout soup:

soup wrap:

You can add this to your script:

import sys
sys.stdout = open('logfile', 'w')

This will make the print statements write to logfile.

If you want the option of printing to stdout and a file, you can try this:

class Tee(object):
    def __init__(self, *files):
        self.files = files
    def write(self, obj):
        for f in self.files:
            f.write(obj)

f = open('logfile', 'w')
backup = sys.stdout
sys.stdout = Tee(sys.stdout, f)

print "hello world"  # this should appear in stdout and in file

To revert to printing only to the console, just restore the backup:

sys.stdout = backup
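One caveat: any code that calls sys.stdout.flush() will fail on the bare Tee above, since it only defines write. A sketch that forwards flush as well, demonstrated with an io.StringIO buffer instead of a real file so it runs anywhere:

```python
import io
import sys

class Tee(object):
    def __init__(self, *files):
        self.files = files
    def write(self, obj):
        for f in self.files:
            f.write(obj)
    def flush(self):
        # Forward flush too; some libraries call it on sys.stdout.
        for f in self.files:
            f.flush()

log = io.StringIO()           # stand-in for open('logfile', 'w')
backup = sys.stdout
sys.stdout = Tee(sys.stdout, log)
print("hello world")          # goes to the console and into `log`
sys.stdout.flush()            # would raise AttributeError without flush()
sys.stdout = backup           # restore
```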
qid & accept id: (17870242, 17870259) query: Return random value from list tuple soup:

soup wrap:

Use random.choice:

>>> import random
>>> moves = [('r', "rock"), ('p', "paper"), ('s', "scissors")]
>>> random.choice(moves)
('s', 'scissors')

If only the first value of the tuple is wanted:

random.choice(moves)[0]
qid & accept id: (17910014, 17910036) query: Removing certain letters from a string soup:

soup wrap:

You could use str.translate.

>>> test = 'Today it is Tuesday'
>>> removeText = 'pqrst'
>>> test.translate(None, removeText+removeText.upper())
'oday i i ueday'

Since you're on Python 3, use the translate() method like this.

>>> test = 'Today it is Tuesday'
>>> removeText = 'pqrst'
>>> test.translate(dict.fromkeys(ord(elem) for elem in removeText+removeText.upper()))
'oday i i ueday'

The problem in your code is that you're removing stuff from the list while iterating over it.

Just doing this works. (Here you iterate over a copy while removing elements from the original list.)

>>> testList = list(test)
>>> for i in testList[:]:
        if i in 'pqrstPQRST':
            testList.remove(i)


>>> "".join(testList)
'oday i i ueday'

P.S - Instead of using string = '' and iterating over the list and joining the characters, just use "".join(...).
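On Python 3, str.maketrans can build the translation table for you: characters passed as its third argument are mapped to None, which translate() treats as "delete". A sketch:

```python
test = 'Today it is Tuesday'
removeText = 'pqrst'

# str.maketrans('', '', chars) maps every character in `chars` to None,
# so translate() deletes them (both cases, via removeText.upper()).
table = str.maketrans('', '', removeText + removeText.upper())
print(test.translate(table))  # -> oday i i ueday
```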

qid & accept id: (17910359, 17910365) query: How to generate list combinations? soup:

soup wrap:

You're looking for itertools.product(...).

>>> from itertools import product
>>> list(product([1, 0], repeat=2))
[(1, 1), (1, 0), (0, 1), (0, 0)]

If you want to convert the inner elements to list type, use a list comprehension

>>> [list(elem) for elem in product([1, 0], repeat =2)]
[[1, 1], [1, 0], [0, 1], [0, 0]]

Or by using map()

>>> map(list, product([1, 0], repeat=2))
[[1, 1], [1, 0], [0, 1], [0, 0]]
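Note that on Python 3 map() returns a lazy iterator rather than a list, so wrap it in list() to materialize the result:

```python
from itertools import product

# In Python 3, map() is lazy; list() forces the iterator.
combos = list(map(list, product([1, 0], repeat=2)))
print(combos)  # -> [[1, 1], [1, 0], [0, 1], [0, 0]]
```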
qid & accept id: (17911276, 17911832) query: flask/jinja: creating a leaderboard out of an unordered dict object soup:

soup wrap:

I'd suggest you flatten the user: {stats...} dictionary, so the user appears as a further key. You can do this with a list comprehension:

[dict([('user', k)] + list(v.items())) for k, v in a['users'].items()]

This will give you a list of entries that look like

{'user': '', 'negative': 8, 'positive': 32, 'rating': 80.0}

Add sorted to get it sorted:

sorted([dict([('user', k)] + list(v.items())) for k, v in a['users'].items()], key=lambda x: x['rating'])

and just iterate through the sorted list with for ... in. To sort in reverse, pass reverse=True to sorted.
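Putting it together on a small sample (the usernames and stats below are hypothetical, shaped like the question's a['users'] dict):

```python
a = {'users': {
    'alice': {'negative': 8, 'positive': 32, 'rating': 80.0},
    'bob':   {'negative': 2, 'positive': 18, 'rating': 90.0},
}}

# Flatten each user's stats dict, adding the username under the 'user' key,
# then sort by rating, highest first.
leaderboard = sorted(
    [dict([('user', k)] + list(v.items())) for k, v in a['users'].items()],
    key=lambda x: x['rating'],
    reverse=True,
)
print([entry['user'] for entry in leaderboard])  # -> ['bob', 'alice']
```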

qid & accept id: (17921455, 18204435) query: How to import cython function to cython script soup:

soup wrap:

The solution eventually was to create an additional .pxd file, which is something very similar to a classic .h header file in C. It stores function declarations, and when cimport is called, this is the file where Cython looks for functions and structures.

So to be specific, all I needed to do was to create a file noo.pxd containing:

cdef trol(int * i)

and then we can simply cimport this function from foo.pyx by calling

from noo cimport trol
qid & accept id: (17958069, 17958439) query: How can I determine when a user is in the process of entering something as an input in Python? soup:

soup wrap:

You can read one character at a time, making hasSomethingEntered true after the first character and false after the end of a line. Unfortunately, this is platform-dependent.

Windows:

import msvcrt
one_character= msvcrt.getch()

Unix:

import sys, tty
tty.setraw(sys.stdin.fileno())
one_character= sys.stdin.read(1)

In the latter case you will most probably want to save and restore sys.stdin's mode with

import sys, termios
previous_mode= termios.tcgetattr( sys.stdin.fileno() )

and

import sys, termios
termios.tcsetattr(sys.stdin.fileno(),termios.TCSADRAIN, previous_mode )

, respectively.

qid & accept id: (17972025, 17972086) query: How to remove an array containing certain strings from another array in Python soup:

soup wrap:

You can use a list comprehension:

[i for i in a if not any(x in i for x in b)]

This returns:

['blah', 'tete', 'head']
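With sample inputs (hypothetical a and b, since the question's lists aren't reproduced here), the comprehension keeps every entry of a that contains none of the substrings in b:

```python
a = ['blah', 'abc123', 'tete', 'xyz456', 'head']
b = ['123', '456']

# Keep entries of `a` that contain none of the substrings in `b`.
result = [i for i in a if not any(x in i for x in b)]
print(result)  # -> ['blah', 'tete', 'head']
```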
qid & accept id: (18016779, 18105689) query: Different databases with the same models on Django soup:

soup wrap:

We did it! Let me explain how.

We wrote a custom middleware and registered it as a middleware class inside our settings.py file.

MIDDLEWARE_CLASSES = (
    'django.middleware.common.CommonMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'our.custom.middleware.Class',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
)

This middleware has a process_request method that creates a thread-local variable (from threading import local) to store the appropriate database name for the current user. Since every request is handled by a different thread, we know that our variable's value won't be accidentally changed by another thread.

The next step was creating a Database Router and registering it as such.

DATABASE_ROUTERS = ('our.custom.database.Router',)

Attention: The default settings.py doesn't have a DATABASE_ROUTERS variable. You'll have to create it.

Our custom Router has the same implementations for db_for_read and db_for_write. The only thing these methods do is return the database name stored on our thread variable.

That's it. Now we don't have to call using every time we need to recover or save model objects.

qid & accept id: (18024402, 18027516) query: find unique first top and bottom lines of fastq file from fasta file soup:

soup wrap:

Unless you have a strong reason to do this yourself, use Biopython.

fasta:

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAG
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGG

fastq (based on yours but not identical because your output was badly formatted):

@DH1DQQN1:269:C1UKCACXX:1:1107:20386:6577 1:N:0:TTAGGC
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC
+
CCCFFFFFHGHHHJIJHFDDDB173@8815BDDB###############
@DH1DQQN1:269:C1UKCACXX:1:1114:5718:53821 1:N:0:TTAGGC
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
CCCFFFFFHGHHHJIJHFDDDB173@8815BDDB###############
@DH1DQQN1:269:C1UKCACXX:1:1209:10703:35361 1:N:0:TTAGGC
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
@@@FFFFFHGHHHGIJHFDDDDDBDD69@6B-707537BDDDB75@@85
@DH1DQQN1:269:C1UKCACXX:1:1210:18926:75163 1:N:0:TTAGGC
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAG
+
@CCFFFFFHHHHHJJJHFDDD@77BDDDDB077007@B###########

Code:

from Bio import SeqIO

with open("fasta") as fh:
    fasta = fh.read().splitlines()

seen = set()

for record in SeqIO.parse(open('fastq'), 'fastq'):
    seq = str(record.seq)
    if seq in fasta and seq not in seen:
        seen.add(seq)
        print record.format('fastq')

EDIT: The above prints records in the order of the fastq file, not the fasta file. If order is not important, you should use that method. Otherwise, you can add the records to a dictionary where the key is their index in the FASTA file, and print them all in the end, sorting the dictionary:

from Bio import SeqIO
import sys

with open("fasta") as fh:
    fasta = fh.read().splitlines()

seen = set()
records = {}

for record in SeqIO.parse(open('fastq'), 'fastq'):
    seq = str(record.seq)
    if seq in fasta and seq not in seen:
        seen.add(seq)
        records[fasta.index(seq)] = record

for record in sorted(records):
    sys.stdout.write(records[record].format('fastq'))

(Here I also use sys.stdout.write instead of print, to avoid the extra newlines.)
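A performance note on the approach above: `seq in fasta` scans a list and `fasta.index(seq)` scans it again for every record. A Biopython-free sketch of the same filter (the sequences here are toy stand-ins, not your real data) shows how precomputing a sequence-to-position dict gives O(1) lookups and the FASTA ordering for free:

```python
# Toy stand-ins for the FASTA lines and the parsed fastq sequences
fasta = ["AAA", "CCC", "GGG"]
order = {seq: i for i, seq in enumerate(fasta)}    # seq -> FASTA position

fastq_seqs = ["GGG", "TTT", "AAA", "GGG"]
records = {}
for seq in fastq_seqs:
    # O(1) membership test and dedup, no list scans
    if seq in order and order[seq] not in records:
        records[order[seq]] = seq

# emit in FASTA order
result = [records[i] for i in sorted(records)]
print(result)
```

The same dict replaces both the `seen` set and the `fasta.index()` call in the Biopython version.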

qid & accept id: (18040702, 18040889) query: How to find and select a table in html code with xpath soup:

soup wrap:

Use

tables = root.xpath('.//table[preceding-sibling::h3[text()="Impact"]]')

or

tables = root.xpath('.//h3[text()="Impact"]/following-sibling::table')

or

tables = root.cssselect('h3:contains(Impact) ~ table')

Complete solution

from lxml import etree

root = tree.getroot()
tables = root.xpath('.//h3[text()="Impact"]/following-sibling::table')
for table in tables:
    print etree.tostring(table)
qid & accept id: (18049531, 18049874) query: PySide custom Tab soup:

soup wrap:

Actually, Qt has really good documentation with very good examples; see the Qt Style Sheets Examples page and the main Qt Style Sheets reference.

All you have to do is to set the stylesheet either with QtDesigner or in python itself, like this:

self.tabWidget.setStyleSheet("background-color: rgb(255, 255, 255);\n"
                                    "border:1px solid rgb(255, 170, 255);")

Here is a sample stylesheet (the third QTabWidget example from the Qt documentation):

 QTabWidget::pane { /* The tab widget frame */
     border-top: 2px solid #C2C7CB;
     position: absolute;
     top: -0.5em;
 }

 QTabWidget::tab-bar {
     alignment: center;
 }

 /* Style the tab using the tab sub-control. Note that
     it reads QTabBar _not_ QTabWidget */
 QTabBar::tab {
     background: qlineargradient(x1: 0, y1: 0, x2: 0, y2: 1,
                                 stop: 0 #E1E1E1, stop: 0.4 #DDDDDD,
                                 stop: 0.5 #D8D8D8, stop: 1.0 #D3D3D3);
     border: 2px solid #C4C4C3;
     border-bottom-color: #C2C7CB; /* same as the pane color */
     border-top-left-radius: 4px;
     border-top-right-radius: 4px;
     min-width: 8ex;
     padding: 2px;
 }

 QTabBar::tab:selected, QTabBar::tab:hover {
     background: qlineargradient(x1: 0, y1: 0, x2: 0, y2: 1,
                                 stop: 0 #fafafa, stop: 0.4 #f4f4f4,
                                 stop: 0.5 #e7e7e7, stop: 1.0 #fafafa);
 }

 QTabBar::tab:selected {
     border-color: #9B9B9B;
     border-bottom-color: #C2C7CB; /* same as pane color */
 }
qid & accept id: (18050937, 18050955) query: How can I execute shell command with a | pipe in it soup:

soup wrap:

Call with shell=True argument. For example,

import subprocess

subprocess.call('grep -r PASSED *.log | sort -u | wc -l', shell=True)

Hard way

import glob
import subprocess

grep = subprocess.Popen(['grep', '-r', 'PASSED'] + glob.glob('*.log'), stdout=subprocess.PIPE)
sort = subprocess.Popen(['sort', '-u'], stdin=grep.stdout, stdout=subprocess.PIPE)
grep.stdout.close()  # let grep receive SIGPIPE if sort exits first
exit_status = subprocess.call(['wc', '-l'], stdin=sort.stdout)

See Replacing shell pipeline.
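If you also want the pipeline's output back in Python (not just its exit status), subprocess.check_output works the same way with shell=True. A small self-contained variant (assumes a POSIX shell with printf, sort and wc available):

```python
import subprocess

# shell=True hands the whole string to /bin/sh, so the | pipes work;
# check_output captures the pipeline's stdout instead of just the status
out = subprocess.check_output("printf 'b\\na\\nb\\n' | sort -u | wc -l",
                              shell=True)
print(int(out))  # unique line count
```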

qid & accept id: (18062466, 18062571) query: Creating a dictionary and adding a set as its value soup:

soup wrap:

Try this, for adding new set elements as values for a given key:

d = {}
d.setdefault(key, set()).add(value)

Alternatively, use a defaultdict:

from collections import defaultdict
d = defaultdict(set)
d[key].add(value)

Either solution will effectively create a multimap: a data structure that for a given key can hold multiple values - in this case, inside a set. For your example in particular, this is how you'd use it:

d = {}
for num in datasource:
    d.setdefault(key, set()).add(num)

Alternatively:

from collections import defaultdict
d = defaultdict(set)
for num in datasource:
    d[key].add(num)
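As a runnable sketch of the defaultdict variant (the datasource and the parity key are made-up stand-ins, since the question doesn't show how the key is derived):

```python
from collections import defaultdict

# Toy datasource: group numbers by parity, one set of values per key
datasource = [1, 2, 3, 4, 3]
d = defaultdict(set)
for num in datasource:
    d[num % 2].add(num)   # duplicate 3 is absorbed by the set

print(sorted(d[0]), sorted(d[1]))
```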
qid & accept id: (18082130, 18082240) query: Python regex to remove all words which contains number soup:

soup wrap:

Do you need a regex? You can do something like

>>> words = "ABCD abcd AB55 55CD A55D 5555"
>>> ' '.join(s for s in words.split() if not any(c.isdigit() for c in s))
'ABCD abcd'

If you really want to use regex, you can try \w*\d\w*:

>>> re.sub(r'\w*\d\w*', '', words).strip()
'ABCD abcd'
qid & accept id: (18085030, 18122726) query: Python argparser. List of dict in INI soup:

soup wrap:

I'm not sure I understand this correctly, but if you want to create a config file to easily read a list like you've shown, then create a section in your configs.ini:

[section]
key = value
key2 = value2
key3 = value3

and then

>> config = ConfigParser.RawConfigParser()
>> config.read('configs.ini')
>> items = config.items('section')
>> items
[('key', 'value'), ('key2', 'value2'), ('key3', 'value3')]

which is basically what you say you need.

If on the other hand what you are saying is that your config file contains:

[section]
couples = [("somekey1", "somevalue1"), ("somekey2", "somevalue2"), ("somekey3", "somevalue3")]

what you could do is extend the config parser like for example so:

class MyConfigParser(ConfigParser.RawConfigParser):

    def get_list_of_tups(self, section, option):
        value = self.get(section, option)
        import re
        couples = re.finditer('\("([a-z0-9]*)", "([a-z0-9]*)"\)', value)
        return [(c.group(1), c.group(2)) for c in couples]

and then your new parser can fetch your list for you:

>> my_config = MyConfigParser()
>> my_config.read('example.cfg')
>> couples = my_config.get_list_of_tups('section', 'couples')
>> couples
[('somekey1', 'somevalue1'), ('somekey2', 'somevalue2'), ('somekey3', 'somevalue3')]

The second situation is just making things hard for yourself I think.
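For reference, on Python 3 the module is renamed configparser and the same items() call works; read_string keeps the example self-contained (no .ini file on disk needed):

```python
import configparser

config = configparser.ConfigParser()
# same section as above, supplied inline instead of from configs.ini
config.read_string("[section]\nkey = value\nkey2 = value2\n")
print(config.items('section'))
```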

qid & accept id: (18086307, 18086366) query: Python - Comparing two lists of sets soup:

soup wrap:

You can flatten the two lists of sets into sets:

l1 = set(s for x in list1 for s in x)
l2 = set(s for x in list2 for s in x)

Then you can compute the intersection:

common = l1.intersection(l2)  # common will give common elements
print len(common) # this will give you the number of elements in common.

Results:

>>> print common
set(['3123', '3115', '3107', '3126'])
>>> len(common)
4
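The whole recipe end-to-end, with small made-up lists of sets in the shape the question describes:

```python
# Toy inputs (shape assumed from the question)
list1 = [set(['3107', '3115']), set(['3123'])]
list2 = [set(['3115', '9999']), set(['3123', '3126'])]

l1 = set(s for x in list1 for s in x)   # flatten list1
l2 = set(s for x in list2 for s in x)   # flatten list2
common = l1.intersection(l2)
print(sorted(common), len(common))
```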
qid & accept id: (18088496, 18088659) query: capturing the usernames after List: tag soup:

soup wrap:

Using regex, str.translate and str.split:

>>> import re
>>> from string import whitespace
>>> strs = re.search(r'List:(.*)(\s\S*\w+):', ph, re.DOTALL).group(1)
>>> strs.translate(None, ':'+whitespace).split(',')
['username1', 'username2', 'username3', 'username4', 'username5']

You can also create a dict here, which will allow you to access any attribute:

def func(lis):
    return ''.join(lis).translate(None, ':'+whitespace)

lis = [x.split() for x in re.split(r'(?<=\w):', ph.strip())]  # re.split's 3rd positional arg is maxsplit, not flags
dic = {}
for x, y in zip(lis[:-1], lis[1:-1]):
    dic[x[-1]] = func(y[:-1]).split(',')
dic[lis[-2][-1]] = func(lis[-1]).split(',')

print dic['List']
print dic['Members']
print dic['alias']

Output:

['username1', 'username2', 'username3', 'username4', 'username5']
['User1', 'User2', 'User3', 'User4', 'User5']
['tech.sw.host']
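Note that str.translate with a deletion string is Python 2-only. On Python 3, a findall over the captured block does the same extraction; the sample text and the end-of-block lookahead here are assumptions about the input's shape, not the exact regex from above:

```python
import re

# Assumed sample in the shape the question describes
ph = """List: username1, username2,
      username3
Members: User1, User2
"""

# capture everything between 'List:' and the next 'Word:' header
block = re.search(r'List:(.*?)(?=\n\S*\w+:|\Z)', ph, re.DOTALL).group(1)
names = re.findall(r'[\w.]+', block)
print(names)
```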
qid & accept id: (18108438, 18108533) query: evaluating values of a dictionary soup:

soup wrap:

You could use the following (use itervalues() instead of values() on Python 2.x to avoid building an intermediate list):

all(elem[2] in ('', None) for elem in test.values())

See the demo -

>>> test = {'a': (1, 2, None), 'b':(2, 3, '')}
>>> all(elem[2] in ('', None) for elem in test.values())
True
>>> test['c'] = (1, 2, 3)
>>> all(elem[2] in ('', None) for elem in test.values())
False
qid & accept id: (18111031, 18111302) query: Python: Append dictionary in another file soup:

soup wrap:

You can store and retrieve data structures using the pickle module in python, which provides object serialisation.

Save the dictionary

import pickle
some_dict = {'this':1,'is':2,'an':3,'example':4}

with open('saved_dict.pkl','w') as pickle_out:
    pickle.dump(some_dict,pickle_out)

Load the dictionary

with open('saved_dict.pkl', 'r') as pickle_in:
    that_dict_again = pickle.load(pickle_in)
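One caveat: binary mode ('wb'/'rb') works for every pickle protocol and on every platform, whereas text mode only happens to work for protocol 0 on Unix. A self-contained sketch using a temporary directory instead of the current one:

```python
import os
import pickle
import tempfile

some_dict = {'this': 1, 'is': 2, 'an': 3, 'example': 4}

# open in binary mode: required for protocol >= 1 and on Windows
path = os.path.join(tempfile.mkdtemp(), 'saved_dict.pkl')
with open(path, 'wb') as pickle_out:
    pickle.dump(some_dict, pickle_out)
with open(path, 'rb') as pickle_in:
    that_dict_again = pickle.load(pickle_in)

print(that_dict_again == some_dict)
```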
qid & accept id: (18114415, 18114485) query: how do I concatenate 3 lists using a list comprehension? soup:

soup wrap:

A better solution is to use itertools.chain instead of addition. That way, instead of creating the intermediate list list1 + list2, and then another intermediate list list1 + list2 + list3, you just create the final list with no intermediates:

allList = [x for x in itertools.chain(list1, list2, list3)]

However, an empty list comprehension like this is pretty silly; just use the list function to turn any arbitrary iterable into a list:

allList = list(itertools.chain(list1, list2, list3))

Or, even better… if the only reason you need this is to loop over it, just leave it as an iterator:

for thing in itertools.chain(list1, list2, list3):
    do_stuff(thing)

While we're at it, the "similar question" you linked to is actually a very different, and more complicated, question. But, because itertools is so cool, it's still a one-liner in Python:

itertools.product(list1, list2, list3)

Or, if you want to print it out in the format specified by that question:

print('\n'.join(map(' '.join, itertools.product(list1, list2, list3))))
qid & accept id: (18141698, 18142139) query: How can I set a code for users when they enter a valud URL or not with PYTHON/Flask? soup:

soup wrap:

You can use mechanize:

from mechanize import Browser

br = Browser()    
r = br.open("http://www.example.com/")

if r.code == 200:
    for link in br.links():
        print link
else:
    print "Error loading page"

Or urllib2 and BeautifulSoup

from BeautifulSoup import BeautifulSoup
import urllib2

html_page = urllib2.urlopen("http://www.example.com")
if html_page.getcode() == 200:
    soup = BeautifulSoup(html_page)
    for link in soup.findAll('a'):
        print link.get('href')
else:
    print "Error loading page"

I haven't worked much with Flask before, but try this:

As I understand it, urlsearch is the URL you are getting from the form, so add a check for it:

@app.route('/search', methods=['POST', 'GET'])
def search():
    error = True
    if request.method == 'POST':
        return request.form['urlsearch']
    else:    
        br = Browser()    
        r = br.open(request.args.get('urlsearch'))

        if r.code == 200:
            return br.links()
        else:
            return "Error loading page"
qid & accept id: (18157559, 18158140) query: Grouping data in a list of of dicts soup:

soup wrap:

Here's how I'd do it:

def merge_dicts(list_of_dicts):
    lookup = {}
    results = []
    for d in list_of_dicts:
        key = (d['type'], d['obj_id'])
        try: # it's easier to ask forgiveness than permission
            lookup[key]['actor'].append(d['actor'])
        except KeyError:
            val = {'type': d['type'],
                   'obj_id': d['obj_id'],
                   'actor': [d['actor']], # note, extra [] around value to make it a list
                   'extra_fields': d['extra_fields']}
            lookup[key] = val
            results.append(val)

    return results

The lookup dict maps from a tuple of the key values to the dictionaries that have been included in the results list. Those output dictionaries will have their actor value mutated if other dictionaries with the same key are encountered later on.

A rather more natural solution though would be to get rid of the list-of-dictionaries data structure and instead go for a single dictionary that maps from type, obj_id keys to actors, extra_fields values. Here's what that would look like:

def merge_dicts2(list_of_dicts):
    results = {}
    for d in list_of_dicts:
        key = (d['type'], d['obj_id'])
        try:
            results[key][0].append(d['actor'])
        except KeyError:
            results[key] = ([d['actor']], d['extra_fields'])

    return results

This has most of the data that your list of dicts had, only the order has been lost (and since you were merging items from the old list, some of that order was going to be lost regardless).

If you're going to be iterating over the collection later, this way is much easier, since you can unpack tuples (even nested ones) right in the loop:

combined_dict = merge_dicts(list_of_dicts)

for (type, obj_id), (actors, extra_fields) in combined_dict.items():
    # do stuff with type, obj_id, actors, extra_fields
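To see the merging in action, here is the merge_dicts2 logic inlined over a small made-up input (the field names follow the answer; the values are invented):

```python
# Toy input in the shape the answer assumes
list_of_dicts = [
    {'type': 'A', 'obj_id': 1, 'actor': 'x', 'extra_fields': {}},
    {'type': 'A', 'obj_id': 1, 'actor': 'y', 'extra_fields': {}},
    {'type': 'B', 'obj_id': 2, 'actor': 'z', 'extra_fields': {}},
]

results = {}
for d in list_of_dicts:
    key = (d['type'], d['obj_id'])
    try:
        results[key][0].append(d['actor'])   # existing key: extend actors
    except KeyError:
        results[key] = ([d['actor']], d['extra_fields'])

print(results[('A', 1)][0])  # both actors merged under one key
print(results[('B', 2)][0])
```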
qid & accept id: (18217858, 18217884) query: Python list manipulation: Add an item to a string element to make it a 2 item list soup:

soup wrap:

Modify a[0]:

>>> a = ['spam', 'eggs', 100, 1234]
>>> a[0] = [a[0], 'Devon']
>>> a
[['spam', 'Devon'], 'eggs', 100, 1234]

For your updated question:

>>> items = ['devon', 'baloney']
>>> a = ['spam', 'eggs', 100, 1234]
>>> a[0] = [a[0]] + items
>>> a
[['spam', 'devon', 'baloney'], 'eggs', 100, 1234]

If you're not sure about the position of 'spam' then use a list comprehension:

>>> a = ['spam', 'eggs', 100, 1234]
>>> [item if item != 'spam' else [item, 'devon'] for item in a]
[['spam', 'devon'], 'eggs', 100, 1234]
qid & accept id: (18233948, 18233976) query: Python line read size in bytes soup:

soup wrap:

Use len instead of sys.getsizeof():

sys.getsizeof() returns the number of bytes the interpreter uses to hold that object, including object overhead.

>>> len('asdf')
4
>>> import sys
>>> sys.getsizeof('asdf')
37

In addition, if you are running the program on Windows, you should open the file in binary mode.

open(myfile, 'rb')

NOTE

Using file.tell(), you don't need to calculate the current position yourself.
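Both points in one self-contained sketch (an in-memory file stands in for a real one): len() measures the line itself, which is what you want for offsets, while tell() already tracks the position for you:

```python
import io

# in-memory binary file as a stand-in for open(myfile, 'rb')
fh = io.BytesIO(b"first line\nsecond\n")
line = fh.readline()
print(len(line))   # bytes in the line, newline included
print(fh.tell())   # same value: the position after the read
```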

qid & accept id: (18244791, 18244875) query: Python: how to turn string into a list? soup:

soup wrap:

I am not quite sure I understand your question correctly; it looks like you already have a list my_item, which is a list of two strings: string1 ('maria') and string2 ('jose').

If you mean the complete string looks like """my_item = ['maria','jose']""", then you can do something like this:

inputString = "my_item = ['maria','jose']"

# value is a list type 
value = eval(inputString.split("=")[1])
# key is a string type
key = inputString.split("=")[0].strip()

# I don't think you can define a variable name while the script is running. 
# but you can use dictionary type to call it.
mydict = {}
mydict[key] = value

Then you can call mydict[key] to pick up the value which you want to lookup.

>>> print mydict['my_item'] 
['maria', 'jose']
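A word of caution: eval() executes arbitrary code, so for input you don't fully trust, ast.literal_eval is a safer drop-in that only parses Python literals. A sketch of the same idea:

```python
import ast

# literal_eval parses the list literal without executing code
inputString = "my_item = ['maria','jose']"
key, _, rhs = inputString.partition("=")
mydict = {key.strip(): ast.literal_eval(rhs.strip())}
print(mydict['my_item'])
```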
qid & accept id: (18250516, 18250549) query: How to add in a dictionary the values that have similar keys? soup:

soup wrap:

Take the first character of each key, call .upper() on that and sum your values by that uppercased letter. The following loop

out = {}
for key, value in original.iteritems():
    out[key[0].upper()] = out.get(key[0].upper(), 0) + value

should do it.

You can also use a collections.defaultdict() object to simplify that a little:

from collections import defaultdict

out = defaultdict(int)
for key, value in original.iteritems():
    out[key[0].upper()] += value

or you could use itertools.groupby():

from itertools import groupby

key = lambda i: i[0][0].upper()
out = {key: sum(v for k, v in group) for key, group in groupby(sorted(original.items(), key=key), key=key)}
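The first loop, run over a small made-up dictionary (keys sharing a first letter get their values summed under the uppercased letter):

```python
# Toy input (assumed): 'apple' and 'avocado' both land under 'A'
original = {'apple': 1, 'avocado': 2, 'banana': 3}

out = {}
for key, value in original.items():   # iteritems() on Python 2
    out[key[0].upper()] = out.get(key[0].upper(), 0) + value

print(out)
```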
qid & accept id: (18281342, 18281400) query: regex to find a specific pattern in python soup:

soup wrap:

You can just do this in a pretty straightforward way:

import re

text = """
November 5 - December 10
September 23 - December 16
"""

matches = re.findall("\w+\s\d+\s\-\s\w+\s\d+", text)
print matches

prints:

['November 5 - December 10', 'September 23 - December 16']

But, if these words are just month names, you can improve your regexp by specifying a list of months instead of just \w+:

import calendar

months = "|".join(calendar.month_name)[1:]
matches = re.findall("{0}\s\d+\s\-\s{0}\s\d+".format(months), text)
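Why the [1:] slice works (assuming the default English locale): calendar.month_name[0] is the empty string, so the joined pattern starts with a stray '|' that the slice trims off:

```python
import calendar

# month_name is ['', 'January', ..., 'December']; joining gives
# '|January|...|December', and [1:] drops the leading bar
months = "|".join(calendar.month_name)[1:]
print(months.startswith("January|February"))
print(months.endswith("December"))
```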
qid & accept id: (18286362, 18286631) query: How to Get Variable from another .py soup:

soup wrap:

Use the subprocess module and create a pipe. Then you can pickle the variable and send it through the pipe (see documentation of subprocess).

Here is an example:

module for communication communicate.py:

import sys
import subprocess as sp
import cPickle

BEGIN = 'pickle_begin'

def send_and_exit(x):
    sys.stdout.write(BEGIN + cPickle.dumps(x))
    sys.stdout.flush()
    sys.exit(0)

def execute_and_receive(filename):
    p = sp.Popen(["python", filename], stdout=sp.PIPE)
    (out, err) = p.communicate()
    return cPickle.loads(out[out.find(BEGIN) + len(BEGIN):])

1.py:

from communicate import *
x = execute_and_receive("2.py")
y = x + 2

2.py:

from communicate import *
x = 2 + 2
send_and_exit(x)

To make sure you start unpickling at the correct point of the stdout stream, I recommend setting a marker, as I did with the BEGIN string. There are probably more elegant solutions; if so, I'm interested as well.
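A rough Python 3 sketch of the same helper (a hypothetical port: cPickle becomes pickle, and the pipe carries bytes, so the marker must be bytes too):

```python
import pickle
import subprocess as sp
import sys

BEGIN = b'pickle_begin'  # bytes marker: the child's stdout is bytes in Python 3

def send_and_exit(x):
    # write raw bytes; sys.stdout.buffer bypasses the text layer
    sys.stdout.buffer.write(BEGIN + pickle.dumps(x))
    sys.stdout.flush()
    sys.exit(0)

def execute_and_receive(filename):
    p = sp.Popen([sys.executable, filename], stdout=sp.PIPE)
    out, err = p.communicate()
    # skip anything the child printed before the marker
    return pickle.loads(out[out.find(BEGIN) + len(BEGIN):])
```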

qid & accept id: (18289871, 18289908) query: Lazy class property decorator soup:


The Pyramid framework has a very nice decorator called reify, but it only works at the instance level, and you want the class level, so let's modify it a bit:

class class_reify(object):
    def __init__(self, wrapped):
        self.wrapped = wrapped
        try:
            self.__doc__ = wrapped.__doc__
        except: # pragma: no cover
            pass

    # original sets the attributes on the instance
    # def __get__(self, inst, objtype=None):
    #    if inst is None:
    #        return self
    #    val = self.wrapped(inst)
    #    setattr(inst, self.wrapped.__name__, val)
    #    return val

    # ignore the instance, and just set them on the class
    # if called on a class, inst is None and objtype is the class
    # if called on an instance, inst is the instance, and objtype 
    # the class
    def __get__(self, inst, objtype=None):
        # ask the value from the wrapped object, giving it
        # our class
        val = self.wrapped(objtype)

        # and set the attribute directly to the class, thereby
        # avoiding the descriptor to be called multiple times
        setattr(objtype, self.wrapped.__name__, val)

        # and return the calculated value
        return val

class Test(object):
    @class_reify
    def foo(cls):
        print "foo called for class", cls
        return 42

print Test.foo
print Test.foo

Run the program and it prints

foo called for class <class '__main__.Test'>
42
42
qid & accept id: (18325280, 18325300) query: How can I check if a string has the same characters? Python soup:


Sort the two strings and then compare them:

sorted(str1) == sorted(str2)

If the strings might not be the same length, you might want to make sure of that first to save time:

len(str1) == len(str2) and sorted(str1) == sorted(str2)
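Wrapped into a small helper (a minimal sketch; the name is illustrative):

```python
def same_chars(str1, str2):
    # length check first: a cheap O(1) rejection before the O(n log n) sorts
    return len(str1) == len(str2) and sorted(str1) == sorted(str2)

print(same_chars("listen", "silent"))  # True
print(same_chars("apple", "pale"))     # False
```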
qid & accept id: (18340475, 18340527) query: Parenthesized repetitions in Python regular expressions soup:


Do not use a capturing group that repeats; only the last value will be captured. re.findall() will only return captured groups when you use them.

A non-capturing group for the repeat would work much better here:

m = re.findall(r'TEST\s\((?:\d+\s?)*\)', s)

Demo:

>>> import re
>>> s = '(((TEST (4 5 17 33 38 45 93 101 104 108 113 116 135 146 148)) (TRAIN (0 1 2 3 6 7 8 9 10 11 12 13 14 15 16 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 34 35 36 37 39 40 41 42 43 44 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 94 95 96 97 98 99 100 102 103 105 106 107 109 110 111 112 114 115 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 136 137 138 139 140 141 142 143 144 145 147 149 150 151))) ((TEST (19 35 46 47 48 56 59 61 65 69 71 84 105 107 130)) (TRAIN (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 36 37 38 39 40 41 42 43 44 45 49 50 51 52 53 54 55 57 58 60 62 63 64 66 67 68 70 72 73 74 75 76 77 78 79 80 81 82 83 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 106 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151)))'
>>> re.findall(r'TEST\s\((?:\d+\s?)*\)', s)
['TEST (4 5 17 33 38 45 93 101 104 108 113 116 135 146 148)', 'TEST (19 35 46 47 48 56 59 61 65 69 71 84 105 107 130)']

Without the capturing group, re.findall() returns the whole match.

qid & accept id: (18353465, 18353698) query: How to apply an array of functions to a value using list comprehension? soup:
your_value = 3
result = reduce(lambda x, y: y(x), function_list, your_value)

For example:

>>> functions = [lambda x: x + 2, lambda x: x * 2]
>>> reduce(lambda x, y: y(x), functions, 1)
6
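In Python 3, reduce is no longer a builtin; it lives in functools. An equivalent sketch:

```python
from functools import reduce

functions = [lambda x: x + 2, lambda x: x * 2]
# thread the value through each function in turn: (1 + 2) * 2
result = reduce(lambda acc, fn: fn(acc), functions, 1)
print(result)  # 6
```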
qid & accept id: (18386302, 18386496) query: Resizing a 3D image (and resampling) soup:


From the docstring for scipy.ndimage.interpolation.zoom:

"""
zoom : float or sequence, optional
    The zoom factor along the axes. If a float, `zoom` is the same for each
    axis. If a sequence, `zoom` should contain one value for each axis.
"""

What is the scale factor between the two images? Is it constant across all axes (i.e. are you scaling isometrically)? In that case zoom should be a single float value. Otherwise it should be a sequence of floats, one per axis.

For example, if the physical dimensions of whole and flash can be assumed to be equal, then you could do something like this:

import scipy.ndimage as nd

dsfactor = [w / float(f) for w, f in zip(whole.shape, flash.shape)]
downed = nd.interpolation.zoom(flash, zoom=dsfactor)
qid & accept id: (18443220, 18443288) query: class name as variable in python soup:


This creates a reference to MyClass:

>>> class MyClass(object):
...     pass
... 
>>> myObj = MyClass()
>>> NewClass = myObj.__class__
>>> newObj = NewClass()
>>> myObj, newObj
(<__main__.MyClass object at 0x102740d90>, <__main__.MyClass object at 0x102740d50>)

This creates a new class based on myObj's class:

>>> myObj = MyClass()
>>> NewClass = type("NewClass", (myObj.__class__,), {})
>>> newObj = NewClass()
>>> myObj, newObj
(<__main__.MyClass object at 0x102740d90>, <__main__.NewClass object at 0x102752610>)
>>> 
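The same two ideas in script form (using the builtin type(), which for new-style classes is equivalent to accessing __class__):

```python
class MyClass(object):
    pass

my_obj = MyClass()

# 1. Get a reference to the existing class from the instance
NewClass = type(my_obj)
assert NewClass is MyClass

# 2. Dynamically create a new subclass of the instance's class
SubClass = type("SubClass", (type(my_obj),), {})
print(SubClass.__name__)              # SubClass
print(issubclass(SubClass, MyClass))  # True
```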
qid & accept id: (18472262, 18472298) query: Python Regex on comma, space soup:


Your first case doesn't even need a regex. You can simply do:

"Dogs,Cats".split(",")

For your 2nd case, you can use:

re.split(r',\s*', "Dogs, Cats")
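Both calls side by side; the ,\s* pattern also tolerates no space, or several spaces, after the comma:

```python
import re

print("Dogs,Cats".split(","))            # ['Dogs', 'Cats']
print(re.split(r',\s*', "Dogs, Cats"))   # ['Dogs', 'Cats']
print(re.split(r',\s*', "Dogs,  Cats"))  # ['Dogs', 'Cats']
```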
qid & accept id: (18474538, 18475712) query: Connect to MSSQL Server 2008 on linux soup:


Do you have all the software you need? This is what you need for Ubuntu 12.04:

sudo apt-get install php5-odbc php5-sybase tdsodbc

Have you configured these files on your Linux server? (These are taken from an Ubuntu 12.04 server.)

/etc/odbc.ini

# Define a connection to the MSSQL server.
# The Description can be whatever we want it to be.
# The Driver value must match what we have defined in /etc/odbcinst.ini
# The Database name must be the name of the database this connection will connect to.
# The ServerName is the name we defined in /etc/freetds/freetds.conf
# The TDS_Version should match what we defined in /etc/freetds/freetds.conf
[mssql]
Description             = MSSQL Server
Driver                  = freetds
Database                = MyDatabase
ServerName              = mssql
TDS_Version             = 8.0

/etc/odbcinst.ini

# Define where to find the driver for the Free TDS connections.
[freetds]
Description     = MS SQL database access with Free TDS
Driver          = /usr/lib/i386-linux-gnu/odbc/libtdsodbc.so
Setup           = /usr/lib/i386-linux-gnu/odbc/libtdsS.so
UsageCount      = 1

/etc/freetds/freetds.conf

# The basics for defining a DSN (Data Source Name)
# [data_source_name]
#       host = 
#       port = 
#       tds version = 

# Define a connection to the MSSQL server.
[mssql]
        host = mssql_server_ip_or_domain_name
        port = 1433
        tds version = 8.0

I've read several accounts of the tds version causing problems. It seems like 8.0 works best, but I've also seen people say they got things working with 7.5 and 7.0.

Then test your connection:

isql mssql username password

Depending on your environment your username might have to be in the format: domain\username

After issuing the command you should see something like:

+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL>

And here's what I think your connect command should look like (NOTE: I don't know Python):

cnxn = pyodbc.connect('DRIVER=freetds;SERVER=FOOBAR;PORT=1433;DATABASE=T2;UID=FOO;PWD=bar;TDS_Version=8.0;')
qid & accept id: (18497810, 18497887) query: Append to several lists inside list soup:
>>> lis_A = [[], [], []]
>>> vals = [1,2,3]
>>> [x.append(y) for x, y in zip(lis_A, vals)]
>>> lis_A
[[1], [2], [3]]

Or if you want a fast for loop without side effects, use:

from itertools import izip

for x, y in izip(lis_A, vals):
    x.append(y)
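In Python 3, itertools.izip is gone because the builtin zip is already lazy, so the side-effect-free loop is simply:

```python
lis_A = [[], [], []]
vals = [1, 2, 3]

# zip is lazy in Python 3, so no intermediate list is built
for sub, v in zip(lis_A, vals):
    sub.append(v)

print(lis_A)  # [[1], [2], [3]]
```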
qid & accept id: (18507244, 18513847) query: boost python overload operator () soup:


When exposing the Queuer class, define a __call__ method for each Queuer::operator() member function. Boost.Python will handle the appropriate dispatching based on types. The only complexity is introduced with pointer-to-member-function syntax, as the caller is required to disambiguate &Queuer::operator().

Additionally, when attempting to pass derived classes in Python to a C++ function with a parameter of the Base class, then some additional information needs to be exposed to Boost.Python:

  • The base C++ class needs to be exposed with class_. For example, class_<Base>("Base").
  • The derived class needs to explicitly list its base classes when being exposed with bases. For example, class_<Derived, bases<Base> >("Derived"). With this information, Boost.Python can do proper casting while dispatching.

Here is a complete example:

#include <boost/python.hpp>

#include <iostream>

// Mockup classes.
struct AgentBase   {};
struct MessageBase {};
struct QueueBase   {};
struct SpamBase    {};
struct Agent:   AgentBase   {};
struct Message: MessageBase {};
struct Queue:   QueueBase   {};
struct Spam:    SpamBase    {};

// Class with overloaded operator().
class Queuer
{ 
public:

  void operator()(const AgentBase&, const MessageBase&) const
  {
    std::cout << "Queuer::operator() with Agent." << std::endl;
  }

  void operator()(const QueueBase&, const MessageBase&) const
  {
    std::cout << "Queuer::operator() with Queue." << std::endl;
  }

  void operator()(const SpamBase&, const MessageBase&) const
  {
    std::cout << "Queuer::operator() with Spam." << std::endl;
  }
};

/// Depending on the overload signatures, helper types may make the
/// code slightly more readable by reducing pointer-to-member-function syntax.
template <typename A1>
struct queuer_overload
{
  typedef void (Queuer::*type)(const A1&, const MessageBase&) const;
  static type get(type fn) { return fn; }
};

BOOST_PYTHON_MODULE(example)
{
  namespace python = boost::python;
  // Expose only the base class types.  Do not allow the classes to be
  // directly initialized in Python.
  python::class_<AgentBase>("AgentBase",     python::no_init);
  python::class_<MessageBase>("MessageBase", python::no_init);
  python::class_<QueueBase>("QueueBase",     python::no_init);
  python::class_<SpamBase>("SpamBase",       python::no_init);

  // Expose the user types.  These classes inherit from their respective
  // base classes.
  python::class_<Agent,   python::bases<AgentBase>   >("Agent");
  python::class_<Message, python::bases<MessageBase> >("Message");
  python::class_<Queue,   python::bases<QueueBase>   >("Queue");
  python::class_<Spam,    python::bases<SpamBase>    >("Spam");

  // Disambiguate via a variable.
  queuer_overload<AgentBase>::type queuer_op_agent = &Queuer::operator();

  python::class_<Queuer>("Queuer")
    // Disambiguate via a variable.
    .def("__call__", queuer_op_agent)
    // Disambiguate via a helper type.
    .def("__call__", queuer_overload<QueueBase>::get(&Queuer::operator()))
    // Disambiguate via explicit cast.
    .def("__call__",
         static_cast<void (Queuer::*)(const SpamBase&, const MessageBase&) const>(
             &Queuer::operator()))
    ;
}

And its usage:

>>> import example
>>> queuer = example.Queuer()
>>> queuer(example.Agent(), example.Message())
Queuer::operator() with Agent.
>>> queuer(example.Queue(), example.Message())
Queuer::operator() with Queue.
>>> queuer(example.Spam(), example.Message())
Queuer::operator() with Spam.
qid & accept id: (18507736, 18507858) query: how to find the line number where specific text exists? soup:


You can do something like this:

with open('test.txt', 'r') as infile:
    for line, content in enumerate(infile, start=1):
        if content.strip() == 'this is my horse':
            print line

which in case of your file will print:

4
qid & accept id: (18535645, 18536234) query: How to *append* a text to a database file opened with shelve? soup:


Assuming your values are lists:

  • Use db = shelve.open('store',writeback=True) and then append the value to the same key.

  • Since your code does not open 'store' with writeback=True, you must assign a variable the value of the key, temp = db['some variable'], which would be some value, then append to that variable, temp.append(another value), and then reassign that key's value, db['some variable'] = temp.

Shouldn't your third line of code be db['some variable'] = another value in order to replace the value?

Edit: Other possible meaning of question?

Do you mean you want to load the database into your object and continue to use your "UI" code to edit it after closing the program? If so then you can do something like:

class Update_MyStore(MyStore):
    def __init__(self, store):
        db = shelve.open(store)
        for i in db:
            setattr(self, i, db[i])
        self.items()
        self.store_in_db()
Update_MyStore('store')

Edit: Another option to update, if that is the case, if you want to add or update specific items:

while True:
    store = shelve.open('store',writeback = True)
    Item = input('Enter an item: ').capitalize() #I prefer str(raw_input('Question '))
    if not Item or Item == 'Break':
        break
    store['item_quantity'][Item] = int(input(('Enter the number of {0} available in the store: ').format(Item)))
    store['item_rate'][Item] = float(input(('Enter the rate of {0}: ').format(Item)))
    store.sync()
    store.close()
qid & accept id: (18540321, 18540971) query: python for loop using lambda syntax soup:


It appears that the OP meant his code fragment to be a framework for a loop and not actually complete code for what he was trying to do. This is probably closer to his intent:

val = []
for c in range(len(x)//4):
    val.append(x[c*4:c*4+4])

for a, b, c, d in val:

    ...  do something with a, b, c, d  ...

For the example list from his comments above:

x = ['1', '1377877381', 'off', '0', 
     '2', '1377886582', 'on', '0', 
     '3', '1376238596', 'off', '0', 
     '4', '1377812526', 'off', '0']

val ends up containing

[['1', '1377877381', 'off', '0'],
 ['2', '1377886582', 'on', '0'],
 ['3', '1376238596', 'off', '0'],
 ['4', '1377812526', 'off', '0']]

An equivalent line using map() and lambda looks like this:

val = map(lambda y: x[y*4:y*4+4], range(len(x)//4))

One could now loop over this call to map, assigning to a, b, c, d, and do whatever:

for a, b, c, d in map(lambda y: x[y*4:y*4+4], range(len(x)//4)):

    ...  more code ...

Hope that helps a bit.

Of course, the more efficient way to do this (particularly for long lists) in Python 2.x is to use itertools.imap() and xrange(). (In Python 3.x the built-in map() and range() are already lazy, so just use those.) This approach doesn't actually construct the complete resulting list, but still allows iterating over it.

Here's my Python 2.x version:

from itertools import imap

for a, b, c, d in imap(lambda y: x[y*4:y*4+4], xrange(len(x)//4)):

    ...  more code  ...
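A common alternative for chunking a flat list into fixed-size groups, in either Python version, is the grouper idiom:

```python
x = ['1', '1377877381', 'off', '0',
     '2', '1377886582', 'on', '0']

# zip the *same* iterator with itself four times: each output tuple
# consumes four consecutive items, giving non-overlapping chunks
chunks = list(zip(*[iter(x)] * 4))
print(chunks)
# [('1', '1377877381', 'off', '0'), ('2', '1377886582', 'on', '0')]
```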
qid & accept id: (18556448, 18556474) query: String of list or strings to a tuple soup:


Type checking would be a great way to achieve this. How else would you decide whether the input was a list or a string?

You could make a function which tests if the input is a list or a string and returns appropriately and handles the rest as you see fit. Something along the lines of

>>> def convert_to_tuple(elem):
        if isinstance(elem, list):
            return tuple(elem)
        elif isinstance(elem, basestring):
            return (elem,)
        else:
            # Do Something
            pass


>>> convert_to_tuple('abc')
('abc',)
>>> convert_to_tuple(['abc', 'def'])
('abc', 'def')

You could also check only for strings (assuming Python 2.x; replace basestring with str in Python 3):

>>> def convert_to_tuple(elem):
        if isinstance(elem, basestring):
            return (elem,)
        else:
            return tuple(elem)


>>> convert_to_tuple('abc')
('abc',)
>>> convert_to_tuple(('abc', 'def'))
('abc', 'def')
>>> convert_to_tuple(['abc', 'def'])
('abc', 'def')

Converting the function to a one-liner is possible too.

>>> def convert_to_tuple(elem):
        return (elem,) if isinstance(elem, basestring) else tuple(elem)
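On Python 3, where basestring no longer exists, a sketch of the same one-liner could look like this (checking against str instead):

```python
def convert_to_tuple(elem):
    # str replaces basestring on Python 3
    return (elem,) if isinstance(elem, str) else tuple(elem)

print(convert_to_tuple('abc'))           # ('abc',)
print(convert_to_tuple(['abc', 'def']))  # ('abc', 'def')
```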
qid & accept id: (18580032, 18580058) query: Remove elements of one list from another, while keeping duplicates soup:

You are looking for multisets, really. Use collections.Counter(), the Python implementation of a multiset:

\n
from collections import Counter\n\nacount = Counter(a)\nbcount = Counter(b)\nresult = list((acount - bcount).elements())\n
\n

Demo:

\n
>>> from collections import Counter\n>>> a = ['a', 'a', 'b', 'c', 'c', 'c', 'd', 'e', 'f']\n>>> b = ['a', 'b', 'c', 'd', 'e', 'f']\n>>> Counter(a) - Counter(b)\nCounter({'c': 2, 'a': 1})\n>>> list((Counter(a) - Counter(b)).elements())\n['a', 'c', 'c']\n
\n

You may want to retain the Counter() instances instead; when you do need the list form, the Counter.elements() method generates each element repeated as many times as its count, reproducing the desired output.

\n soup wrap:

You are looking for multisets, really. Use collections.Counter(), the Python implementation of a multiset:

from collections import Counter

acount = Counter(a)
bcount = Counter(b)
result = list((acount - bcount).elements())

Demo:

>>> from collections import Counter
>>> a = ['a', 'a', 'b', 'c', 'c', 'c', 'd', 'e', 'f']
>>> b = ['a', 'b', 'c', 'd', 'e', 'f']
>>> Counter(a) - Counter(b)
Counter({'c': 2, 'a': 1})
>>> list((Counter(a) - Counter(b)).elements())
['a', 'c', 'c']

You may want to retain the Counter() instances instead; when you do need the list form, the Counter.elements() method generates each element repeated as many times as its count, reproducing the desired output.
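Note that Counter subtraction groups elements by key, so the original ordering of a is not guaranteed. If order matters, a small sketch (subtract_keep_order is an illustrative name) that walks a while decrementing a Counter built from b:

```python
from collections import Counter

def subtract_keep_order(a, b):
    # Remove each element of b from a at most as many times as it
    # occurs in b, keeping a's original ordering.
    remove = Counter(b)
    result = []
    for x in a:
        if remove[x] > 0:
            remove[x] -= 1
        else:
            result.append(x)
    return result

a = ['a', 'a', 'b', 'c', 'c', 'c', 'd', 'e', 'f']
b = ['a', 'b', 'c', 'd', 'e', 'f']
print(subtract_keep_order(a, b))  # ['a', 'c', 'c']
```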

qid & accept id: (18589821, 18590046) query: Create a Series from a Pandas DataFrame by choosing an element from different columns on each row soup:

One solution could be to use get_dummies (which should be more efficient than apply):

\n
In [11]: (pd.get_dummies(useProb) * pred).sum(axis=1)\nOut[11]:\nTimestamp\n2010-12-21 00:00:00    0\n2010-12-20 00:00:00    1\n2010-12-17 00:00:00    1\n2010-12-16 00:00:00    1\n2010-12-15 00:00:00    1\n2010-12-14 00:00:00    1\n2010-12-13 00:00:00    0\n2010-12-10 00:00:00    1\n2010-12-09 00:00:00    1\n2010-12-08 00:00:00    0\ndtype: float64\n
\n

You could use an apply with a couple of locs:

\n
In [21]: pred.apply(lambda row: row.loc[useProb.loc[row.name]], axis=1)\nOut[21]:\nTimestamp\n2010-12-21 00:00:00    0\n2010-12-20 00:00:00    1\n2010-12-17 00:00:00    1\n2010-12-16 00:00:00    1\n2010-12-15 00:00:00    1\n2010-12-14 00:00:00    1\n2010-12-13 00:00:00    0\n2010-12-10 00:00:00    1\n2010-12-09 00:00:00    1\n2010-12-08 00:00:00    0\ndtype: int64\n
\n

The trick is that you have access to the row's index via the name property.

\n soup wrap:

One solution could be to use get_dummies (which should be more efficient than apply):

In [11]: (pd.get_dummies(useProb) * pred).sum(axis=1)
Out[11]:
Timestamp
2010-12-21 00:00:00    0
2010-12-20 00:00:00    1
2010-12-17 00:00:00    1
2010-12-16 00:00:00    1
2010-12-15 00:00:00    1
2010-12-14 00:00:00    1
2010-12-13 00:00:00    0
2010-12-10 00:00:00    1
2010-12-09 00:00:00    1
2010-12-08 00:00:00    0
dtype: float64

You could use an apply with a couple of locs:

In [21]: pred.apply(lambda row: row.loc[useProb.loc[row.name]], axis=1)
Out[21]:
Timestamp
2010-12-21 00:00:00    0
2010-12-20 00:00:00    1
2010-12-17 00:00:00    1
2010-12-16 00:00:00    1
2010-12-15 00:00:00    1
2010-12-14 00:00:00    1
2010-12-13 00:00:00    0
2010-12-10 00:00:00    1
2010-12-09 00:00:00    1
2010-12-08 00:00:00    0
dtype: int64

The trick is that you have access to the row's index via the name property.

qid & accept id: (18590658, 18595877) query: How to link PyQt4 script button to activate another script? soup:

In your main script, you need to make it accept arguments. A really simple way to do that is to do something like this:

\n
# dd.py\nimport sys\ndef main(arg):\n    # do something here\n    print arg\n\nif __name__ == "__main__":\n    arg = sys.argv[1]\n    main(arg)\n
\n

Then in your GUI, you would use the subprocess module to call your main script and pass the argument. So in your button's event handler, you'd do something like this:

\n
subprocess.Popen(["python", "/path/to/dd.py", arg])\n
\n

If you need to be able to pass switches or flags along with arguments, you should read up on argparse or optparse, depending on which version of Python you're using.

\n soup wrap:

In your main script, you need to make it accept arguments. A really simple way to do that is to do something like this:

# dd.py
import sys
def main(arg):
    # do something here
    print arg

if __name__ == "__main__":
    arg = sys.argv[1]
    main(arg)

Then in your GUI, you would use the subprocess module to call your main script and pass the argument. So in your button's event handler, you'd do something like this:

subprocess.Popen(["python", "/path/to/dd.py", arg])

If you need to be able to pass switches or flags along with arguments, you should read up on argparse or optparse, depending on which version of Python you're using.
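As a hedged sketch of the argparse route (the argument names here are illustrative, not taken from the original dd.py):

```python
import argparse

parser = argparse.ArgumentParser(description="Example CLI for a script like dd.py")
parser.add_argument("value", help="positional argument passed in by the GUI")
parser.add_argument("--verbose", action="store_true", help="an optional flag")

# parse_args() normally reads sys.argv; pass a list explicitly to demo it
args = parser.parse_args(["hello", "--verbose"])
print(args.value)    # hello
print(args.verbose)  # True
```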

qid & accept id: (18608160, 18608213) query: Python: How to time script from beginning to end? soup:

Try datetime module.

\n
from datetime import datetime\nstart = datetime.now()\n
\n

Later

\n
difference = datetime.now() - start\n
\n soup wrap:

Try datetime module.

from datetime import datetime
start = datetime.now()

Later

difference = datetime.now() - start
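If you only need elapsed wall-clock seconds as a float, the time module does the same job; a minimal sketch:

```python
import time

start = time.time()
time.sleep(0.05)  # stand-in for the script's actual work
elapsed = time.time() - start
print('took %.3f seconds' % elapsed)
```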
qid & accept id: (18619131, 18619544) query: Solve equation with a set of points soup:

Just use curve_fit in scipy.optimize:

\n
import numpy as np\nfrom scipy.optimize import curve_fit\nfrom pylab import *\n\ndef myFunc(t, V, W, k):\n    y = V * t - ((V - W) * (1 - np.exp(-k * t)) / k)\n    return y\n\n# this generates some fake data to fit. For youm just read in the \n# data in CSV or whatever you've\nx = np.linspace(0,4,50)\ny = myFunc(x, 2.5, 1.3, 0.5)\n# add some noise to the fake data to make it more realistic. . .\nyn = y + 0.2*np.random.normal(size=len(x))\n\n#fit the data, return the best fit parameters and the covariance matrix\npopt, pcov = curve_fit(myFunc, x, yn)\nprint popt\nprint pcov\n\n#plot the data\nclf()\nplot(x, yn, "rs")\n#overplot the best fit curve\nplot(x, myFunc(x, popt[0], popt[1], popt[2]))\ngrid(True)\nshow()\n
\n

This gives something like the plot below. The red points are the (noisy) data, and the blue line is the best fit curve, with the best fitting parameters for that particular data of:

\n
[ 2.32751132, 1.27686053, 0.65986596]\n
\n

This is pretty close to the expected parameters of 2.5, 1.3, 0.5; the difference is due to the noise that I added into the fake data.

\n

example of fit

\n soup wrap:

Just use curve_fit in scipy.optimize:

import numpy as np
from scipy.optimize import curve_fit
from pylab import *

def myFunc(t, V, W, k):
    y = V * t - ((V - W) * (1 - np.exp(-k * t)) / k)
    return y

# this generates some fake data to fit. For you, just read in the
# data from CSV or whatever you have
x = np.linspace(0,4,50)
y = myFunc(x, 2.5, 1.3, 0.5)
# add some noise to the fake data to make it more realistic. . .
yn = y + 0.2*np.random.normal(size=len(x))

#fit the data, return the best fit parameters and the covariance matrix
popt, pcov = curve_fit(myFunc, x, yn)
print popt
print pcov

#plot the data
clf()
plot(x, yn, "rs")
#overplot the best fit curve
plot(x, myFunc(x, popt[0], popt[1], popt[2]))
grid(True)
show()

This gives something like the plot below. The red points are the (noisy) data, and the blue line is the best fit curve, with the best fitting parameters for that particular data of:

[ 2.32751132, 1.27686053, 0.65986596]

This is pretty close to the expected parameters of 2.5, 1.3, 0.5; the difference is due to the noise that I added into the fake data.

example of fit

qid & accept id: (18631669, 19129868) query: Django Celery get task count soup:

If your broker is configured as redis://localhost:6379/1, and your tasks are submitted to the general celery queue, then you can get the length by the following means:

\n
import redis\nqueue_name = "celery"\nclient = redis.Redis(host="localhost", port=6379, db=1)\nlength = client.llen(queue_name)\n
\n

Or, from a shell script (good for monitors and such):

\n
$ redis-cli -n 1 -h localhost -p 6379 llen celery\n
\n soup wrap:

If your broker is configured as redis://localhost:6379/1, and your tasks are submitted to the general celery queue, then you can get the length by the following means:

import redis
queue_name = "celery"
client = redis.Redis(host="localhost", port=6379, db=1)
length = client.llen(queue_name)

Or, from a shell script (good for monitors and such):

$ redis-cli -n 1 -h localhost -p 6379 llen celery
qid & accept id: (18646076, 18646275) query: Add numpy array as column to Pandas data frame soup:
import numpy as np\nimport pandas as pd\nimport scipy.sparse as sparse\n\ndf = pd.DataFrame(np.arange(1,10).reshape(3,3))\narr = sparse.coo_matrix(([1,1,1], ([0,1,2], [1,2,0])), shape=(3,3))\ndf['newcol'] = arr.toarray().tolist()\nprint(df)\n
\n

yields

\n
   0  1  2     newcol\n0  1  2  3  [0, 1, 0]\n1  4  5  6  [0, 0, 1]\n2  7  8  9  [1, 0, 0]\n
\n soup wrap:
import numpy as np
import pandas as pd
import scipy.sparse as sparse

df = pd.DataFrame(np.arange(1,10).reshape(3,3))
arr = sparse.coo_matrix(([1,1,1], ([0,1,2], [1,2,0])), shape=(3,3))
df['newcol'] = arr.toarray().tolist()
print(df)

yields

   0  1  2     newcol
0  1  2  3  [0, 1, 0]
1  4  5  6  [0, 0, 1]
2  7  8  9  [1, 0, 0]
qid & accept id: (18673420, 18673559) query: Indexing pandas dataframe to return first data point from each day soup:

With this setup:

\n
import pandas as pd\ndata = '''\\n2013-01-01 01:00\n2013-01-01 05:00\n2013-01-01 14:00\n2013-01-02 01:00\n2013-01-02 05:00\n2013-01-04 14:00'''\ndates = pd.to_datetime(data.splitlines())\ndf = pd.DataFrame({'date': dates, 'val': range(len(dates))})\n\n>>> df\n                 date  val\n0 2013-01-01 01:00:00    0\n1 2013-01-01 05:00:00    1\n2 2013-01-01 14:00:00    2\n3 2013-01-02 01:00:00    3\n4 2013-01-02 05:00:00    4\n5 2013-01-04 14:00:00    5\n
\n

You can produce the desired DataFrame using groupby and agg:

\n
grouped = df.groupby([d.strftime('%Y%m%d') for d in df['date']])\nnewdf = grouped.agg('first')\nprint(newdf)\n
\n

yields

\n
                        date  val\n20130101 2013-01-01 01:00:00    0\n20130102 2013-01-02 01:00:00    3\n20130104 2013-01-04 14:00:00    5\n
\n soup wrap:

With this setup:

import pandas as pd
data = '''\
2013-01-01 01:00
2013-01-01 05:00
2013-01-01 14:00
2013-01-02 01:00
2013-01-02 05:00
2013-01-04 14:00'''
dates = pd.to_datetime(data.splitlines())
df = pd.DataFrame({'date': dates, 'val': range(len(dates))})

>>> df
                 date  val
0 2013-01-01 01:00:00    0
1 2013-01-01 05:00:00    1
2 2013-01-01 14:00:00    2
3 2013-01-02 01:00:00    3
4 2013-01-02 05:00:00    4
5 2013-01-04 14:00:00    5

You can produce the desired DataFrame using groupby and agg:

grouped = df.groupby([d.strftime('%Y%m%d') for d in df['date']])
newdf = grouped.agg('first')
print(newdf)

yields

                        date  val
20130101 2013-01-01 01:00:00    0
20130102 2013-01-02 01:00:00    3
20130104 2013-01-04 14:00:00    5
qid & accept id: (18679264, 18679558) query: How to use malloc and free with python ctypes? soup:

You can allocate buffers using ctypes and assign them to the pointers. Once the Python ctypes objects have no references they will be freed automatically. Here's a simple example (with a Windows DLL...don't have a Linux machine handy, but the idea is the same) and a Python wrapper.

\n

create_string_buffer allocates a writable buffer that can be passed from Python to C that ctypes will marshal as a char*.

\n

You can also create writable arrays of ctypes types with the syntax:

\n
variable_name = (ctypes_type * length)(initial_values)\n
\n

x.h

\n
#ifdef DLL_EXPORTS\n#define DLL_API __declspec(dllexport)\n#else\n#define DLL_API __declspec(dllimport)\n#endif\n\nstruct example {\n    char* data;\n    int len;          // of data buffer\n    double* doubles;\n    int count;        // of doubles\n};\n\nDLL_API void func(struct example* p);\n
\n

x.c

\n
#include \n#define DLL_EXPORTS\n#include "x.h"\n\nvoid func(struct example* p)\n{\n    int i;\n    strcpy_s(p->data,p->len,"hello, world!");\n    for(i = 0; i < p->count; i++)\n        p->doubles[i] = 1.1 * (i + 1);\n}\n
\n

x.py

\n
import ctypes\n\nclass Example(ctypes.Structure):\n\n    _fields_ = [\n        ('data',ctypes.POINTER(ctypes.c_char)),\n        ('len',ctypes.c_int),\n        ('doubles',ctypes.POINTER(ctypes.c_double)),\n        ('count',ctypes.c_int)]\n\n    def __init__(self,length,count):\n        self.data = ctypes.cast(ctypes.create_string_buffer(length),ctypes.POINTER(ctypes.c_char))\n        self.len = length\n        self.doubles = (ctypes.c_double * count)()\n        self.count = count\n\n    def __repr__(self):\n        return 'Example({},[{}])'.format(\n            ctypes.string_at(self.data),\n            ','.join(str(self.doubles[i]) for i in range(self.count)))\n\nclass Dll:\n\n    def __init__(self):\n        self.dll = ctypes.CDLL('x')\n        self.dll.func.argtypes = [ctypes.POINTER(Example)]\n        self.dll.func.restype = None\n\n    def func(self,ex):\n        self.dll.func(ctypes.byref(ex))\n\nd = Dll()\ne = Example(20,5)\nprint('before:',e)\nd.func(e)\nprint ('after:',e)\n
\n

Output

\n
before: Example(b'',[0.0,0.0,0.0,0.0,0.0])\nafter: Example(b'hello, world!',[1.1,2.2,3.3000000000000003,4.4,5.5])\n
\n soup wrap:

You can allocate buffers using ctypes and assign them to the pointers. Once the Python ctypes objects have no references they will be freed automatically. Here's a simple example (with a Windows DLL...don't have a Linux machine handy, but the idea is the same) and a Python wrapper.

create_string_buffer allocates a writable buffer that can be passed from Python to C that ctypes will marshal as a char*.

You can also create writable arrays of ctypes types with the syntax:

variable_name = (ctypes_type * length)(initial_values)

x.h

#ifdef DLL_EXPORTS
#define DLL_API __declspec(dllexport)
#else
#define DLL_API __declspec(dllimport)
#endif

struct example {
    char* data;
    int len;          // of data buffer
    double* doubles;
    int count;        // of doubles
};

DLL_API void func(struct example* p);

x.c

#include 
#define DLL_EXPORTS
#include "x.h"

void func(struct example* p)
{
    int i;
    strcpy_s(p->data,p->len,"hello, world!");
    for(i = 0; i < p->count; i++)
        p->doubles[i] = 1.1 * (i + 1);
}

x.py

import ctypes

class Example(ctypes.Structure):

    _fields_ = [
        ('data',ctypes.POINTER(ctypes.c_char)),
        ('len',ctypes.c_int),
        ('doubles',ctypes.POINTER(ctypes.c_double)),
        ('count',ctypes.c_int)]

    def __init__(self,length,count):
        self.data = ctypes.cast(ctypes.create_string_buffer(length),ctypes.POINTER(ctypes.c_char))
        self.len = length
        self.doubles = (ctypes.c_double * count)()
        self.count = count

    def __repr__(self):
        return 'Example({},[{}])'.format(
            ctypes.string_at(self.data),
            ','.join(str(self.doubles[i]) for i in range(self.count)))

class Dll:

    def __init__(self):
        self.dll = ctypes.CDLL('x')
        self.dll.func.argtypes = [ctypes.POINTER(Example)]
        self.dll.func.restype = None

    def func(self,ex):
        self.dll.func(ctypes.byref(ex))

d = Dll()
e = Example(20,5)
print('before:',e)
d.func(e)
print ('after:',e)

Output

before: Example(b'',[0.0,0.0,0.0,0.0,0.0])
after: Example(b'hello, world!',[1.1,2.2,3.3000000000000003,4.4,5.5])
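For completeness, explicit malloc/free is also possible by loading the C runtime with ctypes. This is a sketch assuming a libc that ctypes.util.find_library can locate (typical on Linux; it will not work this way on Windows), and it is rarely preferable to letting ctypes-managed buffers free themselves:

```python
import ctypes
import ctypes.util

# Assumes find_library("c") can locate the C runtime (e.g. libc.so.6 on Linux)
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.argtypes = [ctypes.c_void_p]

buf = libc.malloc(16)            # raw, uninitialized 16-byte buffer
ctypes.memmove(buf, b"hi", 2)    # copy two bytes in
data = ctypes.string_at(buf, 2)  # read them back before freeing
libc.free(buf)                   # we allocated it, so we must free it
print(data)  # b'hi'
```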
qid & accept id: (18682304, 18682346) query: regex condition that returns only if a " [word]" does not trail at the end soup:

The regex can be shortened by quite a few characters:

\n
(?
\n

And so that it doesn't match 2013-29883 followed by Dog, use another negative lookahead:

\n
(?
\n soup wrap:

The regex can be shortened by quite a few characters:

(?

And so that it doesn't match 2013-29883 followed by Dog, use another negative lookahead:

(?
qid & accept id: (18689862, 18689914) query: Implementing fancy indexing in a class soup:

The only thing special about the slice notation is the shorthand for slice, which makes your code equivalent to:

\n
obj[(slice(1, 3), slice(None, None))]\n
\n

The parameter passed to __getitem__ is the "index" of the item, which can be any object:

\n
def __getitem__(self, index):\n    if isinstance(index, tuple):\n        # foo[1:2, 3:4]\n    elif isinstance(index, slice)\n        # foo[1:2]\n    else:\n        # foo[1]\n
\n soup wrap:

The only thing special about the slice notation is the shorthand for slice, which makes your code equivalent to:

obj[(slice(1, 3), slice(None, None))]

The parameter passed to __getitem__ is the "index" of the item, which can be any object:

def __getitem__(self, index):
    if isinstance(index, tuple):
        pass  # foo[1:2, 3:4]
    elif isinstance(index, slice):
        pass  # foo[1:2]
    else:
        pass  # foo[1]
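To make that concrete, here is a small runnable sketch (the Grid class is illustrative) showing what __getitem__ actually receives for each indexing style:

```python
class Grid:
    """Toy 2-D container showing what __getitem__ receives."""

    def __init__(self, rows):
        self.rows = rows

    def __getitem__(self, index):
        if isinstance(index, tuple):  # g[0:2, 1:] -> (slice(0, 2), slice(1, None))
            rslice, cslice = index
            return [row[cslice] for row in self.rows[rslice]]
        return self.rows[index]       # g[1] (int) or g[1:3] (slice)

g = Grid([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(g[1])        # [4, 5, 6]
print(g[0:2, 1:])  # [[2, 3], [5, 6]]
```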
qid & accept id: (18696264, 18696288) query: How to get a list of Xth elements from a list of tuples? soup:

You could use zip(), unpacking sset with the splat argument-unpacking syntax (*sset):

\n
x, y = zip(*sset)\n
\n

Demo:

\n
>>> sset = [('foo',1),('bar',3),('zzz',9)]\n>>> x, y = zip(*sset)\n>>> x\n('foo', 'bar', 'zzz')\n>>> y\n(1, 3, 9)\n
\n

This creates tuples, not lists; you can map the zip() output to lists as required:

\n
x, y = map(list, zip(*sset))\n
\n soup wrap:

You could use zip(), unpacking sset with the splat argument-unpacking syntax (*sset):

x, y = zip(*sset)

Demo:

>>> sset = [('foo',1),('bar',3),('zzz',9)]
>>> x, y = zip(*sset)
>>> x
('foo', 'bar', 'zzz')
>>> y
(1, 3, 9)

This creates tuples, not lists; you can map the zip() output to lists as required:

x, y = map(list, zip(*sset))
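If you only need one of the columns, a plain list comprehension avoids building the tuples you don't want:

```python
sset = [('foo', 1), ('bar', 3), ('zzz', 9)]

# Pull out just the Xth element of each tuple directly
names = [t[0] for t in sset]
numbers = [t[1] for t in sset]
print(names)    # ['foo', 'bar', 'zzz']
print(numbers)  # [1, 3, 9]
```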
qid & accept id: (18706785, 18711270) query: Efficiently displaying a stacked bar graph soup:

I see; I did not fully grasp that you are trying to make ~1M bars, which is very memory-intensive. I would suggest something like this:

\n
import numpy as np\nfrom itertools import izip, cycle\nimport matplotlib.pyplot as plt\nfrom collections import defaultdict\n\nN = 100\n\nfake_data = {}\nfor j in range(97, 104):\n    lab = chr(j)\n    fake_data[lab] = np.cumsum(np.random.rand(N) > np.random.rand(1))\n\ncolors = cycle(['r', 'g', 'b', 'm', 'c', 'Orange', 'Pink'])\n\n# fig, ax = plt.subplots(1, 1, tight_layout=True) # if your mpl is newenough\nfig, ax = plt.subplots(1, 1) # other wise\nax.set_xlabel('time')\nax.set_ylabel('counts')\ncum_array = np.zeros(N*2 - 1) # to keep track of the bottoms\nx = np.vstack([arange(N), arange(N)]).T.ravel()[1:] # [0, 1, 1, 2, 2, ..., N-2, N-2, N-1, N-1]\nhands = []\nlabs = []\nfor k, c in izip(sorted(fake_data.keys()), colors):\n    d = fake_data[k]\n    dd = np.vstack([d, d]).T.ravel()[:-1]  # double up the data to match the x values [x0, x0, x1, x1, ... xN-2, xN-1]\n    ax.fill_between(x, dd + cum_array, cum_array,  facecolor=c, label=k, edgecolor='none') # fill the region\n    cum_array += dd                       # update the base line\n    # make a legend entry\n    hands.append(matplotlib.patches.Rectangle([0, 0], 1, 1, color=c)) # dummy artist\n    labs.append(k)                        # label\n\nax.set_xlim([0, N - 1]) # set the limits \nax.legend(hands, labs, loc=2)             #add legend\nplt.show()                                #make sure it shows\n
\n

for N=100:

\n

N=100 demo

\n

for N=100000:

\n

N=100000

\n

This uses roughly a few hundred megabytes.

\n

As a side note, the data parsing could be further simplified to this:

\n
import numpy as np\nfrom itertools import izip\nimport matplotlib.pyplot as plt\nfrom collections import defaultdict\n\n# this requires you to know a head of time how many times you have\nlen = 10\nd = defaultdict(lambda : np.zeros(len, dtype=np.bool)) # save space!\nwith open('test.txt', 'r') as infile:\n    infile.next() # skip the header line\n    for line in infile:\n        tokens = line.rstrip().split(" ")\n        time = int(tokens[0]) # get the time which is the first token\n        for e in tokens[1:]:  # loop over the rest\n            if len(e) == 0:\n                pass\n            d[e][time] = True\n\nfor k in d:\n    d[k] = np.cumsum(d[k])\n
\n

not strictly tested, but I think it should work.

\n soup wrap:

I see; I did not fully grasp that you are trying to make ~1M bars, which is very memory-intensive. I would suggest something like this:

import numpy as np
from itertools import izip, cycle
import matplotlib.pyplot as plt
from collections import defaultdict

N = 100

fake_data = {}
for j in range(97, 104):
    lab = chr(j)
    fake_data[lab] = np.cumsum(np.random.rand(N) > np.random.rand(1))

colors = cycle(['r', 'g', 'b', 'm', 'c', 'Orange', 'Pink'])

# fig, ax = plt.subplots(1, 1, tight_layout=True) # if your mpl is new enough
fig, ax = plt.subplots(1, 1) # otherwise
ax.set_xlabel('time')
ax.set_ylabel('counts')
cum_array = np.zeros(N*2 - 1) # to keep track of the bottoms
x = np.vstack([np.arange(N), np.arange(N)]).T.ravel()[1:] # [0, 1, 1, 2, 2, ..., N-2, N-2, N-1, N-1]
hands = []
labs = []
for k, c in izip(sorted(fake_data.keys()), colors):
    d = fake_data[k]
    dd = np.vstack([d, d]).T.ravel()[:-1]  # double up the data to match the x values [x0, x0, x1, x1, ... xN-2, xN-1]
    ax.fill_between(x, dd + cum_array, cum_array,  facecolor=c, label=k, edgecolor='none') # fill the region
    cum_array += dd                       # update the base line
    # make a legend entry
    hands.append(plt.Rectangle((0, 0), 1, 1, color=c)) # dummy artist (plt re-exports Rectangle)
    labs.append(k)                        # label

ax.set_xlim([0, N - 1]) # set the limits 
ax.legend(hands, labs, loc=2)             #add legend
plt.show()                                #make sure it shows

for N=100:

N=100 demo

for N=100000:

N=100000

This uses roughly a few hundred megabytes.

As a side note, the data parsing could be further simplified to this:

import numpy as np
from itertools import izip
import matplotlib.pyplot as plt
from collections import defaultdict

# this requires you to know ahead of time how many time steps you have
n_times = 10
d = defaultdict(lambda: np.zeros(n_times, dtype=np.bool)) # save space!
with open('test.txt', 'r') as infile:
    infile.next() # skip the header line
    for line in infile:
        tokens = line.rstrip().split(" ")
        time = int(tokens[0]) # get the time which is the first token
        for e in tokens[1:]:  # loop over the rest
            if len(e) == 0:
                continue  # skip empty tokens (and don't shadow the len builtin!)
            d[e][time] = True
            d[e][time] = True

for k in d:
    d[k] = np.cumsum(d[k])

not strictly tested, but I think it should work.

qid & accept id: (18727686, 18728381) query: Numpy: averaging many datapoints at each time step soup:

May I propose a pandas solution? It is highly recommended if you are going to be working with time series.

\n

Create test data

\n
import pandas as pd\nimport numpy as np\n\ntimes = np.random.randint(0,10,size=50)\nvalues = np.sin(times) + np.random.random_sample((len(times),))\ns = pd.Series(values, index=times)\ns.plot(linestyle='.', marker='o')\n
\n

enter image description here

\n

Calculate averages

\n
avs = s.groupby(level=0).mean()\navs.plot()\n
\n

enter image description here

\n soup wrap:

May I propose a pandas solution? It is highly recommended if you are going to be working with time series.

Create test data

import pandas as pd
import numpy as np

times = np.random.randint(0,10,size=50)
values = np.sin(times) + np.random.random_sample((len(times),))
s = pd.Series(values, index=times)
s.plot(linestyle='.', marker='o')

enter image description here

Calculate averages

avs = s.groupby(level=0).mean()
avs.plot()

enter image description here

qid & accept id: (18765094, 18766100) query: Protection against downloading too big files soup:

Every HTTP client library I know of (at least in Python) gives you, or can give you, a stream:

\n
>>> import requests\n>>> r = requests.get('https://example.com/big-file', stream=True)\n>>> r.raw\n\n
\n

Now you have the response headers available; Content-Length may be present:

\n
>>> r.headers.get("content-length")\n'33236'\n
\n

It's up to you how you read from the stream:

\n
>>> r.raw.read(10)\n'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'\n
\n

If I wanted to limit the download by max time and max size, I would do something like this:

\n
t0 = time.time()\ntotal_size = 0\nwhile True:\n    if time.time() - t0 > time_limit:\n        raise Exception("Too much time taken")\n    if total_size > size_limit:\n        raise Exception("Too large")\n    data = r.raw.read(8192)\n    if data == "":\n        break  # end of file\n    total_size += len(data)\n    output_file.write(data)\n
\n

The web server doesn't stop working when you close the HTTP connection prematurely :)

\n soup wrap:

Every HTTP client library I know of (at least in Python) gives you, or can give you, a stream:

>>> import requests
>>> r = requests.get('https://example.com/big-file', stream=True)
>>> r.raw

Now you have the response headers available; Content-Length may be present:

>>> r.headers.get("content-length")
'33236'

It's up to you how you read from the stream:

>>> r.raw.read(10)
'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'

If I wanted to limit the download by max time and max size, I would do something like this:

t0 = time.time()
total_size = 0
while True:
    if time.time() - t0 > time_limit:
        raise Exception("Too much time taken")
    if total_size > size_limit:
        raise Exception("Too large")
    data = r.raw.read(8192)
    if data == "":
        break  # end of file
    total_size += len(data)
    output_file.write(data)

The web server doesn't stop working when you close the HTTP connection prematurely :)

qid & accept id: (18765340, 18766262) query: Pygame - Getting a rectangle for a dynamically drawn object soup:

Just create a new Surface and fill it with the right color:

\n
class raquete(pygame.sprite.Sprite):\n\n    def __init__(self, x, y, l_raquete, a_raquete):\n        pygame.sprite.Sprite.__init__(self)\n        self.image = pygame.Surface((l_raquete, a_raquete))\n        # I guess branco means color\n        self.image.fill(branco) \n        # no need for the x and y members, \n        # since we store the position in self.rect already\n        self.rect = self.image.get_rect(x=x, y=y) \n
\n

Since you're already using the Sprite class, what's the point of the imprime function anyway? Just use a pygame.sprite.Group to draw your sprites to the screen. That said, the rect member of a Sprite is used for positioning, so you can simplify your bola class to:

\n
class bola(pygame.sprite.Sprite):\n\n    def __init__(self, x, y, imagem_bola):\n        pygame.sprite.Sprite.__init__(self)\n        # always call convert() on loaded images\n        # so the surface will have the right pixel format\n        self.image = pygame.image.load(imagem_bola).convert()\n        self.rect = self.image.get_rect(x=x, y=y)\n
\n soup wrap:

Just create a new Surface and fill it with the right color:

class raquete(pygame.sprite.Sprite):

    def __init__(self, x, y, l_raquete, a_raquete):
        pygame.sprite.Sprite.__init__(self)
        self.image = pygame.Surface((l_raquete, a_raquete))
        # I guess branco means color
        self.image.fill(branco) 
        # no need for the x and y members, 
        # since we store the position in self.rect already
        self.rect = self.image.get_rect(x=x, y=y) 

Since you're already using the Sprite class, what's the point of the imprime function anyway? Just use a pygame.sprite.Group to draw your sprites to the screen. That said, the rect member of a Sprite is used for positioning, so you can simplify your bola class to:

class bola(pygame.sprite.Sprite):

    def __init__(self, x, y, imagem_bola):
        pygame.sprite.Sprite.__init__(self)
        # always call convert() on loaded images
        # so the surface will have the right pixel format
        self.image = pygame.image.load(imagem_bola).convert()
        self.rect = self.image.get_rect(x=x, y=y)
qid & accept id: (18772706, 18772767) query: How to generate combination of fix length strings using a set of characters? soup:

You could use itertools.product:

\n
li = []\nfor i in itertools.product([0,1], repeat=4):\n    li.append(''.join(map(str, i)))\nprint (li)\n\n>>> li\n['0000', '0001', '0010', '0011', '0100', '0101', '0110', '0111', '1000', '1001', '1010', '1011', '1100', '1101', '1110', '1111']\n
\n

A possible one-liner:

\n
[''.join(map(str, i)) for i in itertools.product([0,1], repeat=4)]\n
\n soup wrap:

You could use itertools.product:

li = []
for i in itertools.product([0,1], repeat=4):
    li.append(''.join(map(str, i)))
print (li)

>>> li
['0000', '0001', '0010', '0011', '0100', '0101', '0110', '0111', '1000', '1001', '1010', '1011', '1100', '1101', '1110', '1111']

A possible one-liner:

[''.join(map(str, i)) for i in itertools.product([0,1], repeat=4)]
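The same idea generalizes to any alphabet and any length; a small sketch (fixed_length_strings is an illustrative name):

```python
from itertools import product

def fixed_length_strings(chars, length):
    # every string of the given length over the given characters
    return [''.join(p) for p in product(chars, repeat=length)]

print(fixed_length_strings('ab', 2))       # ['aa', 'ab', 'ba', 'bb']
print(len(fixed_length_strings('01', 4)))  # 16
```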
qid & accept id: (18787830, 18787886) query: Search multiple strings in multiple files soup:

Simply use fgrep:

\n
fgrep -rlf messages.txt dir\n
\n

Or grep -f

\n
grep -Frlf messages.txt dir\n
\n

If you want to search for regex, don't use -F:

\n
grep -rlf messages.txt dir\n
\n

Update: If the lines in messages.txt contain patterns like foo=bar, you can use process substitution with cut in bash:

\n
grep -rlf <(cut -d = -f 2- messages.txt) dir\n
\n soup wrap:

Simply use fgrep:

fgrep -rlf messages.txt dir

Or grep -f

grep -Frlf messages.txt dir

If you want to search for regex, don't use -F:

grep -rlf messages.txt dir

Update: If the lines in messages.txt contain patterns like foo=bar, you can use process substitution with cut in bash:

grep -rlf <(cut -d = -f 2- messages.txt) dir
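If you'd rather stay in Python, here is a rough stdlib analogue of fgrep -rlf (the function name is illustrative; fixed-string matching only):

```python
import os
import tempfile

def files_containing_any(patterns_file, root):
    """Rough Python analogue of `fgrep -rlf patterns_file root` (fixed strings)."""
    with open(patterns_file) as f:
        patterns = [line.rstrip('\n') for line in f if line.strip()]
    matches = []
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, errors='ignore') as fh:
                if any(p in fh.read() for p in patterns):
                    matches.append(path)
    return matches

# Tiny self-contained demo: the patterns file lives outside the searched tree
root = tempfile.mkdtemp()
with open(os.path.join(root, 'a.log'), 'w') as f:
    f.write('saw foo=bar today')
with open(os.path.join(root, 'b.log'), 'w') as f:
    f.write('nothing relevant')
msgs = os.path.join(tempfile.mkdtemp(), 'messages.txt')
with open(msgs, 'w') as f:
    f.write('foo=bar\n')

hits = files_containing_any(msgs, root)
print([os.path.basename(p) for p in hits])  # ['a.log']
```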
qid & accept id: (18802249, 18802525) query: How to make a calculator with strings and numbers as mixed input using parser python ply soup:

First, add a token definition for the English words

\n
t_plustext    = r'plus'\n
\n

Add those new tokens to tokens

\n
tokens = (\n    'NAME','NUMBER', 'times', 'divided_by', 'plus', 'minus', 'plustext', ....\n)\n
\n

Finally, use those new tokens in your grammar this way:

\n
def p_expression_binop(p):\n    '''expression : expression '+' expression\n                  | expression plustext expression\n    '''\n
\n

UPDATE: here is a working subset of the grammar

\n
#!/usr/bin/python\n\nfrom __future__ import print_function\n\nimport sys\nimport ply.lex as lex\nimport ply.yacc as yacc\n\n# ------- Calculator tokenizing rules\n\ntokens = (\n    'NUMBER', 'times', 'divided_by', 'plus', 'minus', 'plustext',\n    'one', 'two', 'three',\n)\n\nliterals = ['=','+','-','*','/', '(',')']\n\nt_ignore = " \t\n"\n\nt_plustext    = r'plus'\nt_plus    = r'\+'\nt_minus   = r'-'\nt_times   = r'\*'\nt_divided_by  = r'/'\nt_one = 'one'\nt_two = 'two'\nt_three = 'three'\n\ndef t_NUMBER(t):\n    r'\d+'\n    try:\n        t.value = int(t.value)\n    except ValueError:\n        print("Integer value too large %d", t.value)\n        t.value = 0\n    return t\n\nprecedence = (\n    ('left','+','-','plustext'),\n    ('left','times','divided_by'),\n    ('left','*','/'),\n)\n\n\ndef p_statement_expr(p):\n    'statement : expression'\n    p[0] = p[1]\n    print(p[1])\n\ndef p_expression_binop(p):\n    '''expression : expression '+' expression\n                  | expression plustext expression\n                  | expression '-' expression\n                  | expression '*' expression\n                  | expression '/' expression'''\n    if p[2] ==   '+'  : p[0] = p[1] + p[3]\n    elif p[2] == '-': p[0] = p[1] - p[3]\n    elif p[2] == '*': p[0] = p[1] * p[3]\n    elif p[2] == '/': p[0] = p[1] / p[3]\n    elif p[2] == 'plus': p[0] = p[1] + p[3]\n\ndef p_statement_lit(p):\n    '''expression : NUMBER\n          | TXTNUMBER\n    '''\n    p[0] = p[1]\n\ndef p_txtnumber(p):\n    '''TXTNUMBER : one\n         | two\n         | three\n    '''\n    p[0] = w2n(p[1])\n\ndef w2n(s):\n    if s == 'one': return 1\n    elif s == 'two': return 2\n    elif s == 'three': return 3\n    assert(False)\n    # See http://stackoverflow.com/questions/493174/is-there-a-way-to-convert-number-words-to-integers-python for a complete implementation\n\ndef process(data):\n    lex.lex()\n        yacc.yacc()\n        #yacc.parse(data, debug=1, tracking=True)\n        
yacc.parse(data)\n\nif __name__ == "__main__":\n        data = open(sys.argv[1]).read()\n        process(data)\n
\n soup wrap:

First, add a token definition for the English words:

t_plustext    = r'plus'

Add those new tokens to tokens

tokens = (
    'NAME','NUMBER', 'times', 'divided_by', 'plus', 'minus', 'plustext', ....
)

Finally, use those new tokens in your grammar this way:

def p_expression_binop(p):
    '''expression : expression '+' expression
                  | expression plustext expression
    '''

UPDATE: here is a working subset of the grammar

#!/usr/bin/python

from __future__ import print_function

import sys
import ply.lex as lex
import ply.yacc as yacc

# ------- Calculator tokenizing rules

tokens = (
    'NUMBER', 'times', 'divided_by', 'plus', 'minus', 'plustext',
    'one', 'two', 'three',
)

literals = ['=','+','-','*','/', '(',')']

t_ignore = " \t\n"

t_plustext    = r'plus'
t_plus    = r'\+'
t_minus   = r'-'
t_times   = r'\*'
t_divided_by  = r'/'
t_one = 'one'
t_two = 'two'
t_three = 'three'

def t_NUMBER(t):
    r'\d+'
    try:
        t.value = int(t.value)
    except ValueError:
        print("Integer value too large %d", t.value)
        t.value = 0
    return t

precedence = (
    ('left','+','-','plustext'),
    ('left','times','divided_by'),
    ('left','*','/'),
)


def p_statement_expr(p):
    'statement : expression'
    p[0] = p[1]
    print(p[1])

def p_expression_binop(p):
    '''expression : expression '+' expression
                  | expression plustext expression
                  | expression '-' expression
                  | expression '*' expression
                  | expression '/' expression'''
    if p[2] ==   '+'  : p[0] = p[1] + p[3]
    elif p[2] == '-': p[0] = p[1] - p[3]
    elif p[2] == '*': p[0] = p[1] * p[3]
    elif p[2] == '/': p[0] = p[1] / p[3]
    elif p[2] == 'plus': p[0] = p[1] + p[3]

def p_statement_lit(p):
    '''expression : NUMBER
          | TXTNUMBER
    '''
    p[0] = p[1]

def p_txtnumber(p):
    '''TXTNUMBER : one
         | two
         | three
    '''
    p[0] = w2n(p[1])

def w2n(s):
    if s == 'one': return 1
    elif s == 'two': return 2
    elif s == 'three': return 3
    assert(False)
    # See http://stackoverflow.com/questions/493174/is-there-a-way-to-convert-number-words-to-integers-python for a complete implementation

def process(data):
    lex.lex()
    yacc.yacc()
    #yacc.parse(data, debug=1, tracking=True)
    yacc.parse(data)

if __name__ == "__main__":
    data = open(sys.argv[1]).read()
    process(data)
qid & accept id: (18802563, 18802743) query: Efficiently Removing Very-Near-Duplicates From Python List soup:

soup wrap:

You really want to use NumPy if you're handling large quantities of data. Here's how I would do it:

Import NumPy :

import numpy as np

Generate 8000 high-precision floats (128 bits will be enough for your purposes, but note that I'm converting the 64-bit output of random to 128 bits just to fake it; use your real data here):

a = np.float128(np.random.random((8000,)))

Find the indexes of the unique elements in the rounded array :

_, unique = np.unique(a.round(decimals=14), return_index=True)

And take those indexes from the original (non-rounded) array :

no_duplicates = a[unique]
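For smaller inputs the same round-then-deduplicate idea can be sketched in pure Python (a hypothetical helper, not from the answer; NumPy stays the right tool for large arrays):

```python
def dedupe_near(values, decimals=14):
    """Keep the first occurrence of each value; two values count as
    duplicates if they agree after rounding to `decimals` places."""
    seen = set()
    result = []
    for v in values:
        key = round(v, decimals)
        if key not in seen:
            seen.add(key)
            result.append(v)  # keep the original, non-rounded value
    return result
```

Like the NumPy version, this keeps the original non-rounded values and only uses the rounded form as the duplicate key.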
qid & accept id: (18817616, 18817697) query: removing excess spaces from a string (and counting them) soup:

soup wrap:

Normally, I would regex this, but since that has already been suggested, here's a more DIY approach (just for completeness):

def countSpaces(s):
    answer = []
    start = None
    maxCount = 0
    for i,char in enumerate(s):
        if char == ' ':
            if start is None:
                start = i
                answer.append(char)
        else:
            if start is not None:
                maxCount = max(i-start-1, maxCount)
                start = None
            answer.append(char)
    print("The whitespace normalized string is", ''.join(answer))
    print("The maximum length of consecutive whitespace is", maxCount)

Output:

>>> s = "foo    bar  baz                        bam"
>>> countSpaces(s)
The whitespace normalized string is foo bar baz bam
The maximum length of consecutive whitespace is 23
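For comparison, the regex approach alluded to in the first sentence can be sketched with the stdlib re module (hypothetical helper, not the author's code; the count follows the DIY version above, i.e. run length minus one):

```python
import re

def count_spaces_regex(s):
    # Excess spaces in the longest run (run length minus one), 0 if none
    runs = re.findall(r' +', s)
    max_excess = max((len(r) - 1 for r in runs), default=0)
    # Collapse every run of spaces down to a single space
    normalized = re.sub(r' +', ' ', s)
    return normalized, max_excess
```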
qid & accept id: (18839875, 18839924) query: Union of many Counters soup:

soup wrap:

Goodness, when did Python programmers become afraid of easy loops? LOL.

result = Counter()
for c in counters:
    result |= c

There really aren't prizes in real life for squashing things into as few characters as theoretically possible. Well, ya, there are in Perl, but not in Python ;-)

Later: pursuant to user2357112's comment, starting with Python 3.3 the code above will do "in place" unions into result. That is, result is truly reused, possibly growing larger on each iteration.

In any spelling of

counters[0] | counters[1] | counters[2] | ...

instead, the entire partial result so far keeps getting thrown away when the next partial result is computed. That may - or may not - be a lot slower.
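A quick sketch of the loop with concrete (made-up) data, to show what the union keeps:

```python
from collections import Counter

counters = [Counter("aab"), Counter("abc"), Counter("ccc")]

result = Counter()
for c in counters:
    result |= c   # per key, keep the maximum count seen so far
# result holds a:2, b:1, c:3
```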

qid & accept id: (18854425, 18855277) query: What is the best way to compute the trace of a matrix product in numpy? soup:

soup wrap:

You can improve on @Bill's solution by reducing intermediate storage to the diagonal elements only:

from numpy.core.umath_tests import inner1d

m, n = 1000, 500

a = np.random.rand(m, n)
b = np.random.rand(n, m)

# They all should give the same result
print np.trace(a.dot(b))
print np.sum(a*b.T)
print np.sum(inner1d(a, b.T))

%timeit np.trace(a.dot(b))
10 loops, best of 3: 34.7 ms per loop

%timeit np.sum(a*b.T)
100 loops, best of 3: 4.85 ms per loop

%timeit np.sum(inner1d(a, b.T))
1000 loops, best of 3: 1.83 ms per loop

Another option is to use np.einsum and have no explicit intermediate storage at all:

# Will print the same as the others:
print np.einsum('ij,ji->', a, b)

On my system it runs slightly slower than using inner1d, but it may not hold for all systems, see this question:

%timeit np.einsum('ij,ji->', a, b)
100 loops, best of 3: 1.91 ms per loop
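The trick behind all three fast versions is that trace(a·b) only needs the diagonal of the product. A pure-Python sketch of that identity (illustrative only; use the vectorized forms above for real sizes):

```python
def trace_of_product(a, b):
    """trace(a @ b) from the diagonal elements only:
    sum over i of dot(row i of a, column i of b)."""
    m, n = len(a), len(b)            # a is m x n, b is n x m
    return sum(a[i][k] * b[k][i] for i in range(m) for k in range(n))

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
# a @ b = [[19, 22], [43, 50]], so the trace is 19 + 50 = 69
```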
qid & accept id: (18919032, 18919791) query: How to see if section of python code completes within a given time soup:

soup wrap:

For stuff like that I normally use the following construct:

from threading import Timer
import thread

def run_with_timeout( timeout, func, *args, **kwargs ):
    """ Function to execute a func for the maximal time of timeout.
    [IN]timeout        Max execution time for the func
    [IN]func           Reference of the function/method to be executed
    [IN]args & kwargs  Will be passed to the func call
    """
    try:
        # Raises a KeyboardInterrupt if timer triggers
        timeout_timer = Timer( timeout, thread.interrupt_main )
        timeout_timer.start()
        return func( *args, **kwargs )
    except KeyboardInterrupt:
        print "run_with_timeout timed out, when running '%s'" %  func.__name__
        #Normally I raise here my own exception
    finally:
        timeout_timer.cancel()

Then the call would look like:

timeout = 5.2 #Time in sec
for i in range(len(arr1)):
    res1 = run_with_timeout(timeout, foo1, arr1[i])
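The snippet above is Python 2 (the thread module and print statements). On Python 3, a similar bounded wait can be sketched with concurrent.futures — note that, unlike a true kill, the timed-out call keeps running in its worker thread:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

def run_with_timeout(timeout, func, *args, **kwargs):
    """Return func(*args, **kwargs), or None if it does not finish within timeout.

    The worker thread is NOT killed on timeout; it keeps running in the
    background until the call finishes on its own.
    """
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=timeout)
    except TimeoutError:
        print("run_with_timeout timed out, when running '%s'" % func.__name__)
        return None
    finally:
        executor.shutdown(wait=False)  # don't block waiting for a runaway call
```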
qid & accept id: (18921302, 18994897) query: How to incrementally sample without replacement? soup:

soup wrap:

Note to readers from OP: Please consider looking at the originally accepted answer to understand the logic, and then understand this answer.

Aaaaaand for completeness sake: This is the concept of no_answer_not_upvoted’s answer, but adapted so it takes a list of forbidden numbers as input. This is just the same code as in my previous answer, but we build a state from forbid, before we generate numbers.

  • This is time O(f+k) and memory O(f+k). Obviously this is the fastest thing possible without requirements towards the format of forbid (sorted/set). I think this makes this a winner in some way ^^.
  • If forbid is a set, the repeated guessing method is faster with O(k⋅n/(n-(f+k))), which is very close to O(k) for f+k not very close to n.
  • If forbid is sorted, my ridiculous algorithm is faster with:
    O(k⋅(log(f+k)+log²(n/(n-(f+k)))))
import random
def sample_gen(n, forbid):
    state = dict()
    track = dict()
    for (i, o) in enumerate(forbid):
        x = track.get(o, o)
        t = state.get(n-i-1, n-i-1)
        state[x] = t
        track[t] = x
        state.pop(n-i-1, None)
        track.pop(o, None)
    del track
    for remaining in xrange(n-len(forbid), 0, -1):
        i = random.randrange(remaining)
        yield state.get(i, i)
        state[i] = state.get(remaining - 1, remaining - 1)
        state.pop(remaining - 1, None)

usage:

gen = sample_gen(10, [1, 2, 4, 8])
print gen.next()
print gen.next()
print gen.next()
print gen.next()
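The code above is Python 2 (xrange, gen.next()). A Python 3 sketch of the same state-dict trick, with the logic unchanged:

```python
import random

def sample_gen(n, forbid):
    """Yield range(n) minus forbid in random order, O(1) extra work per draw."""
    state = dict()
    track = dict()
    # Swap every forbidden number out of the virtual range(n)
    for i, o in enumerate(forbid):
        x = track.get(o, o)
        t = state.get(n - i - 1, n - i - 1)
        state[x] = t
        track[t] = x
        state.pop(n - i - 1, None)
        track.pop(o, None)
    del track
    for remaining in range(n - len(forbid), 0, -1):
        i = random.randrange(remaining)
        yield state.get(i, i)
        state[i] = state.get(remaining - 1, remaining - 1)
        state.pop(remaining - 1, None)
```

Each run yields the allowed numbers in a random order; sorting the draws always recovers range(n) minus forbid.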
qid & accept id: (18938302, 18938744) query: Remove duplicate, remove certain letters from line if found soup:

soup wrap:

This could help, as a start:

for line in fin.readlines():
    words = line.split()    # list of words
    new_words = []
    unique_words = set()
    for word in words:
        if (word not in unique_words and
                  (not word.isdigit() or int(word) <= 65000)):
            new_words.append(word)
            unique_words.add(word)
    new_line = ' '.join(new_words)
    print new_line

Turns this:

A   786 65534 65534 786 786 786 786 10026/AS4637 19151 19151 19151 19151 19151     19151 10796/AS13706

Into this:

A 786 10026/AS4637 19151 10796/AS13706

Obviously, it's not quite what you want yet, but try to do the rest yourself. :) The str.replace() method might help you getting rid of those /AS.
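For the /AS part, one possible sketch (assuming the goal is to drop a trailing /AS-plus-digits suffix; re.sub is used here instead of str.replace because the digits vary):

```python
import re

def strip_as(word):
    # "10026/AS4637" -> "10026"; words without the suffix pass through unchanged
    return re.sub(r'/AS\d+$', '', word)
```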

qid & accept id: (18952977, 18953215) query: Parsing the json file after a specific time interval soup:

soup wrap:

Use the file's tell() method after reading; it returns the current position of the file pointer. The next time you read, use the file's seek() method to set the pointer back to that position.

Example:

f = open("test.json", "r")  # open for reading; "w+" would truncate the file
 .....
 .....
your code for reading 
f.read()
 .....
 .....
last_position = f.tell() # returns current position of file pointer (where you stopped reading)

now when you next time read from file use seek() function

f = open("test.json", "r")
f.seek(last_position)
f.read() # now this will start reading from the last position
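A self-contained sketch of the tell()/seek() round trip, using a throwaway temp file (note the file is opened with "r"; opening with "w+" would truncate it):

```python
import os
import tempfile

# Create a small file to read in two passes
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as out:
    out.write("first part|second part")

# First pass: read part of the file and remember where we stopped
f = open(path, "r")
first = f.read(11)            # reads "first part|"
last_position = f.tell()      # where we stopped reading
f.close()

# Second pass: jump straight back to that position
f = open(path, "r")
f.seek(last_position)
rest = f.read()               # reads the remainder
f.close()
os.remove(path)
```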

Hope This will Help :)

qid & accept id: (18972212, 18972276) query: recursively (or non-recursively) iterating through python array and get the elements soup:

soup wrap:

One possible solution would be to flatten the list:

def flatten(lst):
    if not lst:
        return []
    elif not isinstance(lst, list):
        return [lst] 
    else:
        return flatten(lst[0]) + flatten(lst[1:])

This will allow you to traverse the list in order:

ls1 = [[[[1, '1.0.1'], [1, '2.0.1']], [1, '3.0.11']], [1, '4.0.11']]
flatten(ls1)
=> [1, '1.0.1', 1, '2.0.1', 1, '3.0.11', 1, '4.0.11']

Or alternatively, using generators:

def flatten(lst):
    if not lst:
        return
    elif not isinstance(lst, list):
        yield lst
    else:
        for e in flatten(lst[0]):
            yield e
        for e in flatten(lst[1:]):
            yield e

list(flatten(ls1))
=> [1, '1.0.1', 1, '2.0.1', 1, '3.0.11', 1, '4.0.11']
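Since the question also asks about a non-recursive version, the same traversal can be sketched iteratively with an explicit stack (a variation, not from the answer):

```python
def flatten_iter(lst):
    """Flatten an arbitrarily nested list without recursion."""
    result = []
    stack = [lst]
    while stack:
        item = stack.pop()
        if isinstance(item, list):
            # Push children in reverse so they come off the stack in order
            stack.extend(reversed(item))
        else:
            result.append(item)
    return result
```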
qid & accept id: (19031953, 19032932) query: Skip unittest test without decorator syntax soup:

soup wrap:

Using unittest.TestCase.skipTest:

import unittest

class TestFoo(unittest.TestCase):
    def setUp(self): print('setup')
    def tearDown(self): print('teardown')
    def test_spam(self): pass
    def test_egg(self): pass
    def test_ham(self): pass

if __name__ == '__main__':
    import sys
    loader = unittest.loader.defaultTestLoader
    runner = unittest.TextTestRunner(verbosity=2)
    suite = loader.loadTestsFromModule(sys.modules['__main__'])
    for ts in suite:
        for t in ts:
            if t.id().endswith('am'): # To skip `test_spam` and `test_ham`
                setattr(t, 'setUp', lambda: t.skipTest('criteria'))
    runner.run(suite)

prints

test_egg (__main__.TestFoo) ... setup
teardown
ok
test_ham (__main__.TestFoo) ... skipped 'criteria'
test_spam (__main__.TestFoo) ... skipped 'criteria'

----------------------------------------------------------------------
Ran 3 tests in 0.001s

OK (skipped=2)


----------------------------------------------------------------------
Ran 3 tests in 0.002s

OK (skipped=2)

UPDATE

Updated the code to patch setUp instead of the test method. Otherwise, the setUp/tearDown methods would still be executed for tests that are to be skipped.

NOTE

unittest.TestCase.skipTest (test skipping) was introduced in Python 2.7 and 3.1, so this method only works in Python 2.7+ and 3.1+.
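A lighter variation on the same idea — deciding inside setUp itself, based on the test name, whether to skip (a sketch, not the answer's exact mechanism; _testMethodName is a semi-private unittest attribute):

```python
import unittest

class TestFoo(unittest.TestCase):
    def setUp(self):
        # Decide at runtime, inside setUp, whether this test should run
        if self._testMethodName.endswith('am'):   # skips test_spam and test_ham
            self.skipTest('criteria')

    def test_spam(self): pass
    def test_egg(self): pass
    def test_ham(self): pass

suite = unittest.TestLoader().loadTestsFromTestCase(TestFoo)
res = unittest.TextTestRunner(verbosity=0).run(suite)
```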

qid & accept id: (19043923, 19052069) query: tuple of datetime objects in Python soup:

soup wrap:

I think using datetime.isocalendar is a nice solution. This gives the correct output for your example:

import datetime

def iso_year_start(iso_year):
    "The gregorian calendar date of the first day of the given ISO year"
    fourth_jan = datetime.date(iso_year, 1, 4)
    delta = datetime.timedelta(fourth_jan.isoweekday()-1)
    return fourth_jan - delta 

def iso_to_gregorian(iso_year, iso_week, iso_day):
    "Gregorian calendar date for the given ISO year, week and day"
    year_start = iso_year_start(iso_year)
    return year_start + datetime.timedelta(days=iso_day-1, weeks=iso_week-1)


def week_start_end(date):
    year = date.isocalendar()[0]
    week = date.isocalendar()[1]
    d1 = iso_to_gregorian(year, week, 0)
    d2 = iso_to_gregorian(year, week, 6)
    d3 = datetime.datetime(d1.year, d1.month, d1.day, 0,0,0,0)
    d4 = datetime.datetime(d2.year, d2.month, d2.day, 23,59,59,999999)
    return (d3,d4)

As an example:

>>> d = datetime.datetime(2013, 8, 15, 12, 0, 0)
>>> print week_start_end(d)
(datetime.datetime(2013, 8, 11, 0, 0), datetime.datetime(2013, 8, 17, 23, 59, 59, 999999))

And should help you with your problem.
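On Python 3.8+, the two helper functions can be replaced by datetime.date.fromisocalendar. A sketch that keeps the answer's Sunday-through-Saturday week:

```python
import datetime

def week_start_end_38(d):
    # Python 3.8+: fromisocalendar replaces the manual ISO-week arithmetic.
    # The answer's week runs Sunday..Saturday, i.e. the day before ISO
    # weekday 1 (Monday) through ISO weekday 6 (Saturday).
    year, week, _ = d.isocalendar()
    monday = datetime.date.fromisocalendar(year, week, 1)
    sunday = monday - datetime.timedelta(days=1)
    saturday = datetime.date.fromisocalendar(year, week, 6)
    start = datetime.datetime.combine(sunday, datetime.time.min)
    end = datetime.datetime.combine(saturday, datetime.time(23, 59, 59, 999999))
    return (start, end)
```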

qid & accept id: (19045971, 19046264) query: Random rounding to integer in Python soup:

soup wrap:

The probability you're looking for is x-int(x).

To sample with this probability, do random.random() < x-int(x)

import random
import math
import numpy as np

def prob_round(x):
    sign = np.sign(x)
    x = abs(x)
    is_up = random.random() < x-int(x)
    round_func = math.ceil if is_up else math.floor
    return sign * round_func(x)

x = 6.1
sum( prob_round(x) for i in range(100) ) / 100.
=> 6.12

EDIT: adding an optional prec argument:

def prob_round(x, prec = 0):
    fixup = np.sign(x) * 10**prec
    x *= fixup
    is_up = random.random() < x-int(x)
    round_func = math.ceil if is_up else math.floor
    return round_func(x) / fixup

x = 8.33333333
[ prob_round(x, prec = 2) for i in range(10) ]
=> [8.3399999999999999,
 8.3300000000000001,
 8.3399999999999999,
 8.3300000000000001,
 8.3300000000000001,
 8.3300000000000001,
 8.3300000000000001,
 8.3300000000000001,
 8.3399999999999999,
 8.3399999999999999]
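numpy is only used above for np.sign; a stdlib-only sketch of the whole-number case:

```python
import math
import random

def prob_round_stdlib(x):
    """Round x up with probability equal to its fractional part, else down."""
    sign = -1 if x < 0 else 1
    x = abs(x)
    is_up = random.random() < x - int(x)
    return sign * (math.ceil(x) if is_up else math.floor(x))
```

Averaged over many calls, the result converges to x itself — which is the point of randomized rounding.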
qid & accept id: (19079040, 19079129) query: Selenium (Python): How to insert value on a hidden input? soup:

soup wrap:

You can use WebDriver.execute_script. For example:

from selenium import webdriver

driver = webdriver.Firefox()
driver.get('http://jsfiddle.net/falsetru/mLGnB/show/')
elem = driver.find_element_by_css_selector('div.dijitReset>input[type=hidden]')
driver.execute_script('''
    var elem = arguments[0];
    var value = arguments[1];
    elem.value = value;
''', elem, '2013-11-26')

UPDATE

from selenium import webdriver

driver = webdriver.Firefox()
driver.get('http://matrix.itasoftware.com/')
elem = driver.find_element_by_xpath(
    './/input[@id="ita_form_date_DateTextBox_0"]'
    '/following-sibling::input[@type="hidden"]')

value = driver.execute_script('return arguments[0].value;', elem)
print("Before update, hidden input value = {}".format(value))

driver.execute_script('''
    var elem = arguments[0];
    var value = arguments[1];
    elem.value = value;
''', elem, '2013-11-26')

value = driver.execute_script('return arguments[0].value;', elem)
print("After update, hidden input value = {}".format(value))
qid & accept id: (19122988, 19123028) query: Require one out of two keyword arguments soup:

soup wrap:

You can use set operations on the dictionary keys view:

if len(kargs.viewkeys() & {'dollar', 'euro'}) != 1:
    raise ValueError('One keyword argument is required: dollar=x or euro=x')

In Python 3, use kargs.keys() instead.

Demo of the different outcomes of the set operation:

>>> kargs = {'dollar': 1, 'euro': 3, 'foo': 'bar'}
>>> kargs.viewkeys() & {'dollar', 'euro'}
set(['dollar', 'euro'])
>>> del kargs['euro']
>>> kargs.viewkeys() & {'dollar', 'euro'}
set(['dollar'])
>>> del kargs['dollar']
>>> kargs.viewkeys() & {'dollar', 'euro'}
set([])

In other words, the & set intersection gives you a set of all keys present in both sets; both in the dictionary and in your explicit set literal. Only if one and only one of the named keys is present is the length of the intersection going to be 1.

If you do not want to allow any other keyword arguments besides dollar and euro, then you can also use proper subset tests. Using < with two sets is only True if the left-hand set is strictly smaller than the right-hand set; it only has fewer keys than the other set and no extra keys:

if {}.viewkeys() < kargs.viewkeys() < {'dollar', 'euro'}:
    raise ValueError('One keyword argument is required: dollar=x or euro=x')

On Python 3, that can be spelled as:

if set() < kargs.keys() < {'dollar', 'euro'}:

instead.

Demo:

>>> kargs = {'dollar': 1, 'euro': 3, 'foo': 'bar'}
>>> {}.viewkeys() < kargs.viewkeys() < {'dollar', 'euro'}
False
>>> del kargs['foo']
>>> {}.viewkeys() < kargs.viewkeys() < {'dollar', 'euro'}
False
>>> del kargs['dollar']
>>> {}.viewkeys() < kargs.viewkeys() < {'dollar', 'euro'}
True
>>> del kargs['euro']
>>> {}.viewkeys() < kargs.viewkeys() < {'dollar', 'euro'}
False

Note that now the 'foo' key is no longer acceptable.
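
In Python 3 the same two checks can be combined into one small validator. This is a hedged sketch; the function name is made up:

```python
def require_one_currency(kwargs):
    """Accept exactly one of dollar=... or euro=..., and nothing else."""
    allowed = {'dollar', 'euro'}
    # intersection test: exactly one of the named keys must be present
    if len(kwargs.keys() & allowed) != 1:
        raise ValueError('One keyword argument is required: dollar=x or euro=x')
    # subset test: no keys outside the allowed set
    if not kwargs.keys() <= allowed:
        raise ValueError('Unexpected keyword arguments: %s'
                         % sorted(kwargs.keys() - allowed))

require_one_currency({'dollar': 1})  # passes: one allowed key, no extras
```

The subset test uses <= rather than < because after the intersection check the dictionary is already known to be non-empty.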

qid & accept id: (19124072, 19189120) query: Access a Numpy Recarray via the C-API soup:

soup wrap:

I'll try to answer my own question.
It appears that you can use the function PyObject_GetItem() to access fields in your NumPy recarray. To test this I created a simple recarray with three fields:
np.dtype([('field1', '
I send this array to my C++ function and execute two loops: one loop over each field and a nested loop over the array elements in each field (e.g. x['field1'], x['field2'], x['field3']). In the outer loop I use PyObject_GetItem() to access each field. The code is as follows:

C++ Code

#include "Python.h"
#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
#include "arrayobject.h"
#include 
#include 
#include 
using namespace std;

static PyObject *readarray(PyObject *self, PyObject *args) {
    PyArrayObject *arr, *x2;
    PyArray_Descr *dtype;
    PyObject *names, *name, *x1 = NULL;
    Py_ssize_t N, i;
    NpyIter *iter;
    NpyIter_IterNextFunc *iternext;
    double **dataptr;
    npy_intp index;

    if (!PyArg_ParseTuple(args, "O!", &PyArray_Type, &arr)) {
        return NULL;
    }
    dtype = PyArray_DTYPE(arr);
    names = dtype->names;
    if (names != NULL) {
        names = PySequence_Fast(names, NULL);
        N = PySequence_Fast_GET_SIZE(names);
        for (i=0; i


Python Code

import numpy as np
import pyproj4 as p4
np.random.seed(22)

## Python Implementation ##
dt = np.dtype([('field1', '


The output in both cases looks like:

field1      0  -0.0919
            1  -1.4634
            2   1.0818
            3  -0.2393
field2      0  -0.4911
            1  -1.0023
            2   0.9188
            3  -1.1036
            4   0.6265
            5  -0.5615
            6   0.0289
            7  -0.2308
field3      0   0.5878
            1   0.7523
            2  -1.0585
            3   1.0560
            4   0.7478
            5   1.0647

If you know a better way to accomplish this, please don't hesitate to post your solution.
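
For comparison, the same field-by-field traversal is short in pure Python. This is a hedged sketch with a made-up three-field float64 dtype, since the original dtype literal is truncated above:

```python
import numpy as np

np.random.seed(22)
# hypothetical structured array; the original dtype definition is not fully shown
x = np.zeros(4, dtype=[('field1', '<f8'), ('field2', '<f8'), ('field3', '<f8')])
x['field1'] = np.random.randn(4)

for name in x.dtype.names:        # same iteration order as dtype->names on the C side
    for i, value in enumerate(x[name]):
        print('%-8s %3d %9.4f' % (name, i, value))
```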

qid & accept id: (19166610, 19180737) query: Trying to do left outer joins to two related tables soup:

soup wrap:

I usually use subqueries for this purpose. Here is the SQL I would generate:

SELECT foo.name, bar.name, (SELECT COUNT('*') FROM baz WHERE baz.foo_id=foo.id AND baz.bar_id=bar.id) FROM foo, bar

And in SQLAlchemy:

count_query = session.query(func.count('*')).filter(Baz.foo_id==Foo.id).filter(Baz.bar_id==Bar.id).correlate(Bar).correlate(Foo).as_scalar()
query = session.query(Foo.name, Bar.name, count_query)

I am no expert concerning the performance of this query versus a query with cleverly combined OUTER JOINs, so I am not sure if this is any slower. I would have otherwise tried something like:

FROM foo LEFT OUTER JOIN baz ON foo.id=baz.foo_id RIGHT OUTER JOIN bar ON bar.id=baz.bar_id

I am not sure this yields what you need, and right now I only have SQLite available, which doesn't support RIGHT OUTER JOIN, so I cannot test it. But I have verified that the above query generates the result you want.
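
The correlated-subquery form is easy to check with the standard library's sqlite3 and throwaway tables. This is a sketch; the foo/bar/baz schema is assumed to match the names above:

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.executescript("""
    CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bar (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE baz (foo_id INTEGER, bar_id INTEGER);
    INSERT INTO foo VALUES (1, 'f1'), (2, 'f2');
    INSERT INTO bar VALUES (1, 'b1');
    INSERT INTO baz VALUES (1, 1), (1, 1);
""")
# one row per (foo, bar) pair, with the correlated count from baz
rows = con.execute(
    "SELECT foo.name, bar.name,"
    " (SELECT COUNT(*) FROM baz"
    "  WHERE baz.foo_id = foo.id AND baz.bar_id = bar.id)"
    " FROM foo, bar").fetchall()
print(rows)
```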

qid & accept id: (19179245, 19179708) query: Finding index values in a pandas data frame where columns are the same soup:
soup wrap:
mask = df[['PhaseA','PhaseB','PhaseC']].isin([415,423,427,432]).all(axis=1)
df.ix[mask]

For example,

In [51]: mask = df[['PhaseA','PhaseB','PhaseC']].isin([415,423,427,432]).all(axis=1)
In [52]: mask 
Out[52]: 
Index
2013-01-07 00:00:00    False
2013-01-07 00:01:00    False
2013-01-07 00:02:00     True
2013-01-07 00:03:00    False
2013-01-07 00:04:00     True
2013-01-07 00:05:00    False
dtype: bool

In [53]: df.ix[mask]

yields

Out[53]: 
                     PhaseA  PhaseB  PhaseC  DataCol
Index                                               
2013-01-07 00:02:00     415     423     415      1.2
2013-01-07 00:04:00     415     423     423      1.2

DataFrame.isin will be added to v0.13 of Pandas. Without DataFrame.isin you can create the mask with

mask = df[['PhaseA','PhaseB','PhaseC']].applymap(set([415,423,427,432]).__contains__).all(axis=1)
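
A self-contained demo with a small made-up frame; note that .ix has since been removed from pandas, and .loc does the same job here:

```python
import pandas as pd

df = pd.DataFrame({'PhaseA': [415, 500],
                   'PhaseB': [423, 423],
                   'PhaseC': [415, 427]})
# True for a row only if all three phase columns are in the allowed set
mask = df[['PhaseA', 'PhaseB', 'PhaseC']].isin([415, 423, 427, 432]).all(axis=1)
print(df.loc[mask])   # only the first row qualifies (500 is not in the set)
```
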
qid & accept id: (19199869, 19201510) query: Most efficient way to loop through multiple csv files and calculate NYSE tick soup:

soup wrap:

I'd probably use the pandas library for this. It has lots of nice features for working with time series in general and OHLC data in particular, but we won't use any here.

import glob
import numpy as np
import pandas as pd

stocks = glob.glob("stock*.csv")

total_tick = 0
for stock in stocks:
    df = pd.read_csv(stock, 
                     names=["time", "open", "high", "low", "close", "volume"],
                     parse_dates=[0], index_col="time")
    tick = df["close"].diff().apply(np.sign).fillna(0.0)
    total_tick += tick

total_tick.to_csv("tick.csv")

which produces an output looking something like

2013-09-16 09:30:00,0.0
2013-09-16 09:31:00,3.0
2013-09-16 15:59:00,-5.0
2013-09-16 16:00:00,-3.0
2013-09-17 09:30:00,1.0
2013-09-17 09:31:00,-1.0

where I've made up sample data looking like yours.


The basic idea is that you can read a csv file into an object called a DataFrame:

>>> df
                         open      high     low       close  volume
time                                                               
2013-09-16 09:30:00  461.0100  461.4900  461.00  453.484089  183507
2013-09-16 09:31:00  460.8200  461.6099  460.39  474.727508  212774
2013-09-16 15:59:00  449.7200  450.0774  449.59  436.010403  146399
2013-09-16 16:00:00  450.1200  450.1200  449.65  455.296584  444594
2013-09-17 09:30:00  448.0000  448.0000  447.50  447.465545  173624
2013-09-17 09:31:00  449.2628  449.6800  447.50  477.785506  193186

We can select a column:

>>> df["close"]
time
2013-09-16 09:30:00    453.484089
2013-09-16 09:31:00    474.727508
2013-09-16 15:59:00    436.010403
2013-09-16 16:00:00    455.296584
2013-09-17 09:30:00    447.465545
2013-09-17 09:31:00    477.785506
Name: close, dtype: float64

Take the difference, noting that if we're subtracting from the previous value, then the initial value is undefined:

>>> df["close"].diff()
time
2013-09-16 09:30:00          NaN
2013-09-16 09:31:00    21.243419
2013-09-16 15:59:00   -38.717105
2013-09-16 16:00:00    19.286181
2013-09-17 09:30:00    -7.831039
2013-09-17 09:31:00    30.319961
Name: close, dtype: float64

Make this either positive or negative, depending on its sign:

>>> df["close"].diff().apply(np.sign)
time
2013-09-16 09:30:00   NaN
2013-09-16 09:31:00     1
2013-09-16 15:59:00    -1
2013-09-16 16:00:00     1
2013-09-17 09:30:00    -1
2013-09-17 09:31:00     1
Name: close, dtype: float64

And fill the NaN with a 0.

>>> df["close"].diff().apply(np.sign).fillna(0)
time
2013-09-16 09:30:00    0
2013-09-16 09:31:00    1
2013-09-16 15:59:00   -1
2013-09-16 16:00:00    1
2013-09-17 09:30:00   -1
2013-09-17 09:31:00    1
dtype: float64

This assumes that the recording times match across all stocks: if not, there are powerful resampling tools available to align them.
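
As a hedged sketch of that alignment, Series.add with fill_value sums two tick series over the union of their timestamps without introducing NaN where one series is missing a minute (made-up two-point data):

```python
import pandas as pd

s1 = pd.Series([1.0, -1.0],
               index=pd.to_datetime(['2013-09-16 09:30', '2013-09-16 09:31']))
s2 = pd.Series([1.0],
               index=pd.to_datetime(['2013-09-16 09:31']))
# missing timestamps are treated as 0 instead of propagating NaN
total = s1.add(s2, fill_value=0.0)   # 09:30 -> 1.0, 09:31 -> -1.0 + 1.0 = 0.0
print(total)
```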

qid & accept id: (19209860, 19210181) query: How to add date and time information to time series data using python numpy or pandas soup:

soup wrap:

The documentation gives a similar example at the beginning using date_range. If you have a Series object, you can make a DatetimeIndex starting at the appropriate time (I'm assuming 1013 was a typo for 2013), with a frequency of one second, and of the appropriate length:

>>> x = pd.Series(np.random.randint(8,24,23892344)) # make some random data
>>> when = pd.date_range(start=pd.datetime(2013,1,1),freq='S',periods=len(x))
>>> when
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 00:00:00, ..., 2013-10-04 12:45:43]
Length: 23892344, Freq: S, Timezone: None

and then we can make a new series from the original data using this as the new index:

>>> x_with_time = pd.Series(x.values, index=when)
>>> x_with_time
2013-01-01 00:00:00    13
2013-01-01 00:00:01    14
2013-01-01 00:00:02    15
2013-01-01 00:00:03    22
2013-01-01 00:00:04    16
[...]
2013-10-04 12:45:41    21
2013-10-04 12:45:42    16
2013-10-04 12:45:43    15
Freq: S, Length: 23892344
qid & accept id: (19215100, 19215102) query: More elegant/Pythonic way of printing elements of tuple? soup:

soup wrap:

print(*solution()) can actually be valid on Python 2.7 as well; just put:

from __future__ import print_function

at the top of your file.

You could also iterate through the tuple:

for i in solution():
    print i,

This is equivalent to:

for i in solution():
    print(i, end=' ')

This form requires Python 3 or the import statement above.
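
A minimal end-to-end sketch in Python 3, with a made-up solution() standing in for the real one:

```python
def solution():
    return (1, 2, 3)   # stand-in for the real function

print(*solution())             # unpacking prints: 1 2 3
print(*solution(), sep=', ')   # custom separator: 1, 2, 3
```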

qid & accept id: (19221073, 19221160) query: Call a C++ project main() in Python in Visual Studio? soup:

soup wrap:

The arguments of main() are the command-line arguments of the program. So if you do for example this in Python:

subprocess.Popen(['myCppprogram.exe', 'foo', 'bar'], ...)

then the following will hold in main():

int main(int argc, char** argv)
{
  assert(argc == 3);
  assert(argv[1] == std::string("foo"));
  assert(argv[2] == std::string("bar"));
}
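
To see the same round trip without compiling anything, here is a hedged sketch that spawns a small Python child instead of myCppprogram.exe; the child's sys.argv plays the role of argv in main():

```python
import subprocess
import sys

# the child prints its own arguments, mirroring argc/argv in the C++ program
proc = subprocess.Popen(
    [sys.executable, '-c', 'import sys; print(sys.argv[1:])', 'foo', 'bar'],
    stdout=subprocess.PIPE)
out, _ = proc.communicate()
print(out.decode().strip())   # ['foo', 'bar']
```
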
qid & accept id: (19241166, 19242894) query: Sqlalchemy: bulk correlated update to link tables soup:

soup wrap:

SQLAlchemy works in layers. At the base layer, SQLAlchemy provides stuff such as a unified interface to databases using various database drivers, and a connection pool implementation. Above this sits a SQL Expression Language, allowing you to define the tables and columns of your database using Python objects, and then use those objects to create SQL expressions using the APIs that SQLAlchemy gives you. Then there is the ORM. The ORM builds on these existing layers, and so even if you use the ORM, you can still drop down to use the expression API. You are a level even above that, using the declarative model (which builds on the ORM).

Most of the expression API is based on the SQLAlchemy Table object and the columns. The tables are accessible by the __table__ property on the mapped class, and the columns are available as the properties on the mapped class. So, even though you are at the declarative level, you can still utilize much of what you have available to you there while using the models you mapped using declarative. So, the example correlated query...

>>> stmt = select([addresses.c.email_address]).\
...             where(addresses.c.user_id == users.c.id).\
...             limit(1)
>>> conn.execute(users.update().values(fullname=stmt)) 

...can translate to a declarative ORM model by using the __table__ attribute and declarative columns...

>>> stmt = select([Addresses.email_address]).\
...             where(Addresses.user_id == Users.id).\
...             limit(1)
>>> conn.execute(Users.__table__.update().values(fullname=stmt)) 

Here is what I believe your correlated query would look like:

stmt = select([Location.id]).\
    where(and_(Location.x==Stopover.x, Location.y==Stopover.y)).limit(1)

conn.execute(Stopover.__table__.update().values(location_id=stmt))

The resulting SQL:

UPDATE stopovers SET location_id=(SELECT locations.id 
FROM locations 
WHERE locations.x = stopovers.x AND locations.y = stopovers.y
LIMIT ? OFFSET ?)
qid & accept id: (19274492, 19288889) query: How calculate the global coverage? soup:

soup wrap:

Yesterday I received an email answering this question:

coverage.py (the tool coveralls uses to measure coverage in Python programs) has a "coverage combine" command.

I got the global coverage by executing something like this:

coverage erase
tox
coverage combine
coveralls

In tox.ini I added the "-p" (parallel mode) flag:

python {envbindir}/coverage run -p testing/run_tests.py
python {envbindir}/coverage run -p testing/run_tests.py testing.settings_no_debug

I fixed the problem with these commits:

qid & accept id: (19288469, 19311505) query: Python how to strip white-spaces from xml text nodes soup:

soup wrap:

With lxml you can iterate over all elements and check whether each one has text to strip():

from lxml import etree

tree = etree.parse('xmlfile')
root = tree.getroot()

for elem in root.iter('*'):
    if elem.text is not None:
        elem.text = elem.text.strip()

print(etree.tostring(root))

It yields:

My Name
My Address

UPDATE to strip tail text too:

from lxml import etree

tree = etree.parse('xmlfile')
root = tree.getroot()

for elem in root.iter('*'):
    if elem.text is not None:
        elem.text = elem.text.strip()
    if elem.tail is not None:
        elem.tail = elem.tail.strip()

print(etree.tostring(root, encoding="utf-8", xml_declaration=True))
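
If lxml is not available, the standard library's xml.etree.ElementTree exposes the same text/tail attributes, so the identical loop works there too. A small sketch with made-up inline XML:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<contact><name>  My Name  </name>  tail text  </contact>')
for elem in root.iter():
    # text is the content before the first child; tail is the text after the
    # element's closing tag, which is why both need stripping
    if elem.text is not None:
        elem.text = elem.text.strip()
    if elem.tail is not None:
        elem.tail = elem.tail.strip()

print(ET.tostring(root, encoding='unicode'))
```
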
qid & accept id: (19297410, 19849614) query: packaging with numpy and test suite soup:

soup wrap:

Here is a setup.py that works for me:

# pkg - A fancy software package
# Copyright (C) 2013  author (email)
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see http://www.gnu.org/licenses/gpl.html.
"""pkg: a software suite for 

Hey look at me I'm a long description
But how long am I?

"""

from __future__ import division, print_function

#ideas for setup/f2py came from:
#    -numpy setup.py: https://github.com/numpy/numpy/blob/master/setup.py 2013-11-07
#    -winpython setup.py: http://code.google.com/p/winpython/source/browse/setup.py 2013-11-07
#    -needing to use 
#        import setuptools; from numpy.distutils.core import setup, Extension: 
#        http://comments.gmane.org/gmane.comp.python.f2py.user/707 2013-11-07
#    -wrapping FORTRAN code with f2py: http://www2-pcmdi.llnl.gov/cdat/tutorials/f2py-wrapping-fortran-code 2013-11-07
#    -numpy disutils: http://docs.scipy.org/doc/numpy/reference/distutils.html 2013-11-07
#    -manifest files in disutils: 
#        'distutils doesn't properly update MANIFEST. when the contents of directories change.'
#        https://github.com/numpy/numpy/blob/master/setup.py         
#    -if things are not woring try deleting build, sdist, egg directories  and try again: 
#        https://stackoverflow.com/a/9982133/2530083 2013-11-07
#    -getting fortran extensions to be installed in their appropriate sub package
#        i.e. "my_ext = Extension(name = 'my_pack._fortran', sources = ['my_pack/code.f90'])" 
#        Note that sources is a list even if one file: 
#        http://numpy-discussion.10968.n7.nabble.com/f2py-and-setup-py-how-can-I-specify-where-the-so-file-goes-tp34490p34497.html 2013-11-07
#    -install fortran source files into their appropriate sub-package 
#        i.e. "package_data={'': ['*.f95','*.f90']}# Note it's a dict and list":
#        https://stackoverflow.com/a/19373744/2530083 2013-11-07
#    -Chapter 9 Fortran Programming with NumPy Arrays: 
#        Langtangen, Hans Petter. 2013. Python Scripting for Computational Science. 3rd edition. Springer.
#    -Hitchhikers guide to packaging :
#        http://guide.python-distribute.org/
#    -Python Packaging: Hate, hate, hate everywhere : 
#        http://lucumr.pocoo.org/2012/6/22/hate-hate-hate-everywhere/
#    -How To Package Your Python Code: 
#        http://www.scotttorborg.com/python-packaging/
#    -install testing requirements: 
#        https://stackoverflow.com/a/7747140/2530083 2013-11-07

import setuptools
from numpy.distutils.core import setup, Extension
import os
import os.path as osp

def readme(filename='README.rst'):
    with open(filename) as f:  # honour the filename argument
        text = f.read()        # the with block closes the file automatically
    return text

def get_package_data(name, extlist):
    """Return data files for package *name* with extensions in *extlist*"""
    #modified slightly from taken from http://code.google.com/p/winpython/source/browse/setup.py 2013-11-7
    flist = []
    # Workaround to replace os.path.relpath (not available until Python 2.6):
    offset = len(name)+len(os.sep)  # note: os.sep (path separator), not os.pathsep
    for dirpath, _dirnames, filenames in os.walk(name):
        for fname in filenames:            
            if not fname.startswith('.') and osp.splitext(fname)[1] in extlist:
#                flist.append(osp.join(dirpath, fname[offset:]))
                flist.append(osp.join(dirpath, fname))
    return flist

DOCLINES = __doc__.split("\n")
CLASSIFIERS = """\
Development Status :: 1 - Planning
License :: OSI Approved :: GNU Lesser General Public License v3 or later (LGPLv3+)
Programming Language :: Python :: 2.7
Topic :: Scientific/Engineering
"""

NAME = 'pkg'
MAINTAINER = "me"
MAINTAINER_EMAIL = "me@me.com"
DESCRIPTION = DOCLINES[0]
LONG_DESCRIPTION = "\n".join(DOCLINES[2:])#readme('readme.rst')
URL = "http://meeeee.mmemem"
DOWNLOAD_URL = "https://github.com/rtrwalker/geotecha.git"
LICENSE = 'GNU General Public License v3 or later (GPLv3+)'
CLASSIFIERS = [_f for _f in CLASSIFIERS.split('\n') if _f]
KEYWORDS=''
AUTHOR = "me"
AUTHOR_EMAIL = "me.com"
PLATFORMS = ["Windows"]#, "Linux", "Solaris", "Mac OS-X", "Unix"]
MAJOR = 0
MINOR = 1
MICRO = 0
ISRELEASED = False
VERSION = '%d.%d.%d' % (MAJOR, MINOR, MICRO)

INSTALL_REQUIRES=[]
ZIP_SAFE=False
TEST_SUITE='nose.collector'
TESTS_REQUIRE=['nose']

DATA_FILES = [(NAME, ['LICENSE.txt','README.rst'])]
PACKAGES=setuptools.find_packages()
PACKAGES.remove('tools')

PACKAGE_DATA = {'': ['*.f95', '*.f90']}
ext_files = get_package_data(NAME,['.f90', '.f95','.F90', '.F95'])
ext_module_names = ['.'.join(osp.splitext(v)[0].split(osp.sep)) for v in ext_files]
EXT_MODULES = [Extension(name=x,sources=[y]) for x, y in zip(ext_module_names, ext_files)]      


setup(
    name=NAME,
    version=VERSION,
    maintainer=MAINTAINER,
    maintainer_email=MAINTAINER_EMAIL,
    description=DESCRIPTION,
    long_description=LONG_DESCRIPTION,
    url=URL,
    download_url=DOWNLOAD_URL,
    license=LICENSE,
    classifiers=CLASSIFIERS,
    author=AUTHOR,
    author_email=AUTHOR_EMAIL,
    platforms=PLATFORMS,
    packages=PACKAGES,
    data_files=DATA_FILES,
    install_requires=INSTALL_REQUIRES,
    zip_safe=ZIP_SAFE,
    test_suite=TEST_SUITE,
    tests_require=TESTS_REQUIRE,
    package_data=PACKAGE_DATA,    
    ext_modules=EXT_MODULES,
    )

To install, at the command line I use:

python setup.py install
python setup.py clean --all

The only issue I seem to have is a minor one. When I look in site-packages for my package, it is installed inside the egg folder C:\Python27\Lib\site-packages\pkg-0.1.0-py2.7-win32.egg\pkg. Most other packages I see there have a C:\Python27\Lib\site-packages\pkg folder separate from the egg folder. Does anyone know how to get that separation?

As for testing, after installing, I type the following at the command line:

nosetests package_name -v

Try investigating python setup.py develop (see Python setup.py develop vs install) to avoid having to reinstall the package after every change.

As I commented in the code, I found the following useful:

qid & accept id: (19303429, 19303477) query: Select last chars of string until whitespace in Python soup:

Just split the string on whitespace and take the last element of the resulting list. Or use rsplit() to start splitting from the end:

\n
>>> st = 'Hello my name is John'\n>>> st.rsplit(' ', 1)\n['Hello my name is', 'John']\n>>> \n>>> st.rsplit(' ', 1)[1]\n'John'\n
\n

The second argument specifies the maximum number of splits to perform. Since you just want the last element, a single split is enough.

\n

As noted in the comments, you can pass None as the first argument, in which case the default delimiter (any whitespace) is used:

\n
>>> st.rsplit(None, 1)[-1]\n'John'\n
\n

Using -1 as the index is safe even when there is no whitespace in the string, since rsplit() then returns a single-element list.

\n soup wrap:

Just split the string on whitespace and take the last element of the resulting list. Or use rsplit() to start splitting from the end:

>>> st = 'Hello my name is John'
>>> st.rsplit(' ', 1)
['Hello my name is', 'John']
>>> 
>>> st.rsplit(' ', 1)[1]
'John'

The second argument specifies the maximum number of splits to perform. Since you just want the last element, a single split is enough.

As noted in the comments, you can pass None as the first argument, in which case the default delimiter (any whitespace) is used:

>>> st.rsplit(None, 1)[-1]
'John'

Using -1 as the index is safe even when there is no whitespace in the string, since rsplit() then returns a single-element list.
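To illustrate, a minimal standard-library sketch of the rsplit() behaviour described above:

```python
# rsplit with maxsplit=1 splits only once, from the right.
st = 'Hello my name is John'
last = st.rsplit(None, 1)[-1]

# With no whitespace at all, rsplit returns a one-element list,
# so [-1] still works while [1] would raise IndexError.
no_space = 'John'.rsplit(None, 1)
```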

qid & accept id: (19317766, 19348527) query: How to find first byte of a serial stream with python? soup:

It might be helpful to flush your input buffer before you send the request, i.e.,

\n
ser.read(ser.inWaiting())\n
\n

to read out all bytes which are waiting. Then, hoping that no further bytes are being sent, you can send your command:

\n
ser.write(bytes([0x05, 0x69, 0x02, 0x0A, 0x86]))\n
\n

This should ensure that all bytes arriving next are the answer to this command.

\n

Then read data until you get your 107:

\n
found = False\nbuffer = '' # what is left from the previous run...\nwhile not found:\n    rd = ser.read(50)\n    buffer += rd\n    sp = buffer.split(chr(107), 1)\n    if len(sp) == 2:\n        pkt = chr(107) + sp[1] # candidate for a valid packet\n        if pkt[1] == chr(105): # \n            while len(pkt) < 107: # TODO add a timeout condition here...\n                rd = ser.read(107 - len(pkt))\n                pkt += rd\n            found = True\n        else:\n            buffer = pkt[1:] # process this further...\n    else: # no 107 found; empty the buffer.\n        buffer = ''\n# Now we have a pkt of 107 bytes and can do whatever we want with it.\n
\n soup wrap:

It might be helpful to flush your input buffer before you send the request, i.e.,

ser.read(ser.inWaiting())

to read out all bytes which are waiting. Then, hoping that no further bytes are being sent, you can send your command:

ser.write(bytes([0x05, 0x69, 0x02, 0x0A, 0x86]))

This should ensure that all bytes arriving next are the answer to this command.

Then read data until you get your 107:

found = False
buffer = '' # what is left from the previous run...
while not found:
    rd = ser.read(50)
    buffer += rd
    sp = buffer.split(chr(107), 1)
    if len(sp) == 2:
        pkt = chr(107) + sp[1] # candidate for a valid packet
        if pkt[1] == chr(105): # 
            while len(pkt) < 107: # TODO add a timeout condition here...
                rd = ser.read(107 - len(pkt))
                pkt += rd
            found = True
        else:
            buffer = pkt[1:] # process this further...
    else: # no 107 found; empty the buffer.
        buffer = ''
# Now we have a pkt of 107 bytes and can do whatever we want with it.
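The same framing idea can be sketched without a serial port, using an in-memory byte stream. This is a modernized (Python 3, bytes-based) sketch; the marker values 107/105 and the fixed packet length come from the answer above, while the helper name is hypothetical:

```python
# Scan a byte stream for the start marker (107) followed by the type byte (105),
# then collect a fixed-length packet starting at the marker.
def extract_packet(stream, start=107, second=105, length=107):
    buf = bytearray()
    for b in stream:
        buf.append(b)
        i = buf.find(bytes([start]))
        if i != -1 and len(buf) > i + 1:
            if buf[i + 1] == second:
                if len(buf) - i >= length:
                    return bytes(buf[i:i + length])
            else:
                # False start: discard up to and including this marker and resync.
                del buf[:i + 1]
    return None

# Two junk bytes, then a valid marker pair, then 105 filler bytes = one full packet.
data = bytes([1, 2, 107, 105]) + bytes(105)
pkt = extract_packet(iter(data))
```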
qid & accept id: (19330415, 19330727) query: Python Django: join view on the admin interface soup:

Displaying is easy - define a method that returns the related data on the model or the admin class, then use the method in list_display and/or readonly_fields.

\n

For sorting, define the admin_order_field property of the method. Although list_display and readonly_fields do not support the double underscore related field syntax, admin_order_field does. So something like this:

\n
class UniversityContact(models.Model):\n    # as above, plus:\n    def abbrev(self):\n        return self.university.abbrev\n    abbrev.admin_order_field = 'university__abbrev'\n
\n

Optionally you can set the short_description attribute as well, if you don't want the default choice of the method name:

\n
    abbrev.short_description = 'abbreviation'\n
\n

You didn't ask about this, but it seems worth knowing - list_filter also supports the standard related field name syntax:

\n
    list_filter = ('university__region',)\n
\n

Alternatively, there's a code snippet here that claims to address it:\nhttp://djangosnippets.org/snippets/2887/

\n

I haven't tested that myself.

\n soup wrap:

Displaying is easy - define a method that returns the related data on the model or the admin class, then use the method in list_display and/or readonly_fields.

For sorting, define the admin_order_field property of the method. Although list_display and readonly_fields do not support the double underscore related field syntax, admin_order_field does. So something like this:

class UniversityContact(models.Model):
    # as above, plus:
    def abbrev(self):
        return self.university.abbrev
    abbrev.admin_order_field = 'university__abbrev'

Optionally you can set the short_description attribute as well, if you don't want the default choice of the method name:

    abbrev.short_description = 'abbreviation'

You didn't ask about this, but it seems worth knowing - list_filter also supports the standard related field name syntax:

    list_filter = ('university__region',)

Alternatively, there's a code snippet here that claims to address it: http://djangosnippets.org/snippets/2887/

I haven't tested that myself.
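The admin hooks above rely on nothing more exotic than Python function attributes, which you can check without Django at all; this plain-Python sketch uses illustrative class and field names:

```python
# Django's admin simply reads attributes set on the method object,
# exactly as done with admin_order_field and short_description above.
class UniversityContact:
    def __init__(self, university_abbrev):
        self.university_abbrev = university_abbrev

    def abbrev(self):
        return self.university_abbrev
    # Plain function attributes, attached at class-body time:
    abbrev.admin_order_field = 'university__abbrev'
    abbrev.short_description = 'abbreviation'

uc = UniversityContact('MIT')
```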

qid & accept id: (19368715, 19368736) query: Convert utf string ftom python to javascript dictionary soup:

You are passing a Python object instead of JSON.

\n

On the Python side, convert this to JSON first:

\n
import json\n\njson_value = json.dumps(python_object)\n
\n

Demo:

\n
>>> import json\n>>> python_object = {'username': u'Tester1', 'age': 0L}\n>>> print json.dumps(python_object)\n{"username": "Tester1", "age": 0}\n
\n

The latter you can load into JavaScript with JSON.parse().

\n soup wrap:

You are passing a Python object instead of JSON.

On the Python side, convert this to JSON first:

import json

json_value = json.dumps(python_object)

Demo:

>>> import json
>>> python_object = {'username': u'Tester1', 'age': 0L}
>>> print json.dumps(python_object)
{"username": "Tester1", "age": 0}

The latter you can load into JavaScript with JSON.parse().
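A round-trip sketch of the same idea using only the standard library; json.loads() stands in here for what JSON.parse() would do on the JavaScript side:

```python
import json

python_object = {'username': 'Tester1', 'age': 0}
json_value = json.dumps(python_object)  # a plain JSON string

# Equivalent of JSON.parse() in JavaScript:
parsed = json.loads(json_value)
```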

qid & accept id: (19369025, 19369548) query: Output touch position from custom kivy widget to labels soup:

Two problems:

\n

1) You do xlabel = ObjectProperty, but this doesn't instantiate an ObjectProperty; it sets xlabel to the ObjectProperty class itself. You instead want xlabel = ObjectProperty(); the parentheses create an instance of ObjectProperty.

\n

2) Your on_touch_down method is in the ColorLoopWidget, and tries (in your commented out code) to reference self.xlabel and self.ylabel. This doesn't work because those properties are never set; if you check the kv you see that HueLayout has xlabel: xlabel and ylabel: ylabel but the interior ColorLoopWidget does not. The code below adds these properties, so that the ColorLoopWidget knows about the labels and its on_touch_down method is able to reference them.

\n

The following code fixes both issues, and seems to work fine for me.

\n

main.py:

\n
from kivy.config import Config\nConfig.set('graphics', 'width', '1000')\nConfig.set('graphics', 'height', '500')\nConfig.set('graphics', 'resizable', 0)\nfrom kivy.app import App\nfrom kivy.uix.widget import Widget\nfrom kivy.uix.boxlayout import BoxLayout\nfrom kivy.lang import Builder\nfrom kivy.properties import ObjectProperty\nfrom kivy.graphics import Color, Ellipse, Line\n\nBuilder.load_file('hueLayout.kv')\n\nclass ColorLoopWidget(Widget):\n    xlabel = ObjectProperty()\n    ylabel = ObjectProperty()\n    def on_touch_down(self, touch):\n        with self.canvas:\n            self.canvas.clear()\n            d = 10\n            Ellipse(pos=(touch.x - d/2, touch.y - d/2), size=(d,d))\n            touch.ud['line'] = Line(points=(touch.x, touch.y))\n            self.xlabel.text = 'x: '+str(touch.x)\n            self.ylabel.text = 'y: '+str(touch.y)\n\n##    def on_touch_move(self, touch):\n##        touch.ud['line'].points += [touch.x, touch.y]\n\n\n\nclass HueLayout(Widget):\n    colorloopwidget = ObjectProperty()\n    xlabel = ObjectProperty()\n    ylabel = ObjectProperty()\n\n##    def on_touch_down():\n##        ColorLoopWidget.on_touch_down()\n##\n##    def on_touch_move():\n##        ColorLoopWidget.on_touch_move()\n\n    def clear_canvas(self):\n        self.colorloopwidget.canvas.clear()\n\n\nclass HueApp(App):\n    def build(self):\n        return HueLayout()\n\nif __name__ == '__main__':\n    HueApp().run()\n
\n

hueLayout.kv:

\n
:\n    colorloopwidget: colorloopwidget\n    xlabel: xlabel\n    ylabel: ylabel\n\n    BoxLayout:\n        size: 1000, 500\n        orientation: 'horizontal'\n\n        ColorLoopWidget:\n            xlabel: xlabel\n            ylabel: ylabel\n            id: colorloopwidget\n            size: 500, 500\n\n        BoxLayout:\n            orientation: 'vertical'\n            Button:\n                text: 'Clear'\n                on_release: root.clear_canvas()\n            Label:\n                id: xlabel\n                text: 'x: '\n                size_hint_y: 0.2\n            Label:\n                id: ylabel\n                text: 'y: '\n                size_hint_y: 0.2\n
\n soup wrap:

Two problems:

1) You do xlabel = ObjectProperty, but this doesn't instantiate an ObjectProperty; it sets xlabel to the ObjectProperty class itself. You instead want xlabel = ObjectProperty(); the parentheses create an instance of ObjectProperty.

2) Your on_touch_down method is in the ColorLoopWidget, and tries (in your commented out code) to reference self.xlabel and self.ylabel. This doesn't work because those properties are never set; if you check the kv you see that HueLayout has xlabel: xlabel and ylabel: ylabel but the interior ColorLoopWidget does not. The code below adds these properties, so that the ColorLoopWidget knows about the labels and its on_touch_down method is able to reference them.

The following code fixes both issues, and seems to work fine for me.

main.py:

from kivy.config import Config
Config.set('graphics', 'width', '1000')
Config.set('graphics', 'height', '500')
Config.set('graphics', 'resizable', 0)
from kivy.app import App
from kivy.uix.widget import Widget
from kivy.uix.boxlayout import BoxLayout
from kivy.lang import Builder
from kivy.properties import ObjectProperty
from kivy.graphics import Color, Ellipse, Line

Builder.load_file('hueLayout.kv')

class ColorLoopWidget(Widget):
    xlabel = ObjectProperty()
    ylabel = ObjectProperty()
    def on_touch_down(self, touch):
        with self.canvas:
            self.canvas.clear()
            d = 10
            Ellipse(pos=(touch.x - d/2, touch.y - d/2), size=(d,d))
            touch.ud['line'] = Line(points=(touch.x, touch.y))
            self.xlabel.text = 'x: '+str(touch.x)
            self.ylabel.text = 'y: '+str(touch.y)

##    def on_touch_move(self, touch):
##        touch.ud['line'].points += [touch.x, touch.y]



class HueLayout(Widget):
    colorloopwidget = ObjectProperty()
    xlabel = ObjectProperty()
    ylabel = ObjectProperty()

##    def on_touch_down():
##        ColorLoopWidget.on_touch_down()
##
##    def on_touch_move():
##        ColorLoopWidget.on_touch_move()

    def clear_canvas(self):
        self.colorloopwidget.canvas.clear()


class HueApp(App):
    def build(self):
        return HueLayout()

if __name__ == '__main__':
    HueApp().run()

hueLayout.kv:

:
    colorloopwidget: colorloopwidget
    xlabel: xlabel
    ylabel: ylabel

    BoxLayout:
        size: 1000, 500
        orientation: 'horizontal'

        ColorLoopWidget:
            xlabel: xlabel
            ylabel: ylabel
            id: colorloopwidget
            size: 500, 500

        BoxLayout:
            orientation: 'vertical'
            Button:
                text: 'Clear'
                on_release: root.clear_canvas()
            Label:
                id: xlabel
                text: 'x: '
                size_hint_y: 0.2
            Label:
                id: ylabel
                text: 'y: '
                size_hint_y: 0.2
qid & accept id: (19382088, 19382856) query: numpy create 3D array from indexed list soup:

If I understand your situation correctly, you can just reshape it.

\n
In [132]: p = np.array("p_x1y1z1 p_x2y1z1 p_x3y1z1 p_x4y1z1 p_x1y2z1 p_x2y2z1 p_x3y2z1 p_x4y2z1".split())\n\nIn [133]: p\nOut[133]: \narray(['p_x1y1z1', 'p_x2y1z1', 'p_x3y1z1', 'p_x4y1z1', 'p_x1y2z1', 'p_x2y2z1', 'p_x3y2z1', 'p_x4y2z1'], \n      dtype='|S8')\n
\n

It appears to me that your array is ordered in what numpy calls 'F' ordering:

\n
In [168]: p.reshape(4, 2, order='F')\nOut[168]: \narray([['p_x1y1z1', 'p_x1y2z1'],\n       ['p_x2y1z1', 'p_x2y2z1'],\n       ['p_x3y1z1', 'p_x3y2z1'],\n       ['p_x4y1z1', 'p_x4y2z1']], \n      dtype='|S8')\n
\n

If you have z variance, too, simply reshape to three dimensions:

\n
In [169]: q\nOut[169]: \narray(['p_x1y1z1', 'p_x2y1z1', 'p_x3y1z1', 'p_x4y1z1', 'p_x1y2z1',\n       'p_x2y2z1', 'p_x3y2z1', 'p_x4y2z1', 'p_x1y1z2', 'p_x2y1z2',\n       'p_x3y1z2', 'p_x4y1z2', 'p_x1y2z2', 'p_x2y2z2', 'p_x3y2z2',\n       'p_x4y2z2', 'p_x1y1z3', 'p_x2y1z3', 'p_x3y1z3', 'p_x4y1z3',\n       'p_x1y2z3', 'p_x2y2z3', 'p_x3y2z3', 'p_x4y2z3'], \n      dtype='|S8')\n\nIn [170]: q.reshape(4,2,3,order='F')\nOut[170]: \narray([[['p_x1y1z1', 'p_x1y1z2', 'p_x1y1z3'],\n        ['p_x1y2z1', 'p_x1y2z2', 'p_x1y2z3']],\n\n       [['p_x2y1z1', 'p_x2y1z2', 'p_x2y1z3'],\n        ['p_x2y2z1', 'p_x2y2z2', 'p_x2y2z3']],\n\n       [['p_x3y1z1', 'p_x3y1z2', 'p_x3y1z3'],\n        ['p_x3y2z1', 'p_x3y2z2', 'p_x3y2z3']],\n\n       [['p_x4y1z1', 'p_x4y1z2', 'p_x4y1z3'],\n        ['p_x4y2z1', 'p_x4y2z2', 'p_x4y2z3']]], \n      dtype='|S8')\n
\n

This assumes x,y,z should map to i+1,j+1,k+1, as seen here:

\n
In [175]: r = q.reshape(4,2,3,order='F')\n\nIn [176]: r[0]   #all x==1\nOut[176]: \narray([['p_x1y1z1', 'p_x1y1z2', 'p_x1y1z3'],\n       ['p_x1y2z1', 'p_x1y2z2', 'p_x1y2z3']], \n      dtype='|S8')\n\nIn [177]: r[:,0]  # all y==1\nOut[177]: \narray([['p_x1y1z1', 'p_x1y1z2', 'p_x1y1z3'],\n       ['p_x2y1z1', 'p_x2y1z2', 'p_x2y1z3'],\n       ['p_x3y1z1', 'p_x3y1z2', 'p_x3y1z3'],\n       ['p_x4y1z1', 'p_x4y1z2', 'p_x4y1z3']], \n      dtype='|S8')\n\nIn [178]: r[:,:,0]  #all z==1\nOut[178]: \narray([['p_x1y1z1', 'p_x1y2z1'],\n       ['p_x2y1z1', 'p_x2y2z1'],\n       ['p_x3y1z1', 'p_x3y2z1'],\n       ['p_x4y1z1', 'p_x4y2z1']], \n      dtype='|S8')\n
\n soup wrap:

If I understand your situation correctly, you can just reshape it.

In [132]: p = np.array("p_x1y1z1 p_x2y1z1 p_x3y1z1 p_x4y1z1 p_x1y2z1 p_x2y2z1 p_x3y2z1 p_x4y2z1".split())

In [133]: p
Out[133]: 
array(['p_x1y1z1', 'p_x2y1z1', 'p_x3y1z1', 'p_x4y1z1', 'p_x1y2z1', 'p_x2y2z1', 'p_x3y2z1', 'p_x4y2z1'], 
      dtype='|S8')

It appears to me that your array is ordered in what numpy calls 'F' ordering:

In [168]: p.reshape(4, 2, order='F')
Out[168]: 
array([['p_x1y1z1', 'p_x1y2z1'],
       ['p_x2y1z1', 'p_x2y2z1'],
       ['p_x3y1z1', 'p_x3y2z1'],
       ['p_x4y1z1', 'p_x4y2z1']], 
      dtype='|S8')

If you have z variance, too, simply reshape to three dimensions:

In [169]: q
Out[169]: 
array(['p_x1y1z1', 'p_x2y1z1', 'p_x3y1z1', 'p_x4y1z1', 'p_x1y2z1',
       'p_x2y2z1', 'p_x3y2z1', 'p_x4y2z1', 'p_x1y1z2', 'p_x2y1z2',
       'p_x3y1z2', 'p_x4y1z2', 'p_x1y2z2', 'p_x2y2z2', 'p_x3y2z2',
       'p_x4y2z2', 'p_x1y1z3', 'p_x2y1z3', 'p_x3y1z3', 'p_x4y1z3',
       'p_x1y2z3', 'p_x2y2z3', 'p_x3y2z3', 'p_x4y2z3'], 
      dtype='|S8')

In [170]: q.reshape(4,2,3,order='F')
Out[170]: 
array([[['p_x1y1z1', 'p_x1y1z2', 'p_x1y1z3'],
        ['p_x1y2z1', 'p_x1y2z2', 'p_x1y2z3']],

       [['p_x2y1z1', 'p_x2y1z2', 'p_x2y1z3'],
        ['p_x2y2z1', 'p_x2y2z2', 'p_x2y2z3']],

       [['p_x3y1z1', 'p_x3y1z2', 'p_x3y1z3'],
        ['p_x3y2z1', 'p_x3y2z2', 'p_x3y2z3']],

       [['p_x4y1z1', 'p_x4y1z2', 'p_x4y1z3'],
        ['p_x4y2z1', 'p_x4y2z2', 'p_x4y2z3']]], 
      dtype='|S8')

This assumes x,y,z should map to i+1,j+1,k+1, as seen here:

In [175]: r = q.reshape(4,2,3,order='F')

In [176]: r[0]   #all x==1
Out[176]: 
array([['p_x1y1z1', 'p_x1y1z2', 'p_x1y1z3'],
       ['p_x1y2z1', 'p_x1y2z2', 'p_x1y2z3']], 
      dtype='|S8')

In [177]: r[:,0]  # all y==1
Out[177]: 
array([['p_x1y1z1', 'p_x1y1z2', 'p_x1y1z3'],
       ['p_x2y1z1', 'p_x2y1z2', 'p_x2y1z3'],
       ['p_x3y1z1', 'p_x3y1z2', 'p_x3y1z3'],
       ['p_x4y1z1', 'p_x4y1z2', 'p_x4y1z3']], 
      dtype='|S8')

In [178]: r[:,:,0]  #all z==1
Out[178]: 
array([['p_x1y1z1', 'p_x1y2z1'],
       ['p_x2y1z1', 'p_x2y2z1'],
       ['p_x3y1z1', 'p_x3y2z1'],
       ['p_x4y1z1', 'p_x4y2z1']], 
      dtype='|S8')
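A small numeric sketch of the same 'F' (column-major) ordering, assuming NumPy is available; integers stand in for the point labels:

```python
import numpy as np

# 'F' order fills the first (x) axis fastest, matching the flat listing above.
flat = np.arange(8)                  # stand-in for the 8 point labels
r = flat.reshape(4, 2, order='F')
# In column-major layout, element (i, j) comes from flat[i + 4*j].
```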
qid & accept id: (19391149, 19391264) query: Numpy mean AND variance from single function? soup:

You can't pass a known mean to np.std or np.var; for that you'll have to wait for the new standard-library statistics module. In the meantime you can save a little time by applying the formula directly:

\n
In [329]: a = np.random.rand(1000)\n\nIn [330]: %%timeit\n   .....: a.mean()\n   .....: a.var()\n   .....: \n10000 loops, best of 3: 80.6 µs per loop\n\nIn [331]: %%timeit\n   .....: m = a.mean()\n   .....: np.mean((a-m)**2)\n   .....: \n10000 loops, best of 3: 60.9 µs per loop\n\nIn [332]: m = a.mean()\n\nIn [333]: a.var()\nOut[333]: 0.078365856465916137\n\nIn [334]: np.mean((a-m)**2)\nOut[334]: 0.078365856465916137\n
\n

If you really are trying to speed things up, try np.dot to do the squaring and summing (since that's what a dot-product is):

\n
In [335]: np.dot(a-m,a-m)/a.size\nOut[335]: 0.078365856465916137\n\nIn [336]: %%timeit\n   .....: m = a.mean()\n   .....: c = a-m\n   .....: np.dot(c,c)/a.size\n   .....: \n10000 loops, best of 3: 38.2 µs per loop\n
\n soup wrap:

You can't pass a known mean to np.std or np.var; for that you'll have to wait for the new standard-library statistics module. In the meantime you can save a little time by applying the formula directly:

In [329]: a = np.random.rand(1000)

In [330]: %%timeit
   .....: a.mean()
   .....: a.var()
   .....: 
10000 loops, best of 3: 80.6 µs per loop

In [331]: %%timeit
   .....: m = a.mean()
   .....: np.mean((a-m)**2)
   .....: 
10000 loops, best of 3: 60.9 µs per loop

In [332]: m = a.mean()

In [333]: a.var()
Out[333]: 0.078365856465916137

In [334]: np.mean((a-m)**2)
Out[334]: 0.078365856465916137

If you really are trying to speed things up, try np.dot to do the squaring and summing (since that's what a dot-product is):

In [335]: np.dot(a-m,a-m)/a.size
Out[335]: 0.078365856465916137

In [336]: %%timeit
   .....: m = a.mean()
   .....: c = a-m
   .....: np.dot(c,c)/a.size
   .....: 
10000 loops, best of 3: 38.2 µs per loop
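Putting the pieces together, a sketch (assuming NumPy) verifying that the dot-product form agrees with a.var():

```python
import numpy as np

a = np.random.rand(1000)
m = a.mean()
c = a - m
# Sum of squared deviations over N, computed via a dot product.
var_dot = np.dot(c, c) / a.size
```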
qid & accept id: (19395350, 19395520) query: Opening a text file and then storing the contents into a nested dictionary in python 2.7 soup:
with open(infilepath) as infile:\n  answer = {}\n  name = None\n  for line in infile:\n    line = line.strip()\n    if line.startswith("NGC"):\n      name = line\n      answer[name] = {}\n    else:\n      var, val = line.split(':', 1)\n      answer[name][var.strip()] = val.strip()\n
\n

Output with your text file:

\n
>>> with open(infilepath) as infile:\n...   answer = {}\n...   name = None\n...   for line in infile:\n...     line = line.strip()\n...     if line.startswith("NGC"):\n...       name = line\n...       answer[name] = {}\n...     else:\n...       var, val = line.split(':', 1)\n...       answer[name][var.strip()] = val.strip()\n... \n>>> answer\n{'NGC6853': {'Messier': 'M27', 'Magnitude': '7.4', 'Distance': '1.25', 'Name': 'Dumbbell Nebula'}, 'NGC4254': {'Brightness': '9.9 mag', 'Messier': 'M99', 'Distance': '60000', 'Name': 'Coma Pinwheel Galaxy'}, 'NGC4594': {'Messier': 'M104', 'Distance': '50000', 'Name': 'Sombrero Galaxy'}, 'NGC0224': {'Messier': 'M31', 'Magnitude': '3.4', 'Distance': '2900', 'Name': 'Andromeda Galaxy'}, 'NGC4826': {'Messier': 'M64', 'Magnitude': '8.5', 'Distance': '19000', 'Name': 'Black Eye Galaxy'}, 'NGC5457': {'Messier': 'M101', 'Magnitude': '7.9', 'Distance': '27000', 'Name': 'Pinwheel Galaxy'}}\n
\n soup wrap:
with open(infilepath) as infile:
  answer = {}
  name = None
  for line in infile:
    line = line.strip()
    if line.startswith("NGC"):
      name = line
      answer[name] = {}
    else:
      var, val = line.split(':', 1)
      answer[name][var.strip()] = val.strip()

Output with your text file:

>>> with open(infilepath) as infile:
...   answer = {}
...   name = None
...   for line in infile:
...     line = line.strip()
...     if line.startswith("NGC"):
...       name = line
...       answer[name] = {}
...     else:
...       var, val = line.split(':', 1)
...       answer[name][var.strip()] = val.strip()
... 
>>> answer
{'NGC6853': {'Messier': 'M27', 'Magnitude': '7.4', 'Distance': '1.25', 'Name': 'Dumbbell Nebula'}, 'NGC4254': {'Brightness': '9.9 mag', 'Messier': 'M99', 'Distance': '60000', 'Name': 'Coma Pinwheel Galaxy'}, 'NGC4594': {'Messier': 'M104', 'Distance': '50000', 'Name': 'Sombrero Galaxy'}, 'NGC0224': {'Messier': 'M31', 'Magnitude': '3.4', 'Distance': '2900', 'Name': 'Andromeda Galaxy'}, 'NGC4826': {'Messier': 'M64', 'Magnitude': '8.5', 'Distance': '19000', 'Name': 'Black Eye Galaxy'}, 'NGC5457': {'Messier': 'M101', 'Magnitude': '7.9', 'Distance': '27000', 'Name': 'Pinwheel Galaxy'}}
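The same parsing loop can be exercised against an in-memory sample instead of a file; the sample text here is a shortened, hypothetical excerpt of the question's format:

```python
import io

sample = """NGC0224
Messier: M31
Magnitude: 3.4
"""

answer = {}
name = None
# io.StringIO makes the string iterable line-by-line, just like a file object.
for line in io.StringIO(sample):
    line = line.strip()
    if line.startswith("NGC"):
        name = line
        answer[name] = {}
    else:
        var, val = line.split(':', 1)
        answer[name][var.strip()] = val.strip()
```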
qid & accept id: (19431099, 19431100) query: How to directly set RGB/RGBA colors in mayavi soup:

As far as I am aware, there is no documentation for doing this, but I have found a way to do it with only a minimum amount of hacking around. Here is a minimal example, which might require a little tinkering for different kinds of sources:

\n
from tvtk.api import tvtk; from mayavi import mlab; import numpy as np\n\nx,y,z=np.random.random((3,nr_points)) #some data\ncolors=np.random.randint(256,size=(100,3)) #some RGB or RGBA colors\n\npts=mlab.points3d(x,y,z)\nsc=tvtk.UnsignedCharArray()\nsc.from_array(colors)\n\npts.mlab_source.dataset.point_data.scalars=sc\npts.mlab_source.dataset.modified()\n
\n

It also looks like you sometimes have to ensure that the mapper points to the right dataset. This is not necessary for the above example, but it may be for other sources:

\n
pts.actor.mapper.input=pts.mlab_source.dataset\n
\n

At some point the mayavi API should be improved to expose this directly for all the pipeline functions, but that turns out to be a rather complicated and sweeping set of changes which I don't currently have time to finish.

\n

Edit:\nUser eqzx posted an answer to another question (Specify absolute colour for 3D points in MayaVi) which may be simpler, especially for certain source types that are hard to get to work with tvtk.UnsignedCharArray.

\n

His idea is to create a LUT spanning the entire range of 256x256x256 RGB values. Note that this LUT therefore has 16,777,216 entries, which may waste quite a lot of memory if you use it in many vtk objects.

\n
#create direct grid as 256**3 x 4 array \ndef create_8bit_rgb_lut():\n    xl = numpy.mgrid[0:256, 0:256, 0:256]\n    lut = numpy.vstack((xl[0].reshape(1, 256**3),\n                        xl[1].reshape(1, 256**3),\n                        xl[2].reshape(1, 256**3),\n                        255 * numpy.ones((1, 256**3)))).T\n    return lut.astype('int32')\n\n# indexing function to above grid\ndef rgb_2_scalar_idx(r, g, b):\n    return 256**2 *r + 256 * g + b\n\n#N x 3 colors\ncolors = numpy.array([_.color for _ in points])\n\n#N scalars\nscalars = numpy.zeros((colors.shape[0],))\n\nfor (kp_idx, kp_c) in enumerate(colors):\n    scalars[kp_idx] = rgb_2_scalar_idx(kp_c[0], kp_c[1], kp_c[2])\n\nrgb_lut = create_8bit_rgb_lut()\n\npoints_mlab = mayavi.mlab.points3d(x, y, z\n                                   keypoint_scalars,\n                                   mode = 'point')\n\n#magic to modify lookup table \npoints_mlab.module_manager.scalar_lut_manager.lut._vtk_obj.SetTableRange(0, rgb_lut.shape[0])\npoints_mlab.module_manager.scalar_lut_manager.lut.number_of_colors = rgb_lut.shape[0]\npoints_mlab.module_manager.scalar_lut_manager.lut.table = rgb_lut\n
\n soup wrap:

As far as I am aware, there is no documentation for doing this, but I have found a way to do it with only a minimum amount of hacking around. Here is a minimal example, which might require a little tinkering for different kinds of sources:

from tvtk.api import tvtk; from mayavi import mlab; import numpy as np

nr_points = 100  # nr_points was undefined in the original; define it explicitly
x, y, z = np.random.random((3, nr_points)) #some data
colors = np.random.randint(256, size=(nr_points, 3)) #some RGB or RGBA colors

pts=mlab.points3d(x,y,z)
sc=tvtk.UnsignedCharArray()
sc.from_array(colors)

pts.mlab_source.dataset.point_data.scalars=sc
pts.mlab_source.dataset.modified()

It also looks like you sometimes have to ensure that the mapper points to the right dataset. This is not necessary for the above example, but it may be for other sources:

pts.actor.mapper.input=pts.mlab_source.dataset

At some point the mayavi API should be improved to expose this directly for all the pipeline functions, but that turns out to be a rather complicated and sweeping set of changes which I don't currently have time to finish.

Edit: User eqzx posted an answer to another question (Specify absolute colour for 3D points in MayaVi) which may be simpler, especially for certain source types that are hard to get to work with tvtk.UnsignedCharArray.

His idea is to create a LUT spanning the entire range of 256x256x256 RGB values. Note that this LUT therefore has 16,777,216 entries, which may waste quite a lot of memory if you use it in many vtk objects.

#create direct grid as 256**3 x 4 array 
def create_8bit_rgb_lut():
    xl = numpy.mgrid[0:256, 0:256, 0:256]
    lut = numpy.vstack((xl[0].reshape(1, 256**3),
                        xl[1].reshape(1, 256**3),
                        xl[2].reshape(1, 256**3),
                        255 * numpy.ones((1, 256**3)))).T
    return lut.astype('int32')

# indexing function to above grid
def rgb_2_scalar_idx(r, g, b):
    return 256**2 *r + 256 * g + b

#N x 3 colors
colors = numpy.array([_.color for _ in points])

#N scalars
scalars = numpy.zeros((colors.shape[0],))

for (kp_idx, kp_c) in enumerate(colors):
    scalars[kp_idx] = rgb_2_scalar_idx(kp_c[0], kp_c[1], kp_c[2])

rgb_lut = create_8bit_rgb_lut()

points_mlab = mayavi.mlab.points3d(x, y, z,
                                   scalars,
                                   mode='point')

#magic to modify lookup table 
points_mlab.module_manager.scalar_lut_manager.lut._vtk_obj.SetTableRange(0, rgb_lut.shape[0])
points_mlab.module_manager.scalar_lut_manager.lut.number_of_colors = rgb_lut.shape[0]
points_mlab.module_manager.scalar_lut_manager.lut.table = rgb_lut
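As a quick sanity check (independent of mayavi), you can verify that the packing function above is invertible, so every 8-bit RGB triple maps to a unique row of the LUT:

```python
def rgb_2_scalar_idx(r, g, b):
    # pack an 8-bit RGB triple into a single scalar index
    return 256**2 * r + 256 * g + b

idx = rgb_2_scalar_idx(10, 20, 30)
# unpack again to confirm the mapping round-trips
r, rem = divmod(idx, 256**2)
g, b = divmod(rem, 256)
print((r, g, b))  # (10, 20, 30)
```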
qid & accept id: (19461747, 19461818) query: Sum corresponding elements of multiple python dictionaries soup:

soup wrap:

collections.Counter() to the rescue ;-)

from collections import Counter
dicts = [{'a':1, 'b':4, 'c':8, 'd':9},
         {'a':2, 'b':3, 'c':2, 'd':7},
         {'a':0, 'b':1, 'c':3, 'd':4}]
c = Counter()
for d in dicts:
    c.update(d)

Then:

>>> print c
Counter({'d': 20, 'c': 13, 'b': 8, 'a': 3})

Or you can change it back to a dict:

>>> print dict(c)
{'a': 3, 'c': 13, 'b': 8, 'd': 20}

It doesn't matter to Counter() whether all the input dicts have the same keys. If you know for sure that they do, you could try ridiculous ;-) one-liners like this:

d = {k: v for k in dicts[0] for v in [sum(d[k] for d in dicts)]}

Counter() is clearer, faster, and more flexible. To be fair, though, this one-liner is slightly less ridiculous:

d = {k: sum(d[k] for d in dicts) for k in dicts[0]}
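If you prefer a single expression, the same aggregation can be written (in Python 3 syntax) as a reduction over Counters. Note that the Counter + operator, unlike update(), drops keys whose running total is zero or negative:

```python
from collections import Counter
from functools import reduce
from operator import add

dicts = [{'a': 1, 'b': 4, 'c': 8, 'd': 9},
         {'a': 2, 'b': 3, 'c': 2, 'd': 7},
         {'a': 0, 'b': 1, 'c': 3, 'd': 4}]

# Counter.__add__ sums counts pairwise (and discards non-positive totals)
total = reduce(add, map(Counter, dicts))
print(dict(total))  # {'a': 3, 'b': 8, 'c': 13, 'd': 20}
```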
qid & accept id: (19522263, 19525913) query: Load all third party scripts using requests or mechanize in Python soup:

soup wrap:

You could do it something like this:

from bs4 import BeautifulSoup

html = """
""" soup = BeautifulSoup(html) div = soup.find("div", id="here") html2 = """ """ soup1 = BeautifulSoup(html2) value = soup1.body.extract() div.append(value) print div

And the output is:

If you want the content inside the body you can do it something like this instead:

#the above same lines

soup1 = BeautifulSoup(html2)
value = soup1.body.extract()

div.append(value)
# replaces a tag with whatever’s inside that tag.
div.body.unwrap()
print div

And the output is:
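Here is a self-contained sketch of the same extract()/append()/unwrap() pattern, using made-up markup (Python 3 / bs4 syntax), since the HTML literals above are empty:

```python
from bs4 import BeautifulSoup

html = '<html><body><div id="here"></div></body></html>'
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", id="here")

html2 = '<html><body><p>hello</p></body></html>'
soup2 = BeautifulSoup(html2, "html.parser")
value = soup2.body.extract()   # detach <body> from its own tree

div.append(value)              # move it into the target <div>
div.body.unwrap()              # then replace <body> with its children
print(div)                     # <div id="here"><p>hello</p></div>
```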

qid & accept id: (22042490, 22042550) query: How do I add a method to a class from a third-party Python module without editing the original module soup:

soup wrap:

Python is highly modifiable. Just add your function to the class:

from mpl_toolkits.basemap import Basemap


def drawmlat(self, arg1, arg2, kw=None):
    pass

Basemap.drawmlat = drawmlat

Now the Basemap class has a drawmlat method; call it on instances and self will be bound to the instance object. When looking up the method on instances, the function will automatically be bound as a method for you.
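The same binding behaviour can be seen with any plain class (the names below are made up for illustration):

```python
class Greeter:
    def __init__(self, name):
        self.name = name

def shout(self):
    # 'self' is bound automatically when the function is looked up on an instance
    return self.name.upper() + "!"

Greeter.shout = shout              # attach the plain function to the class
print(Greeter("basemap").shout())  # BASEMAP!
```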

Anything defined in the Basemap.__init__ method that you need to care about are attributes on self.

Having looked over the mpl_toolkits.basemap.__init__ module, I do see that the drawparallel method relies on a few globals; you can import those from the module into your own namespace:

from mpl_toolkits.basemap import Basemap, _cylproj, _pseudocyl

This is no different from other imports you'd make; the original drawparallel method also relies on import numpy as np and from matplotlib.lines import Line2D, which make both np and Line2D globals in the original module.

qid & accept id: (22048792, 22049405) query: How do I display dates when plotting in matplotlib.pyplot? soup:

soup wrap:

According to efiring, matplotlib does not support NumPy datetime64 objects (at least not yet). Therefore, convert x to Python datetime.datetime objects:

x = x.astype(DT.datetime)

Next, you can specify the x-axis tick mark formatter like this:

xfmt = mdates.DateFormatter('%b %d')
ax.xaxis.set_major_formatter(xfmt)

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as DT
import numpy as np

x = np.array([DT.datetime(2013, 9, i).strftime("%Y-%m-%d") for i in range(1,5)], 
            dtype='datetime64')
x = x.astype(DT.datetime)
y = np.array([1,-1,7,-3])
fig, ax = plt.subplots()
ax.plot(x, y)
ax.axhline(linewidth=4, color='r')
xfmt = mdates.DateFormatter('%b %d')
ax.xaxis.set_major_formatter(xfmt)
plt.show()


qid & accept id: (22083378, 22083625) query: how to forward fill non-null values in a pandas dataframe based on a set condition soup:

soup wrap:

One way is to replace the lower zeros with NaNs:

In [11]: df.replace(0, np.nan).bfill()  # maybe neater way to do this?
Out[11]:
             a   b   c
2000-03-02   1   1   1
2000-03-03   1   1   1
2000-03-04   1   1   1
2000-03-05   1 NaN NaN
2000-03-06 NaN NaN NaN
2000-03-07 NaN NaN NaN

Now you can use where to change these to 2:

In [12]: df.where(df.replace(0, np.nan).bfill(), 2)
Out[12]:
            a  b  c
2000-03-02  0  0  0
2000-03-03  0  0  1
2000-03-04  0  1  1
2000-03-05  1  2  2
2000-03-06  2  2  2
2000-03-07  2  2  2

Edit: it may be faster to use a trick here with cumsum:

In [21]: %timeit df.where(df.replace(0, np.nan).bfill(), 2)
100 loops, best of 3: 2.34 ms per loop

In [22]: %timeit df.where(df[::-1].cumsum()[::-1], 2)
1000 loops, best of 3: 1.7 ms per loop

In [23]: %timeit pd.DataFrame(np.where(np.cumsum(df.values[::-1], 0)[::-1], df.values, 2), df.index)
10000 loops, best of 3: 186 µs per loop
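The cumsum trick works because a reversed cumulative sum is nonzero exactly at and above each column's last nonzero entry. A minimal NumPy-only sketch with made-up data:

```python
import numpy as np

a = np.array([[0, 0, 0],
              [0, 0, 1],
              [0, 1, 1],
              [1, 0, 0],
              [0, 0, 0]])

# reversed cumsum, reversed back: zero only strictly below each column's last nonzero
mask = np.cumsum(a[::-1], axis=0)[::-1] != 0
out = np.where(mask, a, 2)   # keep original values where mask holds, else 2
print(out)
```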
qid & accept id: (22118136, 30714721) query: NLTK: Find contexts of size 2k for a word soup:

soup wrap:

The simplest, nltk-ish way to do this is with nltk.ngrams().

words = nltk.corpus.brown.words()
k = 5
for ngram in nltk.ngrams(words, 2*k+1, pad_left=True, pad_right=True, pad_symbol=" "):
    if ngram[k].lower() == "settle":
        print(" ".join(ngram))

pad_left and pad_right ensure that all words get looked at. This is important if you don't let your concordances span across sentences (hence: lots of boundary cases).

If you want to ignore punctuation in the window size, you can strip it before scanning:

words = (w for w in nltk.corpus.brown.words() if re.search(r"\w", w))
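If you don't want to depend on nltk at all, the same ±k window can be sliced directly; this sketch truncates at the edges instead of padding:

```python
def contexts(words, target, k):
    # collect the window of k words on each side of every occurrence of target
    return [words[max(0, i - k):i + k + 1]
            for i, w in enumerate(words) if w.lower() == target]

words = "we will settle this matter now".split()
print(contexts(words, "settle", 2))  # [['we', 'will', 'settle', 'this', 'matter']]
```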
qid & accept id: (22119987, 22120033) query: Returning all keys that have the same corresponding value in a dictionary with python soup:

soup wrap:

After getting the maximum, you can check each key's value. This list comprehension returns the keys whose associated value is the same as that of aiMove2.

keys = [x for x,y in posValueD.items() if y == posValueD[aiMove2]]

Here's an example in Python shell:

>>> a = {'a':1, 'b':2, 'c':2}
>>> [x for x,y in a.items() if y == 2]
['c', 'b']
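If you need to do many such lookups, inverting the dictionary once avoids rescanning it for every value (sketch with made-up data):

```python
from collections import defaultdict

a = {'a': 1, 'b': 2, 'c': 2}

by_value = defaultdict(list)
for key, value in a.items():
    by_value[value].append(key)   # group keys under their shared value

print(sorted(by_value[2]))  # ['b', 'c']
```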
qid & accept id: (22122385, 22122403) query: Nested Regular Expression in Python for soup:
soup wrap:
r1 = re.compile(r'SO ON')
r2 = re.compile(r'WHATEVER AND (%s)*' % r1.pattern)

This isn't actually using any special feature of regex, it's using string formatting. Multiple strings can be passed in as:

r'WHATEVER AND (%s) (%s)' % (r1.pattern, 'hello')
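A runnable sketch of the same composition idea, with made-up patterns:

```python
import re

r1 = re.compile(r'\d+')
r2 = re.compile(r'value=(%s)' % r1.pattern)  # embed r1's pattern as plain text

m = r2.search('value=42')
print(m.group(1))  # 42
```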
qid & accept id: (22134243, 22134290) query: Recreating builtin s.find('substring') function soup:

soup wrap:

Your loop should work just fine, but you are not doing anything with the str2 in str1[..] test and are not calculating the end point correctly; you want to use the length of str2 here, really.

You could loop directly over str1 and add indices with the enumerate() function. You need to add the len(str2) result to i to find the endpoint, and make it print out the test; I used == here as the resulting slice should be the same string:

for i, char in enumerate(str1):
    if str2[0] == char:
        print("found first instance of letter at,", i)
        print(str2 == str1[i:i + len(str2)])

Demo:

>>> str1 = 'my best test ever!'
>>> str2 = 'best'
>>> for i, char in enumerate(str1):
...     if str2[0] == char:
...         print("found first instance of letter at, ", i)
...         print(str2 == str1[i:i + len(str2)])
... 
found first instance of letter at,  3
True
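To actually recreate str.find()'s return value (an index, or -1) rather than just printing a match, a sliding-window sketch:

```python
def my_find(haystack, needle):
    # slide a window of len(needle) over haystack; return first match index or -1
    for i in range(len(haystack) - len(needle) + 1):
        if haystack[i:i + len(needle)] == needle:
            return i
    return -1

print(my_find('my best test ever!', 'best'))  # 3
print(my_find('my best test ever!', 'xyz'))   # -1
```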
qid & accept id: (22158110, 22158239) query: How do I install pip in python 2.7? soup:

soup wrap:

You can first install easy_install (which is part of setuptools) from the following location

https://pypi.python.org/pypi/setuptools#windows

Right click on the link and save the file ez_setup.py and then run it.

Once that is complete and you have added the scripts to your path variable (C...Python2.7..scripts), you can install pip using

easy_install pip

Check out this video for more help.

http://www.youtube.com/watch?v=MIHYflJwyLk&feature=youtu.be

qid & accept id: (22161088, 22173545) query: How to extract a file within a folder within a zip? soup:

soup wrap:

Here's something that seems to work. There were several issues with your code. As I mentioned in a comment, the zipfile must be opened with mode 'r' in order to read it. Another is that zip archive member names always use forward slash / characters in their path names as separators (see section 4.4.17.1 of the PKZIP Application Note). It's important to be aware that there's no way to extract a nested archive member to a different subdirectory with Python's current zipfile module. You can control the root directory, but nothing below it (i.e. any subfolders within the zip).

Lastly, since it's not necessary to rename the .pages file to .zip (the filename you pass ZipFile() can have any extension), I removed all that from the code. However, to overcome the limitation on extracting members to a different subdirectory, I had to add code to first extract the target member to a temporary directory, and then copy that to the final destination. Afterwards, of course, this temporary folder needs to be deleted. So I'm not sure the net result is much simpler...

import os.path
import shutil
import sys
import tempfile
from zipfile import ZipFile

PREVIEW_PATH = 'QuickLooks/Preview.pdf'  # archive member path
pages_file = input('Enter the path to the .pages file in question: ')
#pages_file = r'C:\Stack Overflow\extract_test.pages'  # hardcode for testing
pages_file = os.path.abspath(pages_file)
filename, file_extension = os.path.splitext(pages_file)
if file_extension == ".pages":
    tempdir = tempfile.gettempdir()
    temp_filename = os.path.join(tempdir, PREVIEW_PATH)
    with ZipFile(pages_file, 'r') as zipfile:
        zipfile.extract(PREVIEW_PATH, tempdir)
    if not os.path.isfile(temp_filename):  # extract failure?
        sys.exit('unable to extract {} from {}'.format(PREVIEW_PATH, pages_file))
    final_PDF = filename + '.pdf'
    shutil.copy2(temp_filename, final_PDF)  # copy and rename extracted file
    # delete the temporary subdirectory created (along with pdf file in it)
    shutil.rmtree(os.path.join(tempdir, os.path.split(PREVIEW_PATH)[0]))
    print('Check out the PDF! It\'s located at "{}".'.format(final_PDF))
    #view_file(final_PDF)  # see Bonus below
else:
    sys.exit('Sorry, that isn\'t a .pages file.')

Bonus: If you'd like to actually view the final pdf file from the script, you can add the following function and use it on the final pdf created (assuming you have a PDF viewer application installed on your system):

import subprocess
def view_file(filepath):
    subprocess.Popen(filepath, shell=True).wait()
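On Python 3 you can also skip the temporary-directory dance entirely by reading the member's bytes with ZipFile.read(). Here is a sketch that builds a small stand-in archive in memory (the member name and contents are made up):

```python
import io
import zipfile

# build a small in-memory archive with a nested member
# (a stand-in for the .pages file)
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('QuickLooks/Preview.pdf', b'%PDF-1.4 fake')

# read the nested member straight into bytes; no folders are created on disk
with zipfile.ZipFile(buf, 'r') as zf:
    data = zf.read('QuickLooks/Preview.pdf')

print(data[:8])  # b'%PDF-1.4'
```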
qid & accept id: (22214949, 22215011) query: Generate numbers with 3 digits soup:
["{0:03}".format(i) for i in range(121)]\n
\n

or

\n
["%03d" % i for i in range(121)]\n
\n

To print:

\n
print "\n".join()\n
\n soup wrap:
["{0:03}".format(i) for i in range(121)]

or

["%03d" % i for i in range(121)]

To print:

print "\n".join()
qid & accept id: (22221858, 22222053) query: Compare string in format HH:MM to time now in python soup:

soup wrap:

You can use datetime's strptime() function to convert the string to a valid datetime:

>>>d=datetime.datetime.strptime('15:30','%H:%M')

and later compare it to now's time():

>>>dnow=datetime.datetime.now()  #11:42 am here ;)
>>>dnow.time() < d.time()
True

You can also read the docs' strftime() and strptime() behavior section, which explains these methods and has a very good table summarizing the directives used to parse dates.
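Putting it together as a runnable sketch (the comparison result depends on the local clock, of course):

```python
import datetime

target = datetime.datetime.strptime('15:30', '%H:%M').time()
now = datetime.datetime.now().time()
print(now < target)  # True before 15:30 local time, False after
```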

qid & accept id: (22281406, 22281577) query: Python drag and drop, get filenames soup:

soup wrap:

If you don't need a GUI, and depending on the platform I would use sys.argv.

In Windows, for example, you can't drag files onto Python scripts, but you can drag them onto a batch file. And from the batch file you can call your script with the file names as arguments.

Batch File:

python "dropScript.py" %*
pause

The %* contains all the filenames.

dropScript.py:

import sys

file_paths = sys.argv[1:]
for p in file_paths:
    print(p)

The first argument is the script itself so it is omitted from the list.

qid & accept id: (22328160, 22328451) query: python string to date ISO 8601 soup:

soup wrap:

You may want to look at the datetime module. Using its date formatting functions, you can do something like this:

>>> import datetime as dt
>>> ds = '0104160F'
>>> parsed = dt.datetime.strptime(ds, "%y%m%d0F")
>>> parsed
datetime.datetime(2001, 4, 16, 0, 0)    
>>> reformatted = dt.datetime.strftime(parsed, "%Y-%m-%d")
>>> reformatted
'2001-04-16'

In your function, you can use these as follows:

def YYMMDD0FtoYYYYMMDD(date):
    return dt.datetime.strftime(dt.datetime.strptime(date, "%y%m%d0F"), "%Y-%m-%d")
qid & accept id: (22332069, 22335257) query: Python convert single column of data into multiple columns soup:
soup wrap:
from io import StringIO
from collections import OrderedDict

datastring = StringIO(u"""\
# row = 0
9501.7734375
9279.390625
8615.1640625
# row = 1
4396.1953125
4197.1796875
3994.4296875
# row = 2
9088.046875
8680.6953125
8253.0546875
""")      

content = datastring.readlines()
out = OrderedDict()
final = []

for line in content:
    if line.startswith('# row'):
        header = line.strip('\n#')
        out[header] = []
    elif line not in out[header]:
        out[header].append(line.strip('\n'))


for k, v in out.iteritems():
    temp = (k + ',' + ','.join([str(item) for item in v])).split(',')
    final.append(temp)

final = zip(*final)
with open("C:/temp/output.csv", 'w') as fout:
    for item in final:
    fout.write('\t'.join([str(i) for i in item]))

Output:

 row = 0         row = 1        row = 2
9501.7734375    4396.1953125    9088.046875
9279.390625     4197.1796875    8680.6953125
8615.1640625    3994.4296875    8253.0546875
qid & accept id: (22346807, 22617060) query: How to avoid rebuilding existing wheels when using pip? soup:

I've been using the option

\n
    --find-links=/tmp\n
\n

where /tmp is the wheelhouse. This seems to actually check the wheelhouse and not re-download things. Using your example, try this:

\n
    pip wheel --find-links=/tmp --wheel-dir=/tmp Cython==0.19.2\n
\n soup wrap:

I've been using the option

    --find-links=/tmp

where /tmp is the wheelhouse. This seems to actually check the wheelhouse and not re-download things. Using your example, try this:

    pip wheel --find-links=/tmp --wheel-dir=/tmp Cython==0.19.2
qid & accept id: (22362010, 22362436) query: Using groupby to operate only on rows that have the same value for one of the columns pandas python soup:

soup wrap:

There's probably a more efficient way, (and you could write this much more readably) but you could always do something like:

import pandas as pd

org = ['doclist[0]', 'doclist[0]', 'doclist[1]', 'doclist[4]', 'doclist[4]']
np = [0, 1, 1, 1, 0]
pr = [0, 0, 0, 0, 1]
df = pd.DataFrame({'Organization':org, 'NP':np, 'Pr':pr})

# Make a "lookup" dataframe of the sums for each category
# (Both the "NP" and "Pr" columns of "sums" will contain the result)
sums = df.groupby('Organization').agg(lambda x: x['NP'].sum() + x['Pr'].sum())

# Lookup the result based on the contents of the "Organization" row
df['Sum'] = df.apply(lambda row: sums.ix[row['Organization']]['NP'], axis=1)

That's rather unreadable, so it might be a bit clearer to write it this way:

import pandas as pd

org = ['doclist[0]', 'doclist[0]', 'doclist[1]', 'doclist[4]', 'doclist[4]']
np = [0, 1, 1, 1, 0]
pr = [0, 0, 0, 0, 1]
df = pd.DataFrame({'Organization':org, 'NP':np, 'Pr':pr})

# Make a "lookup" dataframe of the sums for each category
lookup = df.groupby('Organization').agg(lambda x: x['NP'].sum() + x['Pr'].sum())

# Lookup the result based on the contents of the "Organization" row
# The "lookup" dataframe will have the relevant sum in _both_ "NP" and "Pr"
def func(row):
    org = row['Organization']
    group_sum = lookup.ix[org]['NP']
    return group_sum
df['Sum'] = df.apply(func, axis=1)

Incidentally, @DSM's answer looks like a much better way to do this.
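For reference, in more recent pandas the per-row lookup is exactly what groupby(...).transform() does; a small sketch with made-up data:

```python
import pandas as pd

org = ['doclist[0]', 'doclist[0]', 'doclist[1]']
df = pd.DataFrame({'Organization': org, 'NP': [0, 1, 1], 'Pr': [0, 0, 1]})

# sum NP+Pr within each organization and broadcast the total back to every row
df['Sum'] = (df['NP'] + df['Pr']).groupby(df['Organization']).transform('sum')
print(df['Sum'].tolist())  # [1, 1, 2]
```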

qid & accept id: (22384398, 22384521) query: Using variable as keyword passed to **kwargs in Python soup:

soup wrap:

Rather than passing the parameter named as field, you can use dictionary unpacking to use the value of field as the name of the parameter:

request = update_by_email(email, **{field: field_value})

Using a mock of update_by_email:

def update_by_email(email=None, **kwargs):
    print(kwargs)

When I call

update_field("joe@me.com", "name", "joe")

I see that kwargs inside update_by_email is

{'name': 'joe'}
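Putting the pieces together into a runnable sketch (update_by_email here just echoes its keyword arguments):

```python
def update_by_email(email=None, **kwargs):
    # echo the keyword arguments so we can see what arrived
    return kwargs

field, field_value = 'name', 'joe'
result = update_by_email('joe@me.com', **{field: field_value})
print(result)  # {'name': 'joe'}
```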
qid & accept id: (22394350, 22394606) query: Validity of a string based on some conditions soup:

soup wrap:

This will parse all the inputs you wrote using regex, but remember that arithmetic expressions are generated by a context-free grammar, so you won't find a single regex (regexes match only regular languages) that covers all existing expressions (like (3*(3*2))*(3*1), (3*(3*(3*2)))*(3*1) and so on); you would need to build a different pattern for each nesting depth.

import re

parser1 = re.compile("[0-9]\\*?$")
parser3 = re.compile("\\([0-9]\\*[0-9]\\)$")
parser4 = re.compile("(\\([0-9]\\*[0-9]\\)|[0-9])\\*(\\([0-9]\\*[0-9]\\)|[0-9])$")

def validity(s):
    valid = False

    # Condition 1 and 2
    if parser1.match(s):
        return True
    # Condition 3
    if parser3.match(s):
        return True
    # Condition 4
    if parser4.match(s):
        return True

    return False

print validity('1') # Condition 1
print validity('9') # Condition 1
print validity('10') # Doesn't satisfy any of the conditions
print validity('1*') # Condition 2
print validity('4*') # Condition 2
print validity('9*') # Condition 2
print validity('10*') # Doesn't satisfy any of the conditions
print validity('(3*4)') # Condition 3
print validity('(3*9)') # Condition 3
print validity('(4*9)') # Condition 3
print validity('(10*9)') # Doesn't satisfy any of the conditions
print validity('(3*2)*(3*1)') # Condition 4
print validity('(3*2)*8') # Condition 4
print validity('(3*2)*z') # Doesn't satisfy any of the conditions

The outputs here are:

True
True
False
True
True
True
False
True
True
True
False
True
True
False
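
For arbitrarily nested expressions, which the regexes above cannot cover, a small recursive-descent check works. This is my own sketch (not part of the answer), and it deliberately handles only single digits, '*' and parentheses, not the trailing-star form of Condition 2:

```python
def valid_expr(s):
    """True if s is a digit, a parenthesised product, or a product of such terms."""
    pos = 0

    def term():
        # a term is a single digit or a parenthesised product
        nonlocal pos
        if pos < len(s) and s[pos].isdigit():
            pos += 1
            return True
        if pos < len(s) and s[pos] == '(':
            pos += 1
            if product() and pos < len(s) and s[pos] == ')':
                pos += 1
                return True
        return False

    def product():
        # a product is one or more terms separated by '*'
        nonlocal pos
        if not term():
            return False
        while pos < len(s) and s[pos] == '*':
            pos += 1
            if not term():
                return False
        return True

    return product() and pos == len(s)

print(valid_expr('(3*(3*2))*(3*1)'))  # True, at any nesting depth
print(valid_expr('(10*9)'))           # False: '10' is two digits
```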
qid & accept id: (22404273, 22506752) query: EPSG:900913 to WGS 84 projection soup:

soup wrap:

Just transform your geometries using ST_Transform (http://postgis.org/docs/ST_Transform.html)

 ST_Transform(geom,4326)

If you want to transform a whole table do:

 UPDATE table_name SET geom_column = ST_Transform(geom_column,4326)
qid & accept id: (22425626, 22425657) query: Python - re - need help for regular expression soup:

soup wrap:

Either use a non-greedy quantifier, like this:

re.search('\[(.*?)\]', html_template)

Or a character class, like this:

re.search('\[([^\]]*)\]', html_template)

And use re.findall to get all matching substrings.
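
A quick illustration of the difference (the html_template string is invented for the demo):

```python
import re

html_template = "Hello [name], your order [order_id] has shipped."

# greedy: .* runs to the LAST ']', swallowing everything in between
print(re.search(r'\[(.*)\]', html_template).group(1))
# -> name], your order [order_id

# non-greedy: .*? stops at the first ']'
print(re.search(r'\[(.*?)\]', html_template).group(1))
# -> name

# re.findall returns every captured group at once
print(re.findall(r'\[([^\]]*)\]', html_template))
# -> ['name', 'order_id']
```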

qid & accept id: (22462728, 22463459) query: Analyze and edit links in html code with BeautifulSoup soup:

soup wrap:

Use replace_with() method:

PageElement.replace_with() removes a tag or string from the tree, and replaces it with the tag or string of your choice

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

body = """
good link


"""

soup = BeautifulSoup(body, 'html.parser')

links = soup.find_all('a')
for link in links:
    link = link.replace_with('')

print soup.prettify(formatter=None)

prints:


Note the import statement: use Beautiful Soup 4, since Beautiful Soup 3 is no longer being developed and version 4 is recommended for all new projects.

qid & accept id: (22484903, 22485122) query: Replacing two elements of a list in place with a function [python 3] soup:

soup wrap:

I think what you want is:

def reverse_sublist(lst, start, end):
    lst[start:end] = reversed(lst[start:end])

Note the definition of the three arguments to the function on the first line, in parentheses after the name of the function.

Your current pseudocode appears to be for swapping the items at start and end, which is not what your example shows. If you did want to do this, you could do:

def swap_items(lst, index1, index2):
    lst[index1], lst[index2] = lst[index2], lst[index1]

An additional note on your first approach: if you add extra items into the list you will throw off the indexing for the rest of the list. For example, try this test implementation of your pseudocode:

def test(lst, start, end):
    lst.insert(start+1, lst[start])
    lst.insert(end+1, lst[end])
    print(lst)
    lst[start] = lst[end+1]
    lst[end] = lst[start+1]
    print(lst)
    del lst[start+1]
    del lst[end+1]
    print(lst)

The result this gives is not the [5, 2, 3, 4, 1] you are expecting:

>>> test([1, 2, 3, 4, 5], 0, 4)
[1, 1, 2, 3, 4, 4, 5] # after insert
[4, 1, 2, 3, 1, 4, 5] # after swap
[4, 2, 3, 1, 4] # after del

Instead, you would have to do the insert and swap on end+2 and end+1 to account for the extra item from the insert at start+1:

def test(lst, start, end):
    lst.insert(start+1, lst[start])
    lst.insert(end+2, lst[end+1])
    lst[start], lst[end+1] = lst[end+2], lst[start+1]
    del lst[start+1]
    del lst[end+1]
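
A quick check of reverse_sublist from the top of the answer (with a return added here so the result is easy to print; the original mutates in place and returns None):

```python
def reverse_sublist(lst, start, end):
    # slice assignment replaces lst[start:end] in place
    lst[start:end] = reversed(lst[start:end])
    return lst

print(reverse_sublist([1, 2, 3, 4, 5], 0, 5))  # [5, 4, 3, 2, 1]
print(reverse_sublist([1, 2, 3, 4, 5], 1, 4))  # [1, 4, 3, 2, 5]
```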
qid & accept id: (22509908, 22510173) query: Checking if function was not called for x amount of time soup:

soup wrap:

You could embed the last called time in the function definition:

def myfun():
  myfun.last_called = datetime.now()
  # … do things

From this point it should be easy to tell when the function was called. Each time it's called it will update its last_called timestamp.

A more general approach would be to define a function decorator to attach the property:

def remembercalltimes(f, *args, **kwargs):
    """A decorator to help a function remember when it was last called."""
    def inner(*args, **kwargs):
        inner.last_called = datetime.now()
        return f(*args, **kwargs)
    return inner

@remembercalltimes
def myfun():
    # … do things

>>> myfun()
>>> myfun.last_called
>>> datetime.datetime(2014, 3, 19, 11, 47, 5, 784833)
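
To answer the original question — has the function gone x seconds without being called? — compare last_called against the current time. A self-contained sketch (the idle_for helper is my own addition):

```python
from datetime import datetime, timedelta
import time

def remembercalltimes(f):
    # decorator from the answer, reproduced so this snippet runs on its own
    def inner(*args, **kwargs):
        inner.last_called = datetime.now()
        return f(*args, **kwargs)
    inner.last_called = None
    return inner

@remembercalltimes
def myfun():
    pass

def idle_for(f, seconds):
    """True if f has never been called, or not within the last `seconds` seconds."""
    if f.last_called is None:
        return True
    return datetime.now() - f.last_called > timedelta(seconds=seconds)

myfun()
time.sleep(0.2)
print(idle_for(myfun, 0.1))  # True: more than 0.1 s has passed since the call
print(idle_for(myfun, 60))   # False: it was called well within the last minute
```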
qid & accept id: (22518000, 22522023) query: Calculate number of jumps in Dijkstra's algorithm? soup:

soup wrap:

The following code gives a 4x speedup on my PC; it's faster because it:

  • uses ndarray.item() to get values from the array.
  • uses a set object to keep track of unprocessed indexes.
  • doesn't create numpy.arange() inside the while loop.

Python code:

import numpy as np  # needed for np.zeros / np.arange in jumpvec2 below

def dijkway2(dijkpredmat, i, j):
    wayarr = []
    while (i != j) & (j >= 0):
        wayarr.append(j)
        j = dijkpredmat.item(i,j)
    return wayarr

def jumpvec2(pmat,node):
    jumps = np.zeros(len(pmat))
    jumps[node] = -999
    todo = set()
    for i in range(len(pmat)):
        if i != node:
            todo.add(i)

    indexs = np.arange(len(pmat), 0, -1)
    while todo:
        r = todo.pop()
        dway = dijkway2(pmat, node, r)
        jumps[dway] = indexs[-len(dway):]
        todo -= set(dway)
    return jumps

To speed up even more, you can use Cython:

import numpy as np
cimport numpy as np
import cython

@cython.wraparound(False)
@cython.boundscheck(False)
cpdef dijkway3(int[:, ::1] m, int i, int j):
    cdef list wayarr = []
    while (i != j) & (j >= 0):
        wayarr.append(j)
        j = m[i,j]
    return wayarr

@cython.wraparound(False)
@cython.boundscheck(False)
def jumpvec3(int[:, ::1] pmat, int node):
    cdef np.ndarray jumps
    cdef int[::1] jumps_buf
    cdef int i, j, r, n
    cdef list dway
    jumps = np.zeros(len(pmat), int)
    jumps_buf = jumps
    jumps[node] = -999

    for i in range(len(jumps)):
        if jumps_buf[i] != 0:
            continue
        r = i
        dway = dijkway3(pmat, node, r)
        n = len(dway)
        for j in range(n):
            jumps_buf[dway[j]] = n - j
    return jumps

Here is my test; the Cython version is 80x faster:

%timeit jumpvec3(pmat,1)
%timeit jumpvec2(pmat, 1)
%timeit jumpvec(pmat, 1)

output:

1000 loops, best of 3: 138 µs per loop
100 loops, best of 3: 2.81 ms per loop
100 loops, best of 3: 10.8 ms per loop
qid & accept id: (22520948, 22561824) query: How to view stdout of script run within automator soup:

soup wrap:

I am not at my Mac, so this is untested, but I am pretty sure it can be made to work with minor edits...

Change your script so it looks like this and your python script runs in the background saving its output to a temporary file $$.tmp where $$ is your process id (pid):

export PATH=${PATH}:/usr/local/bin:/usr/local/CrossPack-AVR/bin
cd /Applications/MyApp
/Applications/MyApp/doIt.py "$1" > $$.tmp &

Now add the following lines at the end, so that you 1) create a script that tails your log file, 2) make it executable and 3) you execute it:

echo "tail -f $$.tmp" > x.command
chmod +x x.command
open x.command

I would recommend renaming x.command to $$.command so that multiple users can run it at the same time without interfering with each other. You should also clean up and delete the temporary files after use.

qid & accept id: (22534983, 22535227) query: Get Nodes from xml by specifying limit soup:

soup wrap:

The idea is to find all facultyMember items and use Python's list slicing:

from xml.etree import ElementTree as ET


data = """
    
        A
    
    
        B
    
    
        C
    
    
        D
    
    
        E
    
    
        F
    
    
        G
    
    
        H
    

"""

tree = ET.fromstring(data)
begin, end = 3, 6

for element in tree.findall('.//facultyMember')[begin - 1: end]:
    print ET.tostring(element).strip()

prints:


        C
    

        D
    

        E
    

        F
    
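The XML markup in the snippet above was lost in extraction; here is a minimal stand-in (the tag layout is my assumption) showing the same findall-plus-slicing idea:

```python
from xml.etree import ElementTree as ET

# minimal stand-in for the stripped XML; tag names other than
# facultyMember are invented
data = """<faculty>
    <facultyMember>A</facultyMember>
    <facultyMember>B</facultyMember>
    <facultyMember>C</facultyMember>
    <facultyMember>D</facultyMember>
    <facultyMember>E</facultyMember>
    <facultyMember>F</facultyMember>
</faculty>"""

tree = ET.fromstring(data)
begin, end = 3, 6

# slicing the result list keeps only members 3 through 6 (1-based)
picked = [el.text for el in tree.findall('.//facultyMember')[begin - 1: end]]
print(picked)  # ['C', 'D', 'E', 'F']
```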
qid & accept id: (22535316, 22547541) query: How to log in to a website with urllib? soup:

soup wrap:

The site is using a JSESSIONID cookie to create a session, since HTTP requests are stateless. When you're making your request, you're not getting that session id first.

I sniffed a session logging into that site using Fiddler and found that the POST is made to a different URL, but it has that JSESSIONID cookie set. So you need to make a GET request to that URL first, capture the cookie using the cookie handler, then POST to this URL:

post_url = 'http://www.broadinstitute.org/cmap/j_security_check'

You don't need to save the HTTP GET request at all, you can simply call opener.open(url), then in your code change the response line to this:

response = opener.open(post_url, binary_data)

Also the payload was missing the submit method. Here's the whole thing with the changes I suggest:

import http.cookiejar
import urllib

get_url = 'http://www.broadinstitute.org/cmap/index.jsp'
post_url = 'http://www.broadinstitute.org/cmap/j_security_check'

values = urllib.parse.urlencode({'j_username': ,
          'j_password': ,
          'submit': 'sign in'})
payload = bytes(values, 'ascii')
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.HTTPRedirectHandler(),
    urllib.request.HTTPHandler(debuglevel=0),
    urllib.request.HTTPSHandler(debuglevel=0),
    urllib.request.HTTPCookieProcessor(cj))

opener.open(get_url) #First call to capture the JSESSIONID
resp = opener.open(post_url, payload)
resp_html = resp.read()
resp_headers = resp.info()

Any other requests using the opener you created will re-use that cookie and you should be able to freely navigate the site.

qid & accept id: (22567247, 22567439) query: Need help detecting a change in a variable outside of a while loop soup:

soup wrap:

I haven't dug through your code yet, but from your description I assume your pseudo-code is like this:

if GO_TO_MAX_SPEED_CONDITION:
    while NOT_AT_MAX_SPEED:
        ACCELERATE

I'd suggest changing your strategy to be like this:

if GO_TO_MAX_SPEED_CONDITION:
    GO_TO_MAX_SPEED = True
if STOP_GOING_TO_MAX_SPEED_CONDITION:
    GO_TO_MAX_SPEED = False

then at each iteration of your program, you'd have something like this:

if GO_TO_MAX_SPEED and NOT_AT_MAX_SPEED:
    ACCELERATE
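
A concrete Python sketch of that flag-based strategy (all names and the speed model are invented):

```python
MAX_SPEED = 10

def step(state, accel_pressed=False, brake_pressed=False):
    # one iteration of the main loop: update the flag, then act on it
    if accel_pressed:
        state['go_to_max'] = True
    if brake_pressed:
        state['go_to_max'] = False
    if state['go_to_max'] and state['speed'] < MAX_SPEED:
        state['speed'] += 1

state = {'speed': 0, 'go_to_max': False}
step(state, accel_pressed=True)   # press once; the flag latches
for _ in range(20):
    step(state)                   # keeps accelerating on later iterations
print(state['speed'])  # 10: accelerated every loop until MAX_SPEED
```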
qid & accept id: (22576924, 22577415) query: building reusable package in django soup:

soup wrap:

I believe that once the "from django.conf import settings" line has executed, settings are effectively immutable.

What I would do is invert the logic a bit. In PACKAGE/__init__.py. Something like:

def get_apps():
    apps = (
        'apps.store',
        'apps.other',
        ...
    )
    return [__name__ + '.' + x for x in apps]

Then just:

INSTALLED_APPS += get_apps()

in settings.py. I do this quite a bit to keep our settings.py manageable and it seems to work quite well.
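
The same idea can be exercised outside Django; this sketch passes the package name in explicitly instead of reading __name__ (the app and package names are invented):

```python
def get_apps(package_name):
    # prefix each relative app path with the containing package's name
    apps = (
        'apps.store',
        'apps.other',
    )
    return [package_name + '.' + x for x in apps]

INSTALLED_APPS = ['django.contrib.admin']
INSTALLED_APPS += get_apps('mypackage')
print(INSTALLED_APPS)
# ['django.contrib.admin', 'mypackage.apps.store', 'mypackage.apps.other']
```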

qid & accept id: (22591297, 22591558) query: Run same test on multiple datasets soup:

soup wrap:

Use params as you mentioned:

@pytest.fixture(scope='module', params=[load_dataset1, load_dataset2])
def data(request):
    loader = request.param
    dataset = loader()
    return dataset

Use fixture finalization if you want to do fixture specific finalization:

@pytest.fixture(scope='module', params=[load_dataset1, load_dataset2])
def data(request):
    loader = request.param
    dataset = loader()
    def fin():
        # finalize dataset-related resource
        pass
    request.addfinalizer(fin)
    return dataset
qid & accept id: (22616944, 22617146) query: Convert Country Names to Country Code using Python DictReader/DictWriter soup:

soup wrap:

Iterate over pycountry.countries and initialize a mapping name -> short name (alpha2, or alpha3):

mapping = {country.name: country.alpha2 for country in pycountry.countries}
for column in csv_file:
    print column['name'], mapping.get(column['name'], 'No country found')

For the file containing:

name
Kazakhstan
Ukraine

it prints:

Kazakhstan KZ
Ukraine UA
qid & accept id: (22649851, 22649904) query: Best way of removing single newlines but keeping multiple newlines soup:
soup wrap:
>>> re.sub('(?

This looks for \r?\n or \n?\r and uses lookbehind and lookahead assertions to prevent there from being a newline on either side.

For what it's worth, there are three types of line endings found in the wild:

  1. \n on Linux, Mac OS X, and other Unices
  2. \r\n on Windows, and in the HTTP protocol
  3. \r on Mac OS 9 and earlier

The first two are by far the most common. If you want to limit the possibilities to just those three, you could do:

>>> re.sub('(?

And of course, get rid of the |\r if you don't care about Mac line endings, which are rare.
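
The expressions above were truncated in extraction; the idea they describe — replace a newline only when no newline is adjacent to it — can be sketched like this (my reconstruction, \n-only for brevity):

```python
import re

text = "Single\nnewlines are joined.\n\nBlank lines (paragraph breaks) survive."

# lookbehind/lookahead: match a newline with no newline on either side
joined = re.sub(r'(?<!\n)\n(?!\n)', ' ', text)
print(joined)
# Single newlines are joined.
#
# Blank lines (paragraph breaks) survive.
```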

qid & accept id: (22657498, 22720439) query: I want to choose the Transport Layer Security protocol in urllib2 soup:

soup wrap:

Update: Since Python 2.7.9 you could pass SSLContext that specifies TLS protocol to urlopen() function:

import ssl
import urllib2

context = ssl.SSLContext(ssl.PROTOCOL_TLSv1)
# other settings (see ssl.create_default_context() implementation)
urllib2.urlopen('https://example.com', context=context).close()

old answer:

httplib.HTTPSConnection and urllib2.HTTPSHandler do not allow you to change the SSL version, but ssl.wrap_socket() does.

You could define your own HTTPSHandler that would allow you to pass arbitrary arguments to ssl.wrap_socket() e.g., urllib2_ssl.py:

>>> import ssl
>>> import urllib2
>>> import urllib2_ssl # https://gist.github.com/zed/1347055
>>> opener = urllib2.build_opener(urllib2_ssl.HTTPSHandler(
...     ssl_version=ssl.PROTOCOL_TLSv1, #XXX you need to modify urllib2_ssl
...     ca_certs='cacert.pem')) # http://curl.haxx.se/ca/cacert.pem.bz2
>>> opener.open('https://example.com/').read()
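
On Python 3 the same restriction is expressed through an SSLContext; this sketch (my addition, Python 3.7+ API) builds a context limited to TLS 1.2 or newer without making any network request:

```python
import ssl

# a default client context, then pin the minimum protocol version
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2

# the context would then be passed to urllib.request.urlopen(url, context=context)
```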
qid & accept id: (22668427, 22668590) query: How do you read in a text (.txt) file as a .py file in Python 2.7? soup:

soup wrap:

Looks like what you need is a JSON file.

Example: consider you have a source.txt with the following contents:

{"hello": "world"}

Then, in your python script, you can load the JSON data structure into the python dictionary by using json.load():

import json 

with open('source.txt', 'rb') as f:
    print json.load(f)

prints:

{u'hello': u'world'}

You can also use exec(), but I don't really recommend it. Here's an example just for educational purposes:

source.txt:

d = {"hello": "world"}

your script:

with open('source.txt', 'rb') as f:
    exec(f)
    print d

prints:

{'hello': 'world'}

Hope that helps.

qid & accept id: (22672340, 22672656) query: (Python) Formatting strings for struct.unpack? soup:

soup wrap:

How about using bytearray:

bytearray([222,7])
Out[15]: bytearray(b'\xde\x07')

struct.unpack('H', bytearray([222,7]))
Out[16]: (2014,)

In your case, coming from a csv reader, something like

row = ['222','7']

struct.unpack('H', bytearray(map(int,row)))
Out[24]: (2014,)
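
Why 2014: the two bytes combine little-endian, 222 + 7*256 == 2014. Being explicit about byte order ('<H' instead of the native 'H') makes that portable (my addition):

```python
import struct

row = ['222', '7']
data = bytes(map(int, row))   # b'\xde\x07'

# '<H' = little-endian unsigned 16-bit: 222 + (7 << 8) == 2014
value, = struct.unpack('<H', data)
print(value)  # 2014
```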
qid & accept id: (22674166, 22674670) query: Store all stdout to file while still displaying it on screen soup:

soup wrap:

Wrappers are sexy

import sys

class Logger(file):
    def __init__(self,*a,**kw):
        # copy original stdout to instance
        self.stdout = sys.stdout
        return super(Logger,self).__init__(*a,**kw)

    def write(self,data):
        self.stdout.write(data) # to screen
        return super(Logger,self).write(data) #to file

    def writelines(self,lines):
        for line in lines: self.write(line)

    def close(self):
        # put the original stdout back, then actually close the file
        sys.stdout = self.stdout
        return super(Logger,self).close()



some_list = ['elem1', 'elem2']

for elem in some_list:
    with Logger("/tmp/1/{}.log".format(elem), "w") as sys.stdout:
    # Do lots of stuff that print messages.
        print 'lots of stuff for', elem



print 'Code finished'

Result

$ python2 out.py 
Code finished
$ ls
elem1.log  elem2.log  out.py

Cool side effect:

print 'this on screen'

with Logger("/tmp/1/main.log", "w") as sys.stdout:
     print 'this on screen and in main.log'

     with Logger("/tmp/1/sub.log", "w") as sys.stdout:
          print 'this on screen, in main.log and in sub.log'

print 'only on screen again'
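
The Logger above subclasses file, which no longer exists in Python 3; a rough Python 3 counterpart of the same tee idea (my sketch, names and the log path invented):

```python
import sys

class Tee:
    """Send writes to both the real stdout and a log file."""
    def __init__(self, path):
        self.file = open(path, 'w')
        self.stdout = sys.stdout

    def write(self, data):
        self.stdout.write(data)   # to screen
        self.file.write(data)     # to file

    def flush(self):
        self.stdout.flush()
        self.file.flush()

    def __enter__(self):
        sys.stdout = self
        return self

    def __exit__(self, *exc):
        sys.stdout = self.stdout  # put the real stdout back
        self.file.close()

with Tee('demo.log'):
    print('goes to screen and into demo.log')
```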
qid & accept id: (22693260, 22694490) query: append csv files on column basis soup:

soup wrap:

Using join command:

$ join -t\; -j 1 file1 file2 | sed 's/;;/;/g'
DATE;BS-ICI,NSA,BAL,AT;BS-ICI,NSA,BAL,BE;BS-BYL,NSA,BAL,AT;BS-NAN,NSA,BAL,BE;
2014M02;0.9;1.5;1.5;6.7;
2014M01;-5.4;-4.4;-8.8;-4.4;
2013M11;-7.9;-9.2;-2.5;-9.6;
2013M10;-8.6;-14.0;-8.9;-11.4;

Or, if you don't want to pipe to sed, you can do it (a little more verbosely) by setting the output format:

$ join -t\; -j 1 -o 1.1 1.2 1.3 2.2 2.3 2.4 file1 file2 
DATE;BS-ICI,NSA,BAL,AT;BS-ICI,NSA,BAL,BE;BS-BYL,NSA,BAL,AT;BS-NAN,NSA,BAL,BE;
2014M02;0.9;1.5;1.5;6.7;
2014M01;-5.4;-4.4;-8.8;-4.4;
2013M11;-7.9;-9.2;-2.5;-9.6;
2013M10;-8.6;-14.0;-8.9;-11.4;
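If you'd rather stay in Python, the same column-wise join can be sketched with a dict keyed on the first field (hypothetical helper, Python 3 syntax; assumes ';'-delimited lines joined on column 1, like the shell pipeline above):

```python
def join_on_first_column(lines1, lines2):
    """Join two iterables of ';'-delimited lines on their first field."""
    right = {}
    for line in lines2:
        key, _, rest = line.rstrip("\n").partition(";")
        right[key] = rest
    joined = []
    for line in lines1:
        key, _, rest = line.rstrip("\n").partition(";")
        if key in right:
            # collapse the double ';' just like the sed step above
            joined.append(";".join([key, rest, right[key]]).replace(";;", ";"))
    return joined
```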
qid & accept id: (22694038, 22695279) query: SQLAlchemy (sql) conditional query soup:

soup wrap:

Sounds like you want COALESCE(), which takes several arguments and returns the first that is not null (or null if all arguments are null).

Presuming a reasonable setup:

import sqlalchemy as sa
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Product(Base):
    __tablename__ = "product"
    id = sa.Column(sa.Integer, primary_key=True)
    actual = sa.Column(sa.String)
    target = sa.Column(sa.String)

Use sqlalchemy.func.coalesce():

>>> print session.query(sa.func.coalesce(Product.target, Product.actual).label('x'))
SELECT coalesce(product.target, product.actual) AS x 
FROM product
>>> session.query(sa.func.coalesce(Product.target, Product.actual).label('x')).all()
[(u'p12'), (u'h20'), (u'p16'), (u'p16'), (u'p16'), (u'p16'), (u'p16')]

Edit: if your missing values are not null but some other value, you should use a CASE expression.

>>> print session.query(sa.case([(Product.target == '', Product.actual)], else_=Product.target))
SELECT CASE WHEN (product.target = :target_1) THEN product.actual ELSE product.target END AS anon_1 
FROM product
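To make COALESCE's semantics concrete, here is a plain-Python sketch of what the SQL function does (illustrative only; in queries you'd use sa.func.coalesce as shown above):

```python
def coalesce(*args):
    """Return the first argument that is not None, like SQL COALESCE."""
    for arg in args:
        if arg is not None:
            return arg
    return None

# coalesce(None, 'p12') -> 'p12'
```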
qid & accept id: (22696168, 22696249) query: How to restart a python script after it finishes soup:

soup wrap:

You could wrap your script in a

while True:
    ...

block, or use a bash script:

while true ; do
    yourpythonscript.py
done
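The bash loop can also be sketched in Python itself with subprocess (hypothetical run_repeatedly helper, Python 3; max_runs is only there so the sketch can terminate):

```python
import subprocess

def run_repeatedly(cmd, max_runs=None):
    """Re-run cmd each time it finishes; loop forever when max_runs is None."""
    runs = 0
    while max_runs is None or runs < max_runs:
        subprocess.call(cmd)  # blocks until the script exits
        runs += 1
    return runs
```

Usage would look like `run_repeatedly(["python", "yourpythonscript.py"])`.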
qid & accept id: (22700457, 22700510) query: Sort Python Dictionary by first four characters in Key soup:

soup wrap:

Use a sorting key:

sorted(yourdict, key=lambda k: int(k.split()[0]))

This returns a list of keys, sorted numerically on the first part of the key (split on whitespace).

Demo:

>>> yourdict = {'666 -> 999': 4388, '4000 -> 4332': 4383, '1333 -> 1665': 7998, '5666 -> 5999': 4495, '3666 -> 3999': 6267, '3000 -> 3332': 9753, '6333 -> 6665': 7966, '0 -> 332': 877}
>>> sorted(yourdict, key=lambda k: int(k.split()[0]))
['0 -> 332', '666 -> 999', '1333 -> 1665', '3000 -> 3332', '3666 -> 3999', '4000 -> 4332', '5666 -> 5999', '6333 -> 6665']

Sorting both keys and values together:

sorted(yourdict.items(), key=lambda item: int(item[0].split()[0]))

This produces key-value pairs:

>>> sorted(yourdict.items(), key=lambda item: int(item[0].split()[0]))
[('0 -> 332', 877), ('666 -> 999', 4388), ('1333 -> 1665', 7998), ('3000 -> 3332', 9753), ('3666 -> 3999', 6267), ('4000 -> 4332', 4383), ('5666 -> 5999', 4495), ('6333 -> 6665', 7966)]

You could produce a collections.OrderedDict() object with that:

>>> from collections import OrderedDict
>>> OrderedDict(sorted(yourdict.items(), key=lambda item: int(item[0].split()[0])))
OrderedDict([('0 -> 332', 877), ('666 -> 999', 4388), ('1333 -> 1665', 7998), ('3000 -> 3332', 9753), ('3666 -> 3999', 6267), ('4000 -> 4332', 4383), ('5666 -> 5999', 4495), ('6333 -> 6665', 7966)])
qid & accept id: (22727800, 22728205) query: How do I sort objects inside of objects in JSON? (using Python 2.7) soup:

soup wrap:

You need to ensure both that json does not escape non-ASCII characters and that you write your JSON output as Unicode:

import codecs
import json

with codecs.open('tmp.json', 'w', encoding='utf-8') as f:
    f.write(json.dumps({u'hello' : u'привет!'}, ensure_ascii=False) + '\n')


$ cat tmp.json
{"hello": "привет!"}

As for your second question: you can use collections.OrderedDict, but you need to be careful to pass it directly to json.dumps without converting it to a plain dict. See the difference:

from collections import OrderedDict
data = OrderedDict(zip(('first', 'second', 'last'), (1, 10, 3)))
print json.dumps(dict(data)) # {"second": 10, "last": 3, "first": 1}
print json.dumps(data) # {"first": 1, "second": 10, "last": 3}
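For what it's worth, in Python 3 the codecs step isn't needed: open() accepts an encoding directly, and ensure_ascii=False behaves the same (Python 3 sketch):

```python
import json

data = {"hello": "привет!"}

escaped = json.dumps(data)                       # default: non-ASCII escaped as \uXXXX
readable = json.dumps(data, ensure_ascii=False)  # keeps the Cyrillic intact

# Writing it out: open() takes the encoding itself in Python 3.
# with open('tmp.json', 'w', encoding='utf-8') as f:
#     f.write(readable + '\n')
```

Both forms round-trip to the same data; they only differ in how the file looks.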
qid & accept id: (22739971, 22739988) query: Organize Numerically Descending then Alphabetically Ascending by Different Elements in a List Python soup:

soup wrap:

Return a tuple, and negate the number instead of using reverse:

list_name.sort(key=lambda x: (-x[2],) + x[:2])

This returns (-item3, item1, item2), so sorting takes place first by the integer item3 in descending order; when tied on the number, sorting is done on item1 (alphabetically, ascending), then on item2.

In effect, tuples are sorted in lexicographical order.

Demo:

>>> list_name = [('ABC', 'DEF', 2), ('GHI', 'JKL', 6), ('MNO', 'PQR', 22), ('ABC', 'STU', 2)]
>>> list_name.sort(key=lambda x: (-x[2],) + x[:2])
>>> list_name
[('MNO', 'PQR', 22), ('GHI', 'JKL', 6), ('ABC', 'DEF', 2), ('ABC', 'STU', 2)]
qid & accept id: (22741040, 22741056) query: How to make unique combinations of the following list of tuples soup:

soup wrap:

You can use itertools.product like this

from itertools import product
names = [('Bob', 'Tom'), ('GreenWood', 'Pearson')]
for item in product(*names):
    print(item)

Output

('Bob', 'GreenWood')
('Bob', 'Pearson')
('Tom', 'GreenWood')
('Tom', 'Pearson')

If you wanted to print the possible names as strings, you can join each result like this

print(" ".join(item))

This will produce

Bob GreenWood
Bob Pearson
Tom GreenWood
Tom Pearson
qid & accept id: (22750555, 22838643) query: python mock patch top level packages soup:
soup wrap:
import unittest.mock as mock

mock_argparse = mock.Mock()
with mock.patch.dict('sys.modules', argparse=mock_argparse):
    import argparse
    print(argparse.ArgumentParser()) 
# 

As for mock_open patching:

m = mock_open()
with patch('__main__.open', m, create=True):

It seems it creates a __main__.open attribute with the mock object, which shadows the built-in version as if you had defined an open() function in your module. I suppose for the actual tests you should patch module_x.open, where "module_x" is the module that actually calls open().
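A runnable sketch of the sys.modules trick above (Python 3; the module name is made up, so the import would normally fail):

```python
import sys
from unittest import mock

fake_module = mock.Mock()  # stands in for a module that may not be installed

with mock.patch.dict('sys.modules', {'totally_missing_mod': fake_module}):
    # the import machinery finds our mock in sys.modules and returns it
    import totally_missing_mod
    totally_missing_mod.do_something()

# patch.dict restores sys.modules on exit, so the fake entry is gone again
```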

qid & accept id: (22769503, 22769632) query: Selecting text nodes with text not equal to a value soup:

soup wrap:

Use //encryption[text()!="WPA-PSK"]/text() xpath:

from lxml import etree

data = """

    
        
            WEP
        
    
    
        
            WPA-PSK
        
    
    
        
            WPA2-PSK
        
    

"""

root = etree.fromstring(data)
print root.xpath('//encryption[text()!="WPA-PSK"]/text()')

prints:

['WEP', 'WPA2-PSK']
qid & accept id: (22775168, 22778482) query: Pandas -- how to iterate through a list of dates which filter a DataFrame soup:

soup wrap:

I'm not 100% sure I understand what you want, but I think you want a sub-dataframe (taken from the number dataframe) for each date in datelist. So in your example you want 7 dataframes created?

If so this is what I would do:

First turn the date column in number (I'll call it df) and datelist into datetime64 columns. I'll assume the date column in numbers is already of type datetime64:

print df
        date   group  number   
0 2013-02-01  group1  -0.098765
1 2013-02-02  group2   0.519878
2 2013-02-03  group1  -0.098765
3 2013-02-04  group3   1.960784
4 2013-02-05  group3   2.859412
5 2013-02-06  group2   1.960784
6 2013-02-07  group1  -0.696594

And in datelist, I will create a new column that is a datetime64 type (note I changed the dates in datelist so that not all of the dates in the number dataframe were less than all the dates in datelist, and I gave datelist fewer observations to reduce the size of the output):

parse = lambda x: datetime(int(x[0]),int(x[1]),int(x[2]))  # needs: from datetime import datetime
datelist['end'] = datelist['date'].str.split(',').apply(parse)
print datelist 

        date        end
0  2013, 2,3 2013-02-03
1  2013, 2,6 2013-02-06
2  2013, 3,6 2013-03-06
3  2013, 3,8 2013-03-08

Now, I will just loop the rows of datelist and create a new dataframe each time through out of the rows where date <= end:

pieces = []
for idx,rows in datelist[['end']].iterrows():
  x = df[df['date'] <= rows['end']]
  x['end'] = rows['end']
  pieces.append(x)

print pd.concat(pieces,ignore_index=True)

          date   group  number           end
0  2013-02-01  group1  -0.098765 2013-02-03
1  2013-02-02  group2   0.519878 2013-02-03
2  2013-02-03  group1  -0.098765 2013-02-03
3  2013-02-01  group1  -0.098765 2013-02-06
4  2013-02-02  group2   0.519878 2013-02-06
5  2013-02-03  group1  -0.098765 2013-02-06
6  2013-02-04  group3   1.960784 2013-02-06
7  2013-02-05  group3   2.859412 2013-02-06
8  2013-02-06  group2   1.960784 2013-02-06
9  2013-02-01  group1  -0.098765 2013-03-06
10 2013-02-02  group2   0.519878 2013-03-06
11 2013-02-03  group1  -0.098765 2013-03-06
12 2013-02-04  group3   1.960784 2013-03-06
13 2013-02-05  group3   2.859412 2013-03-06
14 2013-02-06  group2   1.960784 2013-03-06
15 2013-02-07  group1  -0.696594 2013-03-06
16 2013-02-01  group1  -0.098765 2013-03-08
17 2013-02-02  group2   0.519878 2013-03-08
18 2013-02-03  group1  -0.098765 2013-03-08
19 2013-02-04  group3   1.960784 2013-03-08
20 2013-02-05  group3   2.859412 2013-03-08
21 2013-02-06  group2   1.960784 2013-03-08
22 2013-02-07  group1  -0.696594 2013-03-08

I concatenated the dataframes but you can process them by doing a groupby on 'end'.

qid & accept id: (22827317, 22861195) query: Changing background of a Button to a different shape and Styles like shadow effect etc in kivy python soup:

soup wrap:

Button in Kivy starts with a ButtonBehavior that is combined with a Label, adding properties like background_normal/background_down for handling textures on the canvas.

Knowing this, you can simply combine ButtonBehavior with any other widget you choose, e.g.:

from kivy.base import runTouchApp
from kivy.lang import Builder

kv = '''
<ButImage@ButtonBehavior+AsyncImage>

FloatLayout:
    # we don't specify anything here so float layout takes the entire size of the window.
    ButImage:
        id: but
        # take 50% size of the FloatLayout
        size_hint: .5, .5
        # Make Button change it's opacity when pressed for visual indication
        opacity: 1 if self.state == 'normal' else .5
        source: 'http://www.victoriamorrow.com/sitebuildercontent/sitebuilderpictures/enter_button.gif'
        # Introduce Label incase you want text on top of the image
        Label:
            center: but.center
            # change text acc to but state
            text: "Normal" if but.state == 'normal' else 'down'
'''

if __name__ == '__main__':
    runTouchApp(Builder.load_string(kv))

Here we just set the ButtonBehavior to be combined with an AsyncImage, which downloads the image from the web for your background.

You should see something like this (screenshot: AsyncImage button).

Animation effect in background

This would be as simple as changing the source to an animated gif, or to a list of images inside a .zip.

from kivy.base import runTouchApp
from kivy.lang import Builder


kv = '''
<ButImage@ButtonBehavior+AsyncImage>

FloatLayout:
    ButImage:
        id: but
        size_hint: .5, .5
        opacity: 1 if self.state == 'normal' else .5
        allow_stretch: True
        keep_ratio: False
        source: 'http://media1.policymic.com/site/article-items/2095/1_gif.gif'
        Label:
            center: but.center
            text: "Normal" if but.state == 'normal' else 'down'


'''

if __name__ == '__main__':
    runTouchApp(Builder.load_string(kv))

Look at the sequence images example. This was done before ButtonBehaviors were introduced, so it even has an example of an AnimatedButton class using the older method, which is essentially not needed any more.

Shadow Effect:

There are many ways to do this too.

You could either add a shadow to a widget/layout and have the button on top of this widget/layout take up less space than the shadow so as to account for touch on the shadows.

Or create your own CustomButtonBehavior class derived from ButtonBehavior that overrides the collide_point method to only return True for a custom collision. There is an example of using custom collision for widgets. You could even set the Image's keep_data property to True and later check the pixel data's alpha to determine whether to return True for a collision.

Rounded edges etc.

Simply use an image with rounded edges. Kivy supports the BorderImage instruction, which is equivalent to the CSS border-image in terms of functionality; Kivy's own button uses it by default. Try experimenting with the border attribute of BorderImage.

qid & accept id: (22856566, 22857440) query: XPATH: If there is element with certain value assume "phone" then get it's sibling value soup:

soup wrap:

With scrapy Selector and SelectorList you can use regular expressions via their .re() method:

>>> hxs.xpath('//td[contains(., "Phone")]/following-sibling::td[1]').re(r'(\d[\d ]+\d)')
[u'020 641512']
>>> 

Alternative using the new CSS selectors:

>>> from scrapy.selector import Selector
>>> selector = Selector(response)
>>> selector.css('td:contains("Phone") + td').re(r'(\d[\d ]+\d)')
[u'020 641512']
>>> 
qid & accept id: (22880882, 22926626) query: How can I get certain text from a website with Python? soup:

soup wrap:

You can get both spans with the metrics-authority class; the first is the Domain Authority, the second the Page Authority. Additionally, you can get Root Domains from the div with id="metrics-page-link-metrics":

import urllib2
from lxml import html

tree = html.parse(urllib2.urlopen('http://www.opensiteexplorer.org/links?site=www.google.com'))

spans = tree.xpath('//span[@class="metrics-authority"]')
data = [item.text.strip() for item in spans]
print "Domain Authority: {0}, Page Authority: {1}".format(*data)

div = tree.xpath('//div[@id="metrics-page-link-metrics"]//div[@class="has-tooltip"]')[1]
print "Root Domains: {0}".format(div.text.strip())

prints:

Domain Authority: 100, Page Authority: 97 
Root Domains: 680

Hope that helps.

qid & accept id: (22893271, 22893355) query: sum two lists element-by-element in python recursively soup:

soup wrap:

Here is a recursive implementation

def recursive_sum(l1, l2, idx = 0):
    if idx < min(len(l1), len(l2)):
        return [l1[idx] + l2[idx]] + recursive_sum(l1, l2, idx + 1)
    else:
        return []

print recursive_sum([1, 2, 3], [4, 5, 6])
# [5, 7, 9]

Or

def recursive_sum(l1, l2, result = None, idx = 0):
    if result is None:
        result = []
    if idx < min(len(l1), len(l2)):
        result.append(l1[idx] + l2[idx])
        return recursive_sum(l1, l2, result, idx + 1)
    else:
        return result
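An alternative sketch that recurses on slices instead of carrying an index (Python 3; note the repeated slicing copies the lists, so this is O(n²) compared with the index version):

```python
def recursive_sum_slices(l1, l2):
    # base case: stop when either list is exhausted
    if not l1 or not l2:
        return []
    return [l1[0] + l2[0]] + recursive_sum_slices(l1[1:], l2[1:])

# recursive_sum_slices([1, 2, 3], [4, 5, 6]) -> [5, 7, 9]
```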
qid & accept id: (22897195, 22897564) query: Selecting rows with similar index names in Pandas soup:

soup wrap:

Using @Akavall's setup code:

In [1]: my_data = np.arange(8).reshape(4,2)

In [2]: my_data[0,0] = 4

In [3]: df = pd.DataFrame(data = my_data, index=['a_1', 'a_2', 'b_1', 'b_2'], columns=['a', 'b'])

In [5]: df.filter(regex='a',axis=0)
Out[5]: 
     a  b
a_1  4  1
a_2  2  3

[2 rows x 2 columns]

Note that in general this is better posed as a multi-index

In [6]: df.index = MultiIndex.from_product([['a','b'],[1,2]])

In [7]: df
Out[7]: 
     a  b
a 1  4  1
  2  2  3
b 1  4  5
  2  6  7

[4 rows x 2 columns]

In [8]: df.loc['a']
Out[8]: 
   a  b
1  4  1
2  2  3

[2 rows x 2 columns]

In [9]: df.loc[['a']]
Out[9]: 
     a  b
a 1  4  1
  2  2  3

[2 rows x 2 columns]
qid & accept id: (22912351, 22912718) query: Intersection between multiple files soup:

soup wrap:

To get lines that are common to all files you can use:

import sys

data = []
for f in sys.argv[1:]:
    with open(f) as inp:
        lines = set(line.rstrip() for line in inp)
        data.append(lines)
common_lines = data[0].intersection(*data[1:])

For the second part use itertools.combinations:

from itertools import combinations
import sys

for f1, f2 in combinations(sys.argv[1:], 2):
    with open(f1) as inp1, open(f2) as inp2:
        print set(line.rstrip() for line in inp1).intersection(
            map(str.rstrip, inp2))
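The per-file logic can be wrapped in a small helper that works on any iterables of lines (hypothetical function, Python 3 syntax):

```python
def common_lines(*line_groups):
    """Return the set of stripped lines present in every group."""
    sets = [set(line.rstrip("\n") for line in group) for group in line_groups]
    return set.intersection(*sets) if sets else set()
```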
qid & accept id: (22912598, 33530846) query: How to resize subfigures when using ImageGrid from Matplotlib soup:

soup wrap:

For me, specifying a figure size helped:

fig = plt.figure(1, (6., 6.))

I also had to change the figtext location:

plt.figtext(0.0,0.85,'(a)',size=20)
plt.figtext(0.0,0.45,'(b)',size=20)

Result:

Seems like it maintained the aspect ratio.

qid & accept id: (22920023, 22920045) query: Query current directory in Python (the one the script is running from) soup:

soup wrap:

To get the path of the current script, use:

__file__

To get the directory from that:

os.path.dirname(__file__)

To get the current working directory (directory you were in when the script ran), use:

os.getcwd()

os.getcwd documentation
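A minimal sketch contrasting the two (note that __file__ is only defined when the code runs from a file, so it is guarded here):

```python
import os

# Directory the process was started from
cwd = os.getcwd()
print(cwd)

# Directory containing the running script, when there is one
try:
    script_dir = os.path.dirname(os.path.abspath(__file__))
    print(script_dir)
except NameError:  # e.g. in an interactive interpreter, __file__ is undefined
    script_dir = None
```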

qid & accept id: (22953550, 22953607) query: python list manipulation nesting vertically, making it look like a matrix soup:

soup wrap:

You can group the elements with a list comprehension and then transpose the result with the zip function, like this:

data = [15, 20, 25, 35, -20, -15, -10, -5, 10, 15, 20,
        25, -25, -20, -15, -10, 5, 10, 15, 20, -35, -25, -20, -15]
length = len(data) / 3
data = [data[i:i + length] for i in xrange(0, len(data), length)]

Till this point we grouped the data like this

[[15, 20, 25, 35, -20, -15, -10, -5],
 [10, 15, 20, 25, -25, -20, -15, -10],
 [5, 10, 15, 20, -35, -25, -20, -15]]

Now, we just have to transpose data, with zip

print zip(*data)

Output

[(15, 10, 5),
 (20, 15, 10),
 (25, 20, 15),
 (35, 25, 20),
 (-20, -25, -35),
 (-15, -20, -25),
 (-10, -15, -20),
 (-5, -10, -15)]

zip(*data) means we unpack all the elements of data and pass each of the elements as parameters to zip. It is equivalent to

zip(data[0], data[1], data[2])
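One Python 3 caveat worth noting: there zip() returns an iterator, so wrap it in list() to see the transposed rows (a smaller sample than above):

```python
data = [[15, 20, 25], [10, 15, 20], [5, 10, 15]]
transposed = list(zip(*data))
print(transposed)  # [(15, 10, 5), (20, 15, 10), (25, 20, 15)]
```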
qid & accept id: (22959268, 22959485) query: Loop through multiple different sized python dictionaries soup:

soup wrap:

Sort your init_treats keys:

treats = sorted(init_treats)

now you can use itertools.groupby() to group them on the first part of your key:

from itertools import groupby
from operator import itemgetter

for untreat, group in groupby(sorted(init_treats), itemgetter(0)):
    # group is now a sorted iterator of keys with the same first value
    if init_untreat[untreat] + sum(map(init_treats.get, group)) == 0:
        # sum of init_treat_n_m + init_untreat_n is 0

Because this uses sorting, this is an O(N log N) solution (N being the size of the init_treats dictionary).

You could use a dictionary for a O(N + K) solution (K being the size of the init_untreats dictionary):

sums = init_untreat.copy()
for untreat, id in init_treats:
    sums[untreat] += init_treats[untreat, id]

for untreat, total in sums.items():  # use sums.iteritems() in Python 2
    if total == 0:
        # sum of init_treat_n_m + init_untreat_n is 0

Because K is always smaller than N in your case, asymptotically speaking this is an O(N) algorithm, of course.
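A self-contained sketch of the dictionary approach, with toy data following the (n, m) key layout assumed from the question:

```python
# Toy data: init_untreat keyed by n, init_treats keyed by (n, m)
init_untreat = {1: 5, 2: -3}
init_treats = {(1, 1): -2, (1, 2): -3, (2, 1): 1, (2, 2): 2}

sums = init_untreat.copy()
for untreat, m in init_treats:           # iterating a dict yields its keys
    sums[untreat] += init_treats[untreat, m]

zero_groups = [untreat for untreat, total in sums.items() if total == 0]
print(sorted(zero_groups))  # [1, 2]
```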

qid & accept id: (22961541, 22965622) query: python matplotlib plot sparse matrix pattern soup:

soup wrap:

You can get a nice result using a coo_matrix, plot() and some adjustments:

import matplotlib.pyplot as plt
from scipy.sparse import coo_matrix

def plot_coo_matrix(m):
    if not isinstance(m, coo_matrix):
        m = coo_matrix(m)
    fig = plt.figure()
    ax = fig.add_subplot(111, axisbg='black')
    ax.plot(m.col, m.row, 's', color='white', ms=1)
    ax.set_xlim(0, m.shape[1])
    ax.set_ylim(0, m.shape[0])
    ax.set_aspect('equal')
    for spine in ax.spines.values():
        spine.set_visible(False)
    ax.invert_yaxis()
    ax.set_aspect('equal')
    ax.set_xticks([])
    ax.set_yticks([])
    return ax

Note that the y axis is inverted to put the first row at the top of the figure. One example:

import numpy as np
from scipy.sparse import coo_matrix

shape = (100000, 100000)
rows = np.int_(np.round_(shape[0]*np.random.random(1000)))
cols = np.int_(np.round_(shape[1]*np.random.random(1000)))
vals = np.ones_like(rows)

m = coo_matrix((vals, (rows, cols)), shape=shape)
ax = plot_coo_matrix(m)
ax.figure.show()


qid & accept id: (23008799, 23009423) query: How to Search data from a list of Key-Value pair that it is in list or not soup:

soup wrap:

I would use a dictionary to hold the values for each character:

data = [('X', '63.658'), ('Y', '21.066'), ...]

processed = {}

for char, value in data:
    if char not in processed:
        processed[char] = []
    processed[char].append(value)

Then iterate through all ASCII uppercase characters, printing either the calculated values or e.g. "N is empty..."

import string

for char in string.ascii_uppercase:
    if char not in processed:
        print("{0} is empty...".format(char))
    else:
        print("{0}: min={1}, max={2}".format(char, 
                                             min(processed[char]),
                                             max(processed[char])))

You could simplify slightly with collections.defaultdict:

from collections import defaultdict

processed = defaultdict(list)

for char, value in data:
    processed[char].append(value)

Generally, I would suggest breaking up your program a bit - separate the import of tuple data from a text file into one function, and processing the list into another. This makes it easier to develop and test each in isolation.
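Putting the pieces together with toy data (converting the values to float here, an assumption so that min/max compare numerically rather than as strings):

```python
from collections import defaultdict
import string

data = [('X', '63.658'), ('Y', '21.066'), ('X', '12.5')]  # toy data

processed = defaultdict(list)
for char, value in data:
    processed[char].append(float(value))  # numeric, not lexicographic

summary = {}
for char in string.ascii_uppercase:
    if char in processed:
        summary[char] = (min(processed[char]), max(processed[char]))

print(summary)  # {'X': (12.5, 63.658), 'Y': (21.066, 21.066)}
```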

qid & accept id: (23024861, 23024945) query: Inherit/Extend Django Module or cram into same Module? soup:

soup wrap:

I would consider a folder structure like this:

myproject
-> task
----> models
--------> __init__.py
--------> base.py
--------> math.py
--------> etc.
----> views
--------> __init__.py
--------> math.py
--------> etc.
----> urls
--------> __init__.py
--------> etc.
-> check
----> models
--------> __init__.py
--------> base.py
--------> etc.

-etc.- (you get the idea)

This way, you'll divide your Django project into three separate apps, and you'll divide each app's models, views, forms, etc. into separate files.

To import a specific model, view, form etc. you'll just do:

from task.models.math import MathTask
from task.views.image import ImageView
etc.

This is how you'll make the abstract base models:

class BaseTask(models.Model):
    # your fields go here

    class Meta:
        abstract = True
qid & accept id: (23025497, 23025647) query: Retrieve position of elements with setting some criteria in numpy soup:

soup wrap:

Using scipy, you could characterize such points as those which are both the maximum and the minimum of its neighborhood:

import numpy as np
import scipy.ndimage.filters as filters

def using_filters(data):
    return np.where(np.logical_and.reduce(
        [data == f(data, footprint=np.ones((3,3)), mode='constant', cval=np.inf)
         for f in (filters.maximum_filter, filters.minimum_filter)]))  

using_filters(data)
# (array([2, 3]), array([5, 9]))

Using only numpy, you could compare data with 8 shifted slices of itself to find the points which are equal:

def using_eight_shifts(data):
    h, w = data.shape
    data2 = np.empty((h+2, w+2))
    data2[(0,-1),:] = np.nan
    data2[:,(0,-1)] = np.nan
    data2[1:1+h,1:1+w] = data

    result = np.where(np.logical_and.reduce([
        (data2[i:i+h,j:j+w] == data)
        for i in range(3)
        for j in range(3)
        if not (i==1 and j==1)]))
    return result

As you can see above, this strategy makes an expanded array which has a border of NaNs around the data. This allows the shifted slices to be expressed as data2[i:i+h,j:j+w].

If you know that you are going to be comparing against neighbors, it might behoove you to define data with a border of NaNs from the very beginning so you don't have to make a second array as done above.

Using eight shifts (and comparisons) is much faster than looping over each cell in data and comparing it against its neighbors:

def using_quadratic_loop(data):
    return np.array([[i,j]
            for i in range(1,np.shape(data)[0]-1)
            for j in range(1,np.shape(data)[1]-1)
            if np.all(data[i-1:i+2,j-1:j+2]==data[i,j])]).T

Here is a benchmark:

using_filters            : 0.130
using_eight_shifts       : 0.340
using_quadratic_loop     : 18.794

Here is the code used to produce the benchmark:

import timeit
import operator
import numpy as np
import scipy.ndimage.filters as filters
import matplotlib.pyplot as plt

data  = np.array([
    [0,1,2,3,4,7,6,7,8,9,10], 
    [3,3,3,4,7,7,7,8,11,12,11],  
    [3,3,3,5,7,7,7,9,11,11,11],
    [3,4,3,6,7,7,7,10,11,11,11],
    [4,5,6,7,7,9,10,11,11,11,11]
    ])

data = np.tile(data, (50,50))

def using_filters(data):
    return np.where(np.logical_and.reduce(
        [data == f(data, footprint=np.ones((3,3)), mode='constant', cval=np.inf)
         for f in (filters.maximum_filter, filters.minimum_filter)]))    


def using_eight_shifts(data):
    h, w = data.shape
    data2 = np.empty((h+2, w+2))
    data2[(0,-1),:] = np.nan
    data2[:,(0,-1)] = np.nan
    data2[1:1+h,1:1+w] = data

    result = np.where(np.logical_and.reduce([
        (data2[i:i+h,j:j+w] == data)
        for i in range(3)
        for j in range(3)
        if not (i==1 and j==1)]))
    return result


def using_quadratic_loop(data):
    return np.array([[i,j]
            for i in range(1,np.shape(data)[0]-1)
            for j in range(1,np.shape(data)[1]-1)
            if np.all(data[i-1:i+2,j-1:j+2]==data[i,j])]).T

np.testing.assert_equal(using_quadratic_loop(data), using_filters(data))
np.testing.assert_equal(using_eight_shifts(data), using_filters(data))

timing = dict()
for f in ('using_filters', 'using_eight_shifts', 'using_quadratic_loop'):
    timing[f] = timeit.timeit('{f}(data)'.format(f=f),
                              'from __main__ import data, {f}'.format(f=f),
                              number=10) 

for f, t in sorted(timing.items(), key=operator.itemgetter(1)):
    print('{f:25}: {t:.3f}'.format(f=f, t=t))
qid & accept id: (23039188, 23039206) query: Using a string as a variable name soup:

soup wrap:

What you have written will work, but you are overwriting the value in the assignment, since you repeat var. Instead, collect the results in a list:

def post(self):
    var_list = ['var1', 'var2']
    result_list = []
    for var in var_list:
        result_list.append(self.request.get(var))
    return result_list # etc.

You can further simplify it by using a list comprehension:

def post(self):
    return [self.request.get(var) for var in ['var1', 'var2']]
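To see the pattern in isolation, here is the same idea with a hypothetical stand-in for self.request:

```python
class FakeRequest:
    # Hypothetical stand-in for the framework's request object
    def __init__(self, params):
        self._params = params

    def get(self, name):
        return self._params.get(name, '')

request = FakeRequest({'var1': 'hello', 'var2': 'world'})
results = [request.get(var) for var in ['var1', 'var2']]
print(results)  # ['hello', 'world']
```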
qid & accept id: (23039664, 23039792) query: Covert a list to string soup:

soup wrap:

You should convert the list element to a string before concatenating. Also, when the list is empty, return an empty string instead of an empty list.

def to_str(lst):
    if len(lst) == 0:
        return ''  # return empty string
    count = 0
    while count < len(lst):
        # convert lst[count] to string before concatenating
        result = str(lst[count]) + to_str(lst[1:]) 
        count += 1
        return result

However, there's a lot of unnecessary code in your function. You should write it as:

def to_str(lst):
    if not lst:
        return ''
    return str(lst[0]) + to_str(lst[1:])
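Tracing the simplified version shows how the recursion unwinds (Python 3 print shown here):

```python
def to_str(lst):
    if not lst:
        return ''
    return str(lst[0]) + to_str(lst[1:])

# to_str([1, 'a', 2.5])
# = '1' + to_str(['a', 2.5])
# = '1' + 'a' + to_str([2.5])
# = '1' + 'a' + '2.5' + to_str([])
# = '1a2.5'
print(to_str([1, 'a', 2.5]))  # 1a2.5
```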
qid & accept id: (23046827, 23046925) query: how to format numbers with commas in python soup:

soup wrap:

You can use a dict to store the results of your die rolls in a much simpler fashion. This will allow you to loop over all the results instead of writing a separate print statement for each. It also simplifies your code a lot!

For example:

import random
results = {1:0, 2:0, 3:0, 4:0, 5:0, 6:0}
for count in range(6000000):
    die = random.randint(1, 6)
    results[die] += 1

print('Here are the results:')
# Loop over the *keys* of the dictionary, which are the die numbers
for die in results:
    # The format(..., ',d') function formats a number with thousands separators
    print(die, '=', format(results[die], ',d'))
# Sum up all the die results and print them out
print('Total rolls equal:', sum(results.values()))

Here's some sample output:

Here are the results:
1 = 1,000,344
2 = 1,000,381
3 = 999,903
4 = 999,849
5 = 1,000,494
6 = 999,029
Total rolls equal: 6000000

Note that for this simple example, we could also use a list to store the results. However, because of the index translation between zero-indexing and one-indexing, the code would be less clear.
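The thousands-separator formatting on its own, using both the format() builtin from the code above and the equivalent str.format spec:

```python
n = 1000344
print(format(n, ',d'))   # 1,000,344
print('{:,}'.format(n))  # 1,000,344
```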

qid & accept id: (23047215, 23047400) query: Webcrawler - Check if tag with href is within an li tag using Beautiful soup? soup:

soup wrap:

You can use the find_parents() method of BeautifulSoup. This tells you whether a particular tag is inside another tag with the specified attributes. In this case we are looking for an anchor tag within another tag that has an nv-talk or nv-view class attribute.

Demo (the sample HTML markup did not survive in this snippet, so it is elided):

html = '''...'''
soup = BeautifulSoup(html)
a_tag = soup.find('a')
a_tag.find_parents(attrs={'class': 'nv-talk'})

which gives you the list of parent tags carrying the nv-talk class.

For every anchor tag in the list of your URLs, you can check whether find_parents() returns an empty list. If it does, this link does not belong to a Talk or a Discuss page and hence is safe for your crawling.

Another way to go about this problem would be to see if the href attribute of the anchor tag begins with 'http' or 'https'. But I am not entirely sure it fits the logic of your code. What I mean by this is: anchor tags with href attributes that begin with # are links to sections within the same page. If you need to ignore these, you can look for anchor tags whose href does not begin with # but instead begins with http or https. This is what I mean:

html = '''...'''
soup = BeautifulSoup(html)
a_tag = soup.find('a', attrs={'href': re.compile(r'^http.*')})

This gives you only the link that begins with http.
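The href-prefix idea can be checked with just the re module (toy hrefs here, no BeautifulSoup needed):

```python
import re

hrefs = ['#overview', 'http://example.com/page', 'https://example.com/x']
# Keep only absolute http/https links, dropping same-page fragment links
external = [h for h in hrefs if re.match(r'^https?://', h)]
print(external)  # ['http://example.com/page', 'https://example.com/x']
```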

    qid & accept id: (23059398, 23059825) query: How to merge item in list soup:

soup wrap:

Try this:

>>> a
[[1, 2, 3], [4, 5, 6]]
>>> result = []
>>> for i in a:
        result += i


>>> result
[1, 2, 3, 4, 5, 6]
>>>

OR

>>> a
[[1, 2, 3], [4, 5, 6]]
>>> sum(a, [])

Output:

[1, 2, 3, 4, 5, 6]

OR

>>> import itertools
>>> a1
[1, 2, 3]
>>> a2
[4, 5, 6]
>>> [item for item in itertools.chain(a1, a2)]

Output:

[1, 2, 3, 4, 5, 6]
    qid & accept id: (23100704, 23102874) query: Running infinite loops using threads in python soup:

soup wrap:

As far as I understood your question, you have two different tasks that you want to perform continuously. Now regarding your questions:

how do I go about running two infinite loops?

You can create two different threads that will run these infinite loops for you. The first thread will perform your task1 and the second one will perform task2.

Also, once I start executing a thread, how do I execute the other thread when the first thread is running continuously/infinitely?

If you are using two different threads then you don't need to be worried about this issue. If the threads are not sharing any resource then you don't need to worry about this fact. However, if you want to stop/pause one thread from the other thread or vice versa, then you can implement a mechanism using flags or locks. These questions will help in this case:

Is there any way to kill a Thread in Python?

Why does the python threading.Thread object have 'start', but not 'stop'?

making-a-program-multithreaded

Sample example using threading:

from threading import Thread

class myClassA(Thread):
    def __init__(self):
        Thread.__init__(self)
        self.daemon = True
        self.start()
    def run(self):
        while True:
            print 'A'

class myClassB(Thread):
    def __init__(self):
        Thread.__init__(self)
        self.daemon = True
        self.start()
    def run(self):
        while True:
            print 'B'


myClassA()
myClassB()
while True:
    pass

For shared resources?

Use locks for them. Here are some examples: one, two and How to synchronize threads in python?

what if I don't want to run it using classes? How do I do this using only methods?

from threading import Thread

def runA():
    while True:
        print 'A\n'

def runB():
    while True:
        print 'B\n'

if __name__ == "__main__":
    t1 = Thread(target = runA)
    t2 = Thread(target = runB)
    t1.setDaemon(True)
    t2.setDaemon(True)
    t1.start()
    t2.start()
    while True:
        pass
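If the loops ever need to stop cleanly, a threading.Event is a common alternative to the bare while True (a sketch in Python 3 syntax; the 0.05-second run time is arbitrary):

```python
import threading
import time

stop = threading.Event()

def runA():
    while not stop.is_set():  # loop until asked to stop
        time.sleep(0.01)

t1 = threading.Thread(target=runA, daemon=True)
t1.start()
time.sleep(0.05)   # let the loop run briefly
stop.set()         # signal the thread to exit
t1.join(timeout=1)
print(t1.is_alive())  # False
```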
    qid & accept id: (23114734, 23120126) query: how to remove all non english characters and words using NLTK > soup:

    soup wrap:

    I have never worked with nltk before, and there may well be a better solution. In my code snippet I am simply doing the following:

    1. Reading the file that needs to be checked for non-English/English words, named frequencyList.txt, into a variable named lines.

    2. Opening a new file named eng_words_only.txt. This file will contain the English words only. Initially it is empty; after executing the script it will contain all the English-language words present in frequencyList.txt.

    3. For every word in frequencyList.txt, checking whether it is also present in wordnet. If the word is present I write it to the eng_words_only.txt file; otherwise I do nothing. Note that I am using wordnet just for demo purposes: it doesn't contain all the English-language words!

    Code:

    from nltk.corpus import wordnet
    
    fList = open("frequencyList.txt", "r")  # Read the file
    lines = fList.readlines()
    fList.close()
    
    eWords = open("eng_words_only.txt", "a")  # Open file for writing
    
    for w in lines:
        word = w.strip()  # readlines() keeps the trailing newline; strip it before the lookup
        if not wordnet.synsets(word):  # Word is not in wordnet, treat as non-English
            print 'not ' + word
        else:  # Word is an English word
            print 'yes ' + word
            eWords.write(w)  # Write to file
    
    eWords.close()  # Close the file
    

    Testing: I first created a file named frequencyList.txt with the following contents:

    cat
    meoooow
    mouse
    

    then upon executing the code snippet you'll see the following output in the console:

    yes cat
    not meoooow
    yes mouse
    

    Then a file eng_words_only.txt will be created which contains only the words that wordnet recognises as English; here it will contain cat and mouse. Keep in mind that wordnet doesn't contain every English-language word, so for a serious check you should use a better word source. Please note: the python script file and frequencyList.txt should be in the same directory. Also, instead of frequencyList.txt you can use any file that you want to check/investigate. In that case don't forget to change the file names in the code snippet too.

    Second Solution: Although you didn't ask for it, there is another way to do this English-word test.

    Here is the code. Here wordlist-eng.txt is the file which contains the English words. You have to keep wordlist-eng.txt, frequencyList.txt and the python script in the same directory.

    with open("wordlist-eng.txt") as word_file:
        english_words = set(word.strip().lower() for word in word_file)
    
    fList = open("frequencyList.txt","r")
    lines = fList.readlines()
    fList.close()
    
    eWords = open("eng_words_only.txt", "a")
    
    for w in lines:
        if w.strip().lower() in english_words:
            eWords.write(w)
    eWords.close()
    

    After executing the script the eng_words_only.txt will contain all the English words that were present in frequencyList.txt file.
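The same membership test can be tried in memory, without any files (the word list and candidates here are made-up stand-ins for wordlist-eng.txt and frequencyList.txt):

```python
# Hypothetical stand-in for the contents of wordlist-eng.txt:
english_words = {"cat", "mouse", "dog"}

# Hypothetical stand-in for the lines of frequencyList.txt:
candidates = ["cat", "meoooow", "mouse"]

# Keep only the words found in the English word list:
kept = [w for w in candidates if w.strip().lower() in english_words]
print(kept)  # ['cat', 'mouse']
```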

    I hope this was helpful.

    qid & accept id: (23147735, 23147748) query: Python: How to remove a list containing Nones from a list of lists? soup:

    soup wrap:

    You could do it as:

    my_new_list = [i for i in myList if i.count(None) < 4]
    
    [OUTPUT]
    [[3, 4, None, None, None]]
    

    The problem is that you are modifying your list while iterating through it. If you want to use that kind of loop structure, do it like this instead:

    i = 0
    while i < len(myList):
        if(myList[i].count(None) >= 4):
            del myList[i]
        else:
            i += 1
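To see why a plain for-loop misbehaves here, consider this small sketch (myList is hypothetical data shaped like the question's):

```python
myList = [[1, None, None, None, None],
          [2, None, None, None, None],
          [3, 4, None, None, None]]

# Deleting from the list you are iterating over shifts the remaining
# elements left, so a plain for-loop skips the element after each
# deletion. Iterating over a copy (myList[:]) avoids the problem:
for row in myList[:]:
    if row.count(None) >= 4:
        myList.remove(row)
print(myList)  # [[3, 4, None, None, None]]
```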
    
    qid & accept id: (23177075, 23177129) query: Printing a list into grid soup:

    soup wrap:

    These two lines are enough to solve your problem!

    Try this:

    >>> for values in List:
    ...     print(" ".join(values), "\n")
    

    And the output:

    a a a a a a a a a
    
    b b b b b b b b b
    
    c c c c c c c c c
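Since str.join accepts any iterable of strings, the inner comprehension can also be dropped entirely; a sketch with made-up data shaped like the question's List:

```python
grid = [['a'] * 9, ['b'] * 9, ['c'] * 9]  # hypothetical grid data

# join the cells of each row with spaces and print one row per line
for row in grid:
    print(" ".join(row))
```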
    
    qid & accept id: (23206062, 23206269) query: Printing a two dimensional list soup:

    soup wrap:

    After you added the rest of the code, I'm slightly confused by your way of representing rows and columns so here is how I would write the code (if I was approaching the problem like you did); hope it helps:

    def _new_game_board() -> [[str]]:
        """
        Creates a new game board.  Initially, a game board has the size
        BOARD_COLUMNS x BOARD_ROWS and is comprised only of strings with the
        value NONE
        """
    
        return [[NONE] * BOARD_COLUMNS for _ in range(BOARD_ROWS)]
    
    ConnectFourGameState = namedtuple('ConnectFourGameState', ['board', 'turn'])
    
    def new_game_state() -> ConnectFourGameState:
        """
        Returns a ConnectFourGameState representing a brand new game
        in which no moves have been made yet.
        """
    
        return ConnectFourGameState(board=_new_game_board(), turn=RED)
    

    And as for the print_board function:

    def print_board(game_state):
        """Prints the game board given the current game state"""
    
        print("1 2 3 4 5 6 7")
        for row in range(BOARD_ROWS):
            for col in range(BOARD_COLUMNS):
                if game_state.board[row][col] == connect_four.NONE:
                    print('.', end=' ')
                elif game_state.board[row][col] == connect_four.RED:
                    print('R', end=' ')
                elif game_state.board[row][col] == connect_four.YELLOW:
                print('Y', end=' ')
    
            print()
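A self-contained sketch of how these pieces fit together (the constants and cell markers here are assumptions; the real connect_four module defines its own):

```python
from collections import namedtuple

# Assumed stand-ins for the original module's constants:
BOARD_ROWS, BOARD_COLUMNS = 6, 7
NONE, RED, YELLOW = 'NONE', 'R', 'Y'

ConnectFourGameState = namedtuple('ConnectFourGameState', ['board', 'turn'])

def _new_game_board():
    # BOARD_ROWS rows of BOARD_COLUMNS empty cells
    return [[NONE] * BOARD_COLUMNS for _ in range(BOARD_ROWS)]

def new_game_state():
    return ConnectFourGameState(board=_new_game_board(), turn=RED)

state = new_game_state()
state.board[5][3] = RED  # drop a red piece in column 4, bottom row

# Print each row: dots for empty cells, markers for occupied ones
for row in state.board:
    print(' '.join('.' if cell == NONE else cell for cell in row))
```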
    
    qid & accept id: (23224696, 23224799) query: Python OptParse combine multiple options soup:

    soup wrap:

    The options can be combined as you want. The program was run with -tb:

    import optparse, sys
    parser = optparse.OptionParser(usage='python %prog -t -b -q',
                                   prog=sys.argv[0])
    parser.add_option('-t','--tt', action="store_true", help="Blah",dest="t")
    parser.add_option('-b','--bb', action="store_true", help="Blah",dest="b")
    parser.add_option('-q','--qq', action="store_true", help="Blah",dest="q")
    options, args = parser.parse_args()
    
    print options
    

    produces

    {'q': None, 'b': True, 't': True}
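For what it's worth, optparse is deprecated in favour of argparse, which accepts the combined -tb form the same way (a sketch using the same made-up option names):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-t', '--tt', action='store_true', help='Blah')
parser.add_argument('-b', '--bb', action='store_true', help='Blah')
parser.add_argument('-q', '--qq', action='store_true', help='Blah')

# Short flags can be combined, just as with optparse:
opts = parser.parse_args(['-tb'])
print(vars(opts))  # {'tt': True, 'bb': True, 'qq': False}
```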
    
    qid & accept id: (23231840, 23232392) query: Spreadsheet Manipulation Tricks w/ Python's Pandas soup:

    soup wrap:

    In general, you want to be thinking about vectorized operations on columns instead of operations on specific cells.

    So, for example, if you had a data column, and you wanted another column that was the same but with each value multiplied by 3, you could do this in two basic ways. The first is the "cell-by-cell" operation.

    df['data_prime'] = df['data'].apply(lambda x: 3*x)
    

    The second is the vectorized way:

    df['data_prime'] = df['data'] * 3
    

    So, column-by-column in your spreadsheet:

    Count (you can add 1 to the right side if you want it to start at 1 instead of 0):

    df['count'] = pandas.Series(range(len(df)))
    

    Running total:

    df['running total'] = df['data'].cumsum()
    

    Difference from a scalar (set the scalar to a particular value in your df if you want):

    df['diff'] = scalar - df['data']
    

    Moving average:

    df['moving average'] = df['running total'] / df['count'].astype('float')
    

    Basic formula from your spreadsheet:

    I think you have enough to do this on your own.

    If statement:

    df['new column'] = 0
    mask = df['data column'] >= 3
    df.loc[mask, 'new column'] = 1
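Putting the columns above together on a made-up data column (the scalar and the threshold are chosen arbitrarily for illustration):

```python
import pandas as pd

df = pd.DataFrame({'data': [1, 2, 3, 4]})    # hypothetical data column

df['count'] = range(1, len(df) + 1)          # 1-based running count
df['running total'] = df['data'].cumsum()
df['diff'] = 10 - df['data']                 # difference from scalar 10
df['moving average'] = df['running total'] / df['count']

# "If statement" via a boolean mask:
df['new column'] = 0
mask = df['data'] >= 3
df.loc[mask, 'new column'] = 1

print(df)
```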
    
    qid & accept id: (23246125, 42264525) query: How to center labels in histogram plot soup:

    soup wrap:

    The following alternative solution is compatible with plt.hist() (and this has the advantage, for instance, that you can call it after a pandas.DataFrame.hist()).

    import numpy as np
    
    def bins_labels(bins, **kwargs):
        bin_w = (max(bins) - min(bins)) / (len(bins) - 1)
        plt.xticks(np.arange(min(bins)+bin_w/2, max(bins), bin_w), bins, **kwargs)
        plt.xlim(bins[0], bins[-1])
    

    (The last line is not strictly requested by the OP but it makes the output nicer)

    This can be used as in:

    import matplotlib.pyplot as plt
    bins = range(5)
    plt.hist(results, bins=bins)
    bins_labels(bins, fontsize=20)
    plt.show()
    

    Result: success!
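The tick arithmetic the helper performs can be checked on its own, without plotting (numpy only; bins as in the example above):

```python
import numpy as np

bins = np.arange(5)                                   # bin edges 0..4
bin_w = (bins.max() - bins.min()) / (len(bins) - 1)   # width of one bin: 1.0
centers = np.arange(bins.min() + bin_w / 2, bins.max(), bin_w)
print(centers)  # [0.5 1.5 2.5 3.5] -- one tick per bin, centred
```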

    qid & accept id: (23248583, 23249158) query: How to add regression functions in python, or create a new regression function from given coefficients? soup:

    soup wrap:

    The prediction generated by this model should be exactly

    np.dot(X_test, res_wls.params)
    

    Thus, if you want to sum several models, e.g.

    summed_params = np.array([res_wls.params for res_wls in all_my_res_wls]).sum(axis=0)
    

    your prediction should be

    np.dot(X_test, summed_params)
    

    In this case there would be no need to use the built-in functions of the estimator.
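The linearity being relied on here is easy to check numerically (the coefficient vectors below are hypothetical stand-ins for the fitted res_wls.params):

```python
import numpy as np

rng = np.random.default_rng(0)
X_test = rng.normal(size=(5, 3))
params_a = np.array([1.0, 2.0, 3.0])    # hypothetical fitted coefficients
params_b = np.array([0.5, -1.0, 2.0])

summed_params = params_a + params_b
# Predicting with summed coefficients equals summing the predictions:
print(np.allclose(np.dot(X_test, summed_params),
                  np.dot(X_test, params_a) + np.dot(X_test, params_b)))
```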

    qid & accept id: (23271192, 23272873) query: Find nested sub-classes in a class in the order they're defined soup:

    soup wrap:

    The metaclass documentation includes a nice example of how to get a class to remember what order its members were defined in:

    import collections
    
    class OrderedClass(type):
    
        @classmethod
        def __prepare__(metacls, name, bases, **kwds):
            return collections.OrderedDict()
    
        def __new__(cls, name, bases, namespace, **kwds):
            result = type.__new__(cls, name, bases, dict(namespace))
            result.members = tuple(namespace)
            return result
    
    class A(metaclass=OrderedClass):
        def one(self): pass
        def two(self): pass
        def three(self): pass
        def four(self): pass
    
    
    >>> A.members
    ('__module__', 'one', 'two', 'three', 'four')
    

    You can adapt this to your case like this:

    class A:
        pass
    
    class B(metaclass=OrderedClass):
        x = 5
        class D(A):
            pass
        class C(A):
            pass
    
    print(list(filter(lambda x: isinstance(getattr(B, x), type), B.members)))
    

    gives:

    ['D', 'C']
    

    Note that this gives you the names of the classes; if you want the classes themselves, you can do this instead:

    print(list(filter(lambda x: isinstance(x, type), (getattr(B, x) for x in B.members))))
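On Python 3.6+, class bodies preserve definition order in their __dict__ even without a custom metaclass, so a similar result can be had with plain vars() (a sketch mirroring the B above):

```python
class A:
    pass

class B:
    x = 5
    class D(A):
        pass
    class C(A):
        pass

# vars(B) iterates in definition order on Python 3.6+:
nested = [v for v in vars(B).values() if isinstance(v, type)]
print([cls.__name__ for cls in nested])  # ['D', 'C']
```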
    
    qid & accept id: (23297197, 23297344) query: reading Unicode string as json object in python soup:

    soup wrap:

    That's not valid JSON; not all JavaScript is JSON. You could try to convert it to JSON with:

    import re
    
    def repair_json(val):
        return re.sub(r'(\w+):', r'"\1":',
                      val.replace('"', '\\u0022').replace("'", '"'))
    

    This:

    • Quotes embedded double quotes with a JSON \uxxxx unicode escape sequence.
    • Replaces single quotes with double; this'll not work if any JSON string values contain single quotes.
    • Quotes key names; again, if there are any embedded words with colon in the values, these'll be quoted as well and be broken.

    For your given value, this method works:

    >>> json.loads(repair_json(v[0]))
    {u'maxColsVisible': 100, u'maxRowsVisible': 20, u'hasSearch': True, u'parent': u'rcJobsGrid_parent', u'url': u'/jobs/apply/ajax?action=careerCenterBean.jobsGrid.onAJAX&type=METHOD_ACTION', u'onPostRenderTable': u'if(WFN.getWidget("rcJobsGrid_toolbar_delete")!=null){WFN.getWidget("rcJobsGrid_toolbar_delete").set("useBusy",false);}WFN.handleButtonEnabling("rcJobsGrid", "rcJobsGrid_toolbar_delete");', u'widthType': u'px', u'store': {u'maxRowsVisible': 20, u'endPosition': 6, u'gridId': u'rcJobsGrid', u'gridExpressionString': u'#{careerCenterBean.jobsGrid}', u'noDataMessage': u'There are currently no jobs available.', u'customProperties': [{u'value': u'false', u'key': u'USE_DEFAULT_CONFIRM_DELETE_DLG'}, {u'value': u'0', u'key': u'OTHER_PAGES_SELECTION_COUNT'}, {u'value': u'Are you sure you want to delete the selected records?', u'key': u'TABLE_GRID_DELETE_CONFIRM_MSG'}], u'total': 6, u'hasPagination': True, u'tabIndex': 0, u'headerRows': [{u'columns': [{u'locked': False, u'align': u'left', u'label': u'Job Opening', u'width': 300, u'html': False, u'widthType': u'px', u'sortable': True, u'hidden': False, u'id': u'0'}, {u'locked': False, u'align': u'left', u'label': u'Worked In Country', u'width': 200, u'html': False, u'widthType': u'px', u'sortable': True, u'hidden': False, u'id': u'1'}, {u'locked': False, u'align': u'left', u'label': u'Location', u'width': 225, u'html': False, u'widthType': u'px', u'sortable': True, u'hidden': False, u'id': u'2'}, {u'locked': False, u'align': u'left', u'label': u'Date Posted', u'width': 150, u'html': False, u'widthType': u'px', u'sortable': True, u'hidden': False, u'id': u'3'}, {u'locked': False, u'align': u'left', u'label': u'Job ID', u'width': 75, u'html': False, u'widthType': u'px', u'sortable': True, u'hidden': False, u'id': u'4'}]}], u'rows': [{u'cells': [{u'action': u'#{careerCenterBean.viewJobPostingDetails}', u'align': u'left', u'type': u'LINK', u'id': u'0', u'value': u'Research Assistant'}, {u'align': u'left', u'type': 
u'OUTPUT_TEXT', u'id': u'1', u'value': u'UNITED STATES'}, {u'align': u'left', u'type': u'OUTPUT_TEXT', u'id': u'2', u'value': u'Arlington, VA'}, {u'align': u'left', u'type': u'OUTPUT_TEXT', u'id': u'3', u'value': u'04/16/2014'}, {u'align': u'left', u'type': u'OUTPUT_TEXT', u'id': u'4', u'value': u'1010'}], u'selected': False, u'id': u'0', u'customProperties': [{u'value': u'46702', u'key': u'oid'}]}, {u'cells': [{u'action': u'#{careerCenterBean.viewJobPostingDetails}', u'align': u'left', u'type': u'LINK', u'id': u'0', u'value': u'Research Analyst'}, {u'align': u'left', u'type': u'OUTPUT_TEXT', u'id': u'1', u'value': u'UNITED STATES'}, {u'align': u'left', u'type': u'OUTPUT_TEXT', u'id': u'2', u'value': u'Arlington, VA'}, {u'align': u'left', u'type': u'OUTPUT_TEXT', u'id': u'3', u'value': u'04/16/2014'}, {u'align': u'left', u'type': u'OUTPUT_TEXT', u'id': u'4', u'value': u'1011'}], u'selected': False, u'id': u'1', u'customProperties': [{u'value': u'46747', u'key': u'oid'}]}, {u'cells': [{u'action': u'#{careerCenterBean.viewJobPostingDetails}', u'align': u'left', u'type': u'LINK', u'id': u'0', u'value': u'User Experience Researcher'}, {u'align': u'left', u'type': u'OUTPUT_TEXT', u'id': u'1', u'value': u'UNITED STATES'}, {u'align': u'left', u'type': u'OUTPUT_TEXT', u'id': u'2', u'value': u'Arlington, VA'}, {u'align': u'left', u'type': u'OUTPUT_TEXT', u'id': u'3', u'value': u'04/08/2014'}, {u'align': u'left', u'type': u'OUTPUT_TEXT', u'id': u'4', u'value': u'1007'}], u'selected': False, u'id': u'2', u'customProperties': [{u'value': u'46467', u'key': u'oid'}]}, {u'cells': [{u'action': u'#{careerCenterBean.viewJobPostingDetails}', u'align': u'left', u'type': u'LINK', u'id': u'0', u'value': u'Research Manager'}, {u'align': u'left', u'type': u'OUTPUT_TEXT', u'id': u'1', u'value': u'UNITED STATES'}, {u'align': u'left', u'type': u'OUTPUT_TEXT', u'id': u'2', u'value': u'Arlington, VA'}, {u'align': u'left', u'type': u'OUTPUT_TEXT', u'id': u'3', u'value': u'04/03/2014'}, 
{u'align': u'left', u'type': u'OUTPUT_TEXT', u'id': u'4', u'value': u'1004'}], u'selected': False, u'id': u'3', u'customProperties': [{u'value': u'15082', u'key': u'oid'}]}, {u'cells': [{u'action': u'#{careerCenterBean.viewJobPostingDetails}', u'align': u'left', u'type': u'LINK', u'id': u'0', u'value': u'Summer Intern'}, {u'align': u'left', u'type': u'OUTPUT_TEXT', u'id': u'1', u'value': u'UNITED STATES'}, {u'align': u'left', u'type': u'OUTPUT_TEXT', u'id': u'2', u'value': u'Arlington, VA'}, {u'align': u'left', u'type': u'OUTPUT_TEXT', u'id': u'3', u'value': u'04/03/2014'}, {u'align': u'left', u'type': u'OUTPUT_TEXT', u'id': u'4', u'value': u'1008'}], u'selected': False, u'id': u'4', u'customProperties': [{u'value': u'46476', u'key': u'oid'}]}, {u'cells': [{u'action': u'#{careerCenterBean.viewJobPostingDetails}', u'align': u'left', u'type': u'LINK', u'id': u'0', u'value': u'All Other Jobs'}, {u'align': u'left', u'type': u'OUTPUT_TEXT', u'id': u'1', u'value': u'UNITED STATES'}, {u'align': u'left', u'type': u'OUTPUT_TEXT', u'id': u'2'}, {u'align': u'left', u'type': u'OUTPUT_TEXT', u'id': u'3', u'value': u'04/03/2014'}, {u'align': u'left', u'type': u'OUTPUT_TEXT', u'id': u'4', u'value': u'1009'}], u'selected': False, u'id': u'5', u'customProperties': [{u'value': u'46530', u'key': u'oid'}]}], u'maxColsVisible': 100, u'label': u'name', u'width': 950, u'sortType': 1, u'hasSearch': True, u'lastSort': 0, u'widthType': u'px', u'transparent': False, u'url': u'/jobs/apply/ajax?action=careerCenterBean.jobsGrid.onAJAX&type=METHOD_ACTION', u'footerRows': [], u'startPosition': 1, u'identifier': u'id', u'possibleRowsPerPage': u'10, 20, 30', u'rowsPerPage': 20}, u'possibleRowsPerPage': [10, 20, 30], u'hasPagination': True, u'customRenderers': [{u'toggle': False, u'type': u'STATUS_PROGRESS_BAR_CUSTOM_TYPE', u'renderer': u'com.adp.wfn.customrenderers.renderStatusProgressBar'}], u'toolbar': [{u'iconClass': u'', u'title': u'', u'iconClassDisabled': u'', u'children': None, u'value': 
u'', u'label': u'', u'active': False, u'onClick': u'', u'action': u'', u'id': u'_toolbar_add'}, {u'iconClass': u'', u'title': u'', u'iconClassDisabled': u'', u'children': None, u'value': u'', u'label': u'', u'active': False, u'onClick': u'', u'action': u'', u'id': u'_toolbar_delete'}], u'timeout': 30000, u'hasResizeColumns': True, u'transparent': False, u'id': u'rcJobsGrid', u'rowsPerPage': 20, u'tabIndex': 0}
    

    The rows list is a key under the store key:

    for row in data['store']['rows']:
        print row
    

    To explore structures like these, I find the pprint.pprint() function invaluable.
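
    For instance, a minimal sketch with a small made-up nested structure standing in for the AJAX response above:

```python
from pprint import pprint

# A made-up nested structure standing in for the parsed AJAX response above.
data = {'store': {'rows': [{'id': '0', 'cells': [{'value': 'Researcher'}]},
                           {'id': '1', 'cells': [{'value': 'Analyst'}]}]}}

# pprint lays out each nested dict with indentation, one row at a time.
for row in data['store']['rows']:
    pprint(row)
```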

    qid & accept id: (23298460, 23299713) query: Averaging out sections of a multiple row array in Python soup:

    soup wrap:

    I hope this is not too clever. TIL boolean indexing does not broadcast, so I had to manually do the broadcasting. Let me know if anything is unclear.

    import numpy as np
    A = [1,2,3,4,5]
    B = [50,40,30,20,10]
    C = np.vstack((A,B)) # float so that I can use np.nan
    
    i = np.arange(0, 6, 2)[:, None]
    selections = np.logical_and(A >= i, A < i+2)[None]
    
    D, selections = np.broadcast_arrays(C[:, None], selections)
    D = D.astype(float)     # allows use of nan, and makes a copy to prevent repeated behavior
    D[~selections] = np.nan # exclude these elements from mean
    
    D = np.nanmean(D, axis=-1)
    

    Then,

    >>> D
    array([[  1. ,   2.5,   4.5],
           [ 50. ,  35. ,  15. ]])
    

    Another way, using np.histogram to bin your data. This may be faster for large arrays, but is only practical for a small number of rows, since a separate histogram must be computed with different weights for each row:

    bins = np.arange(0, 7, 2)     # include the end
    n = np.histogram(A, bins)[0]  # number of columns in each bin
    a_mean = np.histogram(A, bins, weights=A)[0]/n
    b_mean = np.histogram(A, bins, weights=B)[0]/n
    D = np.vstack([a_mean, b_mean])
    
    qid & accept id: (23310630, 23318776) query: Counting elements matching a pattern in a tuple of tuples soup:

    soup wrap:

    The fastest so far:

    def count_zeros(matrix):
        total = 0
        for row in matrix:
            total += row.count(0)
        return total
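
    For reference, here is the same helper applied to a small made-up 2D tuple (repeated so the sketch is self-contained, Python 3 print):

```python
def count_zeros(matrix):
    # Sum the per-row counts of 0; tuple.count scans each row once in C.
    total = 0
    for row in matrix:
        total += row.count(0)
    return total

# A small made-up matrix for illustration.
m = ((0, 1, 0), (2, 0, 3), (0, 0, 4))
print(count_zeros(m))  # 5
```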
    

    For a 2D tuple you could use a generator expression:

    def count_zeros_gen(matrix):
        return sum(row.count(0) for row in matrix)
    

    Time comparison:

    %timeit [item for row in m for item in row].count(0) # OP
    1000000 loops, best of 3: 1.15 µs per loop
    
    %timeit len([item for row in m for item in row if item == 0]) # @thefourtheye
    1000000 loops, best of 3: 913 ns per loop
    
    %timeit sum(row.count(0) for row in m) 
    1000000 loops, best of 3: 1 µs per loop
    
    %timeit count_zeros(m)
    1000000 loops, best of 3: 775 ns per loop
    

    For the baseline:

    def f(m): pass
    %timeit f(m)
    10000000 loops, best of 3: 110 ns per loop
    
    qid & accept id: (23312803, 23312852) query: pyplot: loglog() with base e soup:

    soup wrap:

    When plotting using plt.loglog you can pass the keyword arguments basex and basey as shown below.

    From numpy you can get the e constant with numpy.e (or np.e if you import numpy as np)

    import numpy as np
    import matplotlib.pyplot as plt
    
    # Generate some data.
    x = np.linspace(0, 2, 1000)
    y = x**np.e
    
    plt.loglog(x,y, basex=np.e, basey=np.e)
    plt.show()
    

    Edit

    Additionally if you want pretty looking ticks you can use matplotlib.ticker to choose the format of your ticks, an example of which is given below.

    import numpy as np
    
    import matplotlib.pyplot as plt
    import matplotlib.ticker as mtick
    
    x = np.linspace(1, 4, 1000)
    
    y = x**3
    
    fig, ax = plt.subplots()
    
    ax.loglog(x,y, basex=np.e, basey=np.e)
    
    def ticks(y, pos):
        return r'$e^{:.0f}$'.format(np.log(y))
    
    ax.xaxis.set_major_formatter(mtick.FuncFormatter(ticks))
    ax.yaxis.set_major_formatter(mtick.FuncFormatter(ticks))
    
    plt.show()
    

    Plot

    qid & accept id: (23317342, 23317595) query: Pandas Dataframe: split column into multiple columns, right-align inconsistent cell entries soup:

    soup wrap:

    I'd do something like the following:

    foo = lambda x: pd.Series([i for i in reversed(x.split(','))])
    rev = df['City, State, Country'].apply(foo)
    print rev
    
          0    1        2
    0   HUN  NaN      NaN
    1   ESP  NaN      NaN
    2   GBR  NaN      NaN
    3   ESP  NaN      NaN
    4   FRA  NaN      NaN
    5   USA   ID      NaN
    6   USA   GA      NaN
    7   USA   NJ  Hoboken
    8   USA   NJ      NaN
    9   AUS  NaN      NaN
    

    I think that gets you what you want but if you also want to pretty things up and get a City, State, Country column order, you could add the following:

    rev.rename(columns={0:'Country',1:'State',2:'City'},inplace=True)
    rev = rev[['City','State','Country']]
    print rev
    
         City State Country
    0      NaN   NaN     HUN
    1      NaN   NaN     ESP
    2      NaN   NaN     GBR
    3      NaN   NaN     ESP
    4      NaN   NaN     FRA
    5      NaN    ID     USA
    6      NaN    GA     USA
    7  Hoboken    NJ     USA
    8      NaN    NJ     USA
    9      NaN   NaN     AUS
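
    The reversal trick can be seen without pandas: splitting and then reversing right-aligns the fields, so the country always lands in slot 0 no matter how many fields are present (the sample strings below are made up):

```python
def rev_fields(s):
    # Split on commas, reverse so the rightmost field (country) comes first.
    return [part.strip() for part in reversed(s.split(','))]

print(rev_fields('Hoboken, NJ, USA'))  # ['USA', 'NJ', 'Hoboken']
print(rev_fields('Budapest, HUN'))     # ['HUN', 'Budapest']
```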
    
    qid & accept id: (23323675, 23323697) query: Returning a list in each iteration using list comprehension soup:

    soup wrap:

    Nope, you either have to flatten like this

    print([item for lang in languages_list for item in [lang.code] + list(lang.alt)])
    

    Or

    from itertools import chain
    print([item for lang in languages_list for item in chain([lang.code], lang.alt)])
    

    I would prefer the itertools.chain method, since it doesn't have to create an intermediate list, in case your lang.alt is long.
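
    A minimal runnable sketch of the chain variant, using a made-up languages_list of namedtuples (the real objects only need .code and .alt attributes):

```python
from collections import namedtuple
from itertools import chain

# Hypothetical language records for illustration.
Lang = namedtuple('Lang', ['code', 'alt'])
languages_list = [Lang('en', ('eng',)), Lang('de', ('ger', 'deu'))]

# chain([lang.code], lang.alt) yields code first, then each alternate.
flat = [item for lang in languages_list for item in chain([lang.code], lang.alt)]
print(flat)  # ['en', 'eng', 'de', 'ger', 'deu']
```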

    qid & accept id: (23368164, 23368457) query: Python - "properly" organise (spread out) x and y data soup:

    soup wrap:

    You can try something like this:

    from __future__ import division
    def spreadout(X, Y):
        ratio = len(X) / len(Y)
        result = []
        while X or Y:
            if not Y or len(X)/len(Y) >= ratio:
                result.append(X.pop())
            else:
                result.append(Y.pop())
        return result
    

    The idea behind the algorithm is to determine the ratio of the X and Y lists and to alternately pop elements from either list to keep the ratio in the result list similar.

    This implementation works with lists of arbitrary elements and will return the result as a list. If you want just your x,y string, the code can be simplified and optimized somewhat, e.g. calling len this often would be wasteful if you have very long lists of xs and ys. Or you can just write a wrapper for that:

    def xy_wrapper(x, y):
        return ",".join(spreadout(['x'] * x, ['y'] * y))
    

    Example Output:

    >>> spreadout(range(6), list("ABC"))
    [5, 'C', 4, 3, 'B', 2, 1, 'A', 0]
    >>> xy_wrapper(5, 17)
    'x,y,y,y,y,x,y,y,y,x,y,y,y,y,x,y,y,y,x,y,y,y'
    
    qid & accept id: (23388668, 23388712) query: How to start at a specific step in a script? soup:

    soup wrap:

    You could do e.g.

    if start >= 1:
        function1()
    if start >= 2:
        function2()
    if start >= 3:
        function3()
    

    or have a list of functions:

    f_list = [None, function1, function2, function3, ...]
    for f in f_list[start:]:
        f()
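
    A self-contained sketch with stub step functions (the log list is just for demonstration; start is 1-based as above, with index 0 padded so step numbers line up):

```python
log = []

def function1(): log.append(1)
def function2(): log.append(2)
def function3(): log.append(3)

# Index 0 is padding so that start == 1 maps to function1.
f_list = [None, function1, function2, function3]

start = 2
for f in f_list[start:]:
    f()

print(log)  # [2, 3]
```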
    
    qid & accept id: (23402150, 23402197) query: Grouping and Computing Frequency ,Pandas soup:

    soup wrap:

    I think you want to group on both 'Type' and 'Name':

    print df.groupby(['Type','Name']).size()
    
    Type     Name       
    Bird     Flappy Bird    1
             Pigeon         2
    Pokemon  Jerry          3
             Mudkip         2
    

    Or if it is important to have the column named 'Frequency', you could do something like the following:

    print df.groupby(['Type','Name'])['Type'].agg({'Frequency':'count'})
    
                         Frequency
    Type    Name                  
    Bird    Flappy Bird          1
            Pigeon               2
    Pokemon Jerry                3
            Mudkip               2
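
    The same grouping can be sketched without pandas by counting (Type, Name) pairs with collections.Counter (the rows below are invented to mirror the output above):

```python
from collections import Counter

# Hypothetical rows mirroring the grouped counts shown above.
rows = [('Bird', 'Flappy Bird'), ('Bird', 'Pigeon'), ('Bird', 'Pigeon'),
        ('Pokemon', 'Jerry'), ('Pokemon', 'Jerry'), ('Pokemon', 'Jerry'),
        ('Pokemon', 'Mudkip'), ('Pokemon', 'Mudkip')]

# Counter keyed on the (Type, Name) tuple gives the per-group frequency.
freq = Counter(rows)
print(freq[('Pokemon', 'Jerry')])  # 3
print(freq[('Bird', 'Pigeon')])    # 2
```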
    
    qid & accept id: (23429426, 23429481) query: Sorting a List by frequency of occurrence in a list soup:
    soup wrap:
    from collections import Counter
    print [item for items, c in Counter(a).most_common() for item in [items] * c]
    # [5, 5, 5, 5, 3, 3, 3, 4, 4, 4, 1, 1, 2]
    

    Or even better (efficient) implementation

    from collections import Counter
    from itertools import repeat, chain
    print list(chain.from_iterable(repeat(i, c) for i,c in Counter(a).most_common()))
    # [5, 5, 5, 5, 3, 3, 3, 4, 4, 4, 1, 1, 2]
    

    Or

    from collections import Counter
    print sorted(a, key=Counter(a).get, reverse=True)
    # [5, 5, 5, 5, 3, 3, 3, 4, 4, 4, 1, 1, 2]
    

    If you prefer in-place sort

    a.sort(key=Counter(a).get, reverse=True)
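
    For a quick check of the key-function variant (Python 3 print; note sorted is stable, so runs with equal counts keep their original relative order):

```python
from collections import Counter

a = [1, 1, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5]
# Each element's sort key is its total frequency in a.
result = sorted(a, key=Counter(a).get, reverse=True)
print(result)  # [5, 5, 5, 5, 3, 3, 3, 4, 4, 4, 1, 1, 2]
```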
    
    qid & accept id: (23429968, 23431368) query: In Django, how could I in a single query get total row count based on distinct field values? soup:

    soup wrap:

    So I've been fiddling around with this, and I believe I have come up with a solution that produces the results I'm after. Given the model above, I'm using the following:

    Lead.objects.values('site', 'companies').annotate(Count('id'))
    

    This yields a list of dictionaries, one dictionary per unique site/company combination, each containing an id__count key that contains the total number of rows for that particular combination of site and company. And if one lead has multiple companies associated with it, a separate dictionary is produced for each.

    In our actual model, we have different types of leads as well, which calls for an additional field, type. So if I wanted to take that into consideration as well, I would simply add a type field to the model and use the following:

    Lead.objects.values('site', 'companies', 'type').annotate(Count('id'))
    

    Using this, I would get one dictionary per site, per type, per company, each with its own count. Django is pretty smart!

    Anyway, basic question I'm sure, but I couldn't quite find anything that addressed it. Hope this helps someone.

    qid & accept id: (23441228, 23441749) query: Fasted Python way to bulk csv convert outside of using pandas soup:

    soup wrap:

    You can try numpy.savetxt; it seems to be (not quite) three times faster in this test; I have no idea how it scales. savetxt can add delimiters (see makebigfile in the example below).

    Some of your code is a bit unusual, like the str(files); I'd expect the filenames to be strings already?

    import numpy
    import timeit
    
    
    def makebigfile(outname):
        data = numpy.random.standard_normal((100000, 7))
        numpy.savetxt(outname, data, delimiter=",")
    
    
    def csvdump(files, original=True):
        date, time, opens, high, low, close, vol = numpy.genfromtxt(str(files)+'.csv', unpack=True, delimiter=',')
        if original:
            for line in high:
                x = str(1/line)
                outr = open(str(files)+"inverse-original.txt", "a")
                outr.write(x)
                outr.write('\n')
        else:
            numpy.savetxt(str(files)+"inverse-savetxt.txt", 1/high)
    
    
    
    makebigfile('foo.txt')
    
    
    print timeit.timeit(stmt='__main__.csvdump("foo",True)',setup='import __main__',number=1000)
    print timeit.timeit(stmt='__main__.csvdump("foo",False)',setup='import __main__',number=1000)
    

    On my system this gives:

    1.41840219498
    0.56161403656
    
    qid & accept id: (23460155, 23461999) query: python how to create list of interchangeable values? soup:

    soup wrap:

    Like I said in the comments, a dictionary of tuples is what you're probably looking for. Example:

    data = {'C3': ('frequency', 261.6255653006), 
        261.6255653006: ('midinumber', 60), 
        60: ('midinote', 'C3'),
    }
    

    To validate your input you can do:

    input = raw_input()
    try:
        key = float(input)
    except ValueError:
        key = input
    
    try:
        value = data[key]
    except KeyError:
        print "Invalid input. Valid keys are: " + ', '.join(data.keys())
    else:
        pass  # input was valid, so value == data[key]
    

    Tuples are indexed just like lists are. However, they are immutable which means you can't change them or append new items to them. And I believe that's desired in your case.

    Dictionaries are indexed by keys, for example data['C3'] returns ('frequency', 261.6255653006) and data['C3'][0] returns 'frequency'.
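
    Following the chain of lookups in that mapping (Python 3 print):

```python
# The same mapping as above: each value names the kind of the next key.
data = {'C3': ('frequency', 261.6255653006),
        261.6255653006: ('midinumber', 60),
        60: ('midinote', 'C3')}

kind, value = data['C3']
print(kind, value)   # frequency 261.6255653006
print(data[value])   # ('midinumber', 60)
print(data[60][1])   # C3 -- back where we started
```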

    qid & accept id: (23555829, 23556901) query: Saving an Element in an Array Permanently soup:

    soup wrap:

    A simpler solution would be to use json:

    import json
    li = []
    def getinput(li):
        li.append(raw_input("Type in a string: "))
    

    To save the list you would do the following

    savefile = open("backup.json", "w")
    savefile.write(json.dumps(li))
    savefile.close()
    

    And to load the file you simply do

    savefile = open("backup.json")
    li = json.loads(savefile.read())
    savefile.close()
    

    You may want to handle the case where the file does not exist. One thing to note would be that complex structures like classes cannot be stored as json.
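
    For the missing-file case, one option is to fall back to an empty list; a sketch (the helper names and the temporary backup.json path are just illustrative):

```python
import json
import os
import tempfile

def load_list(path):
    # Treat a missing backup as an empty history rather than raising an error.
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return json.load(f)

def save_list(path, li):
    with open(path, "w") as f:
        json.dump(li, f)

# Round-trip demonstration in a temporary directory.
backup = os.path.join(tempfile.mkdtemp(), "backup.json")
print(load_list(backup))            # []
save_list(backup, ["first entry"])
print(load_list(backup))            # ['first entry']
```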

    qid & accept id: (23555995, 23567187) query: Can you do regex with concordance? soup:

    soup wrap:

    In short, nltk is not able to create a concordance from a regex in its present state. The difficulty of creating a concordance from nltk's ConcordanceIndex class (or a subclass thereof)--which is what you are using--is that the class accepts a list of tokens as an argument (and is built around those tokens) rather than a full text string.

    I guess my suggestion would be to create your own class, which accepts a string as an argument instead of tokens. Here is a class loosely based upon the nltk's ConcordanceIndex class that might function as a starting point:

    import re
    
    
    class RegExConcordanceIndex(object):
        "Class to mimic nltk's ConcordanceIndex.print_concordance."
    
        def __init__(self, text):
            self._text = text
    
        def print_concordance(self, regex, width=80, lines=25, demarcation=''):
            """
            Prints n <= @lines contexts for @regex with a context <= @width".
            Make @lines 0 to display all matches.
            Designate @demarcation to enclose matches in demarcating characters.
            """ 
            concordance = []
            matches = re.finditer(regex, self._text, flags=re.M)
            if matches:
                for match in matches:
                    start, end = match.start(), match.end()
                    match_width = end - start
                    remaining = (width - match_width) // 2
                    if start - remaining > 0:
                        context_start = self._text[start - remaining:start]
                        #  cut the string short if it contains a newline character
                        context_start = context_start.split('\n')[-1]
                    else:
                        context_start = self._text[0:start + 1].split('\n')[-1]
                    context_end = self._text[end:end + remaining].split('\n')[0]
                    concordance.append(context_start + demarcation + self._text
                                       [start:end] + demarcation + context_end)
                    if lines and len(concordance) >= lines:
                        break
                print("Displaying %s matches:" % (len(concordance)))
                print '\n'.join(concordance)
            else:
                print "No matches"
    

    Now you can test the class like this:

    >>> from nltk.corpus import gutenberg
    >>> emma = gutenberg.raw(fileids='austen-emma.txt')
    >>> comma_separated = RegExConcordanceIndex(emma)
    >>> comma_separated.print_concordance(r"(?<=, )[A-Za-z]+(?=,)", demarcation='**')  # matches are enclosed in double asterisks
    
    Displaying 25 matches:
    Emma Woodhouse, **handsome**, clever, and rich, with a comfortab
    Emma Woodhouse, handsome, **clever**, and rich, with a comfortable home
    The real evils, **indeed**, of Emma's situation were the power 
    o her many enjoyments.  The danger, **however**, was at present
    well-informed, **useful**, gentle, knowing all the ways of the
    well-informed, useful, **gentle**, knowing all the ways of the family,
    a good-humoured, **pleasant**, excellent man, that he thoroughly 
    "No, **papa**, nobody thought of your walking.  We 
    "I believe it is very true, my dear, **indeed**," said Mr. Woodhouse,
    should not like her so well as we do, **sir**,
    e none for myself, papa; but I must, **indeed**,
    met with him in Broadway Lane, **when**, because it began to drizzle,
    like Mr. Elton, **papa**,--I must look about for a wife for hi
    "With a great deal of pleasure, **sir**, at any time," said Mr. Knightley,
    better thing.  Invite him to dinner, **Emma**, and help him to the best
    y.  He had received a good education, **but**,
    Miss Churchill, **however**, being of age, and with the full co
    From the expense of the child, **however**, he was soon relieved.
    It was most unlikely, **therefore**, that he should ever want his
     strong enough to affect one so dear, **and**, as he believed,
    It was, **indeed**, a highly prized letter.  Mrs. Westo
    and he had, **therefore**, earnestly tried to dissuade them 
    Fortunately for him, **Highbury**, including Randalls in the same par
    handsome, **rich**, nor married.  Miss Bates stood in th
    a real, **honest**, old-fashioned Boarding-school, wher
    
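    The lookaround pattern used above can be sanity-checked on a toy sentence (my own construction, not the corpus):

```python
import re

text = "Emma Woodhouse, handsome, clever, and rich"
# Lookbehind requires ", " before the word, lookahead requires "," after it
pattern = re.compile(r"(?<=, )[A-Za-z]+(?=,)")
print(pattern.findall(text))  # ['handsome', 'clever']
```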
    qid & accept id: (23575195, 23575388) query: Python and the modulus operator with very large numbers soup:


    I believe your problem is that the return is in the wrong place. Your code currently only loops through the 2, and any odd number is obviously not divisible by 2. So, it goes into the if n%j==0, returns True, and since a return breaks out of the loop, stops going. So, any odd number will return True.

    Instead, try:

    def isprime(n):
        if n % 1 != 0:
            return True
        else:
            for j in range(2, math.ceil(math.sqrt(n))):
                if n % j != 0:
                    return False
            return True
    

    I think that works. EDIT: No, it actually doesn't. Here, I'll post a different prime checker:

    def isprime(n):
        '''check if integer n is a prime'''
        n = abs(int(n))
        if n < 2:
            return False
        if n == 2: 
            return True    
        if not n & 1: 
            return False
        for x in range(3, int(n**0.5)+1, 2):
            if n % x == 0:
                return False
        return True
    
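    A quick sanity check of the second checker, with test values of my own choosing:

```python
def isprime(n):
    '''check if integer n is a prime'''
    n = abs(int(n))
    if n < 2:
        return False
    if n == 2:
        return True
    if not n & 1:
        return False
    for x in range(3, int(n**0.5) + 1, 2):
        if n % x == 0:
            return False
    return True

print([i for i in range(20) if isprime(i)])  # [2, 3, 5, 7, 11, 13, 17, 19]
print(isprime(2**31 - 1))  # True: 2147483647 is a known Mersenne prime
```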
    qid & accept id: (23576318, 23576327) query: Python - sum variables from a text file soup:


    Use sum():

    with open("numberGood.txt") as f:
        print(sum(float(line) for line in f))
    

    Demo:

    $ cat numberGood.txt 
    10.01
    19.99
    30.0
    40
    $ python3
    >>> with open("numberGood.txt") as f:
    ...     print(sum(float(line) for line in f))
    ... 
    100.0
    
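    A slightly more defensive variant (my addition, not part of the answer) skips blank lines so a trailing newline in the file doesn't raise a ValueError; StringIO stands in for the real file here:

```python
from io import StringIO

# StringIO stands in for open("numberGood.txt"); the numbers are the
# answer's demo values, plus a deliberate trailing blank line.
f = StringIO("10.01\n19.99\n30.0\n40\n\n")

# Guard with line.strip() so blank lines are skipped instead of
# being passed to float().
total = sum(float(line) for line in f if line.strip())
print(total)  # 100.0
```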
    qid & accept id: (23587296, 23590973) query: Recognising objects in images using HAAR cascade and OpenCV soup:


    I guess you won't get good results from Haar (or HOG) cascade classifiers here.

    • your 'needle' does not have enough features/corners (it's just 2 crosses and a line)
    • cascade classifiers are quite sensitive to rotation, and it seems your object can take any arbitrary rotation here.
    • if you train one classifier with many different rotations, it will just overfit.
    • if you train many classifiers (one per rotation), the same happens. ;(

    So, imho, there is not much hope for that approach.

    I would go for contours/shape matching instead:

    void findNeedles( const std::vector<cv::Point> & needle_contour, const cv::Mat & haystack_binarized)
    {
        int nfound = 0;
        std::vector<std::vector<cv::Point>> contours;
        cv::findContours(haystack_binarized, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_NONE);
        for (size_t i = 0; i < contours.size(); i++)
        {
            // pre-filter for size:
            if ( ( contours[i].size() < needle_contour.size()/2 )
              || ( contours[i].size() > needle_contour.size()*2 ) )
              continue;
    
            double d = cv::matchShapes(contours[i],needle_contour,CV_CONTOURS_MATCH_I2,0);
            if ( d < 8.4 ) // heuristic value, experiments needed !!
            {
                cv::drawContours(haystack_binarized, contours, i, 128, 3);
                nfound ++;
            }
        }
        cerr << nfound << " objects found" << endl;
        cv::imshow("haystack",haystack_binarized);
        //imwrite("hay.png",haystack_binarized);
        cv::waitKey();
    }
    
    
    int main()
    {
        // 1. get the contour of our needle:
        Mat needle = imread("needle.png",0);
        Mat needle_b; 
        threshold(needle,needle_b,120,255,1); 
        imshow("needle",needle_b);
    
        std::vector<std::vector<cv::Point>> needle_conts;
        cv::findContours(needle_b, needle_conts, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_NONE);
        if ( needle_conts.size() == 0 )
        {
            std::cout << " no contour Found" << std::endl;
            return -1;
        }
        std::vector<cv::Point> needle_contour = needle_conts[0];
    
        // 2. check a positive sample:
        Mat haypos = imread("hay_pos.png",0);
        Mat haypos_b; 
        threshold(haypos,haypos_b,120,255,1);
        findNeedles(needle_contour, haypos_b);
    
        // 3. check a negative sample:
        Mat hayneg = imread("hay_neg.png",0);
        Mat hayneg_b; 
        threshold(hayneg,hayneg_b,120,255,1);
        findNeedles(needle_contour, hayneg_b);
    
        return 0;
    }
    

    --------------

    > haystack.exe
    5 objects found
    0 objects found
    


    qid & accept id: (23613138, 23637836) query: How do I get python to search a csv file for items in a dictionary then print out the entire excel row...Thanks soup:


    I'm not quite sure what you're asking, but if you want to print each row that contains any of the numbers in affiliate_phone_dict, this will do:

    lookup = {'name1': 'xxx-xxx-xxxx',
              'name2': 'yyy-yyy-yyyy'}
    
    with open('data.csv') as data_file, open('out.csv', 'w') as out_file:
        for row in data_file:
            if any(num in row for num in lookup.values()):
                out_file.write(row)
    

    data.csv

    Date Time Length Cost Bill Category Destination Number Destination City Origin Number OriginCity
    01/01/0001  10:37   3   $0.00   LOCAL AIRTIME, LONG DISTANCE and INTERNATIONAL CHARGES  xxx-xxx-xxxx    City Name   aaa-aaa-aaaa    City Name   Mobile
    01/01/0001  10:37   10  $0.00   LOCAL AIRTIME, LONG DISTANCE and INTERNATIONAL CHARGES  yyy-yyy-yyyy    City Name   zzz-zzz-zzzz    City Name   Mobile
    01/01/0001  10:37   10  $0.00   LOCAL AIRTIME, LONG DISTANCE and INTERNATIONAL CHARGES  123-456-7890    City Name   zzz-zzz-zzzz    City Name   Mobile
    

    out.csv

    01/01/0001  10:37   3   $0.00   LOCAL AIRTIME, LONG DISTANCE and INTERNATIONAL CHARGES  xxx-xxx-xxxx    City Name   aaa-aaa-aaaa    City Name   Mobile
    01/01/0001  10:37   10  $0.00   LOCAL AIRTIME, LONG DISTANCE and INTERNATIONAL CHARGES  yyy-yyy-yyyy    City Name   zzz-zzz-zzzz    City Name   Mobile
    
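    To see the any(...) membership test in isolation, here is a self-contained run with StringIO standing in for data.csv (the two rows are made up):

```python
from io import StringIO

lookup = {'name1': 'xxx-xxx-xxxx',
          'name2': 'yyy-yyy-yyyy'}

# StringIO stands in for open('data.csv'); one row contains a looked-up
# number, the other does not
data_file = StringIO(
    "01/01/0001  10:37  3   $0.00  xxx-xxx-xxxx  City Name\n"
    "01/01/0001  10:37  10  $0.00  123-456-7890  City Name\n")

matched = [row for row in data_file
           if any(num in row for num in lookup.values())]
print(len(matched))  # 1
```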
    qid & accept id: (23614259, 23636531) query: Identifying price swings/trends in pandas dataframe with stock quotes soup:


    It's a bit tricky, since you cannot mark a point as a pivot until you find the next potential pivot (i.e. if you are in an upward trend, you can't call it done until you find a sufficiently low low).

    This code does the trick. I've put your data into tmpData.txt for convenience, and it produces the desired result; please check:

    def get_pivots():
        data = pd.DataFrame.from_csv('tmpData.txt')
        data['swings'] = np.nan
    
        pivot = data.irow(0).open
        last_pivot_id = 0
        up_down = 0
    
        diff = .3
    
        for i in range(0, len(data)):
            row = data.irow(i)
    
            # We don't have a trend yet
            if up_down == 0:
                if row.low < pivot - diff:
                    data.ix[i, 'swings'] = row.low - pivot
                    pivot, last_pivot_id = row.low, i
                    up_down = -1
                elif row.high > pivot + diff:
                    data.ix[i, 'swings'] = row.high - pivot
                    pivot, last_pivot_id = row.high, i
                    up_down = 1
    
            # Current trend is up
            elif up_down == 1:
                # If got higher than last pivot, update the swing
                if row.high > pivot:
                    # Remove the last pivot, as it wasn't a real one
                    data.ix[i, 'swings'] = data.ix[last_pivot_id, 'swings'] + (row.high - data.ix[last_pivot_id, 'high'])
                    data.ix[last_pivot_id, 'swings'] = np.nan
                    pivot, last_pivot_id = row.high, i
                elif row.low < pivot - diff:
                    data.ix[i, 'swings'] = row.low - pivot
                    pivot, last_pivot_id = row.low, i
                    # Change the trend indicator
                    up_down = -1
    
            # Current trend is down
            elif up_down == -1:
                 # If got lower than last pivot, update the swing
                if row.low < pivot:
                    # Remove the last pivot, as it wasn't a real one
                    data.ix[i, 'swings'] = data.ix[last_pivot_id, 'swings'] + (row.low - data.ix[last_pivot_id, 'low'])
                    data.ix[last_pivot_id, 'swings'] = np.nan
                    pivot, last_pivot_id = row.low, i
                elif row.high > pivot - diff:
                    data.ix[i, 'swings'] = row.high - pivot
                    pivot, last_pivot_id = row.high, i
                    # Change the trend indicator
                    up_down = 1
    
        print data
    

    Output:

    date                  close  high    low     open    volume    swings                                            
    2014-05-09 13:30:00  187.56  187.73  187.54  187.70  1922600     NaN
    2014-05-09 13:31:00  187.49  187.56  187.42  187.55   534400     NaN
    2014-05-09 13:32:00  187.42  187.51  187.35  187.49   224800   -0.35
    2014-05-09 13:33:00  187.55  187.58  187.39  187.40   303700     NaN
    2014-05-09 13:34:00  187.67  187.67  187.53  187.56   438200     NaN
    2014-05-09 13:35:00  187.60  187.71  187.56  187.68   296400    0.36
    2014-05-09 13:36:00  187.41  187.67  187.38  187.60   329900     NaN
    2014-05-09 13:37:00  187.31  187.44  187.28  187.40   404000     NaN
    2014-05-09 13:38:00  187.26  187.37  187.26  187.30   912800     NaN
    2014-05-09 13:39:00  187.22  187.28  187.12  187.25   607700   -0.59
    
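    Note that irow and ix have since been removed from pandas; the same indexing can be written with iloc/loc. A minimal sketch with two made-up rows, not the original data:

```python
import numpy as np
import pandas as pd

# Two made-up rows in the shape of the answer's data
data = pd.DataFrame({'open': [187.70, 187.55],
                     'high': [187.73, 187.56],
                     'low':  [187.54, 187.42]})
data['swings'] = np.nan

row = data.iloc[0]                                # replaces data.irow(0)
data.loc[1, 'swings'] = row['low'] - row['open']  # replaces data.ix[i, 'swings']
print(data.loc[1, 'swings'])  # approximately -0.16
```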
    qid & accept id: (23626026, 23626611) query: How to exit a supervisor process with fabric file? soup:


    This is more about using supervisorctl than about using Fabric.

    Avoid fab calls to commands that require user interaction

    Fabric makes one-shot calls to commands and then returns; there should be no long-term activity on the console. The solution for your problem is not to enter interactive mode (which awaits further input), but to call supervisorctl only in non-interactive mode.

    Calling supervisorctl in non-interactive mode

    The supervisorctl command provides an interactive and a non-interactive mode.

    You want the non-interactive mode here.

    E.g. in my installation, I have a service called logproxy

    Calling supervisorctl in this way:

    $ supervisorctl status logproxy
    logproxy                         STOPPED    Not started
    

    Applying this to your fab task should make it work.

    Following the sample code from "Welcome to Fabric!" it would look like:

    from fabric.api import run
    
    def super_status():
        uname = "zen"
        pswd = "then"
        cmd = "supervisorctl -u {uname} -p {pswd} status logproxy".format(uname=uname, pswd=pswd)
        # to see the command you are going to call, just for show
        print cmd
        # and run it
        run(cmd)
    

    You can then run

    $ fab -l
    

    to list it.

    and call the task super_status with:

    $ fab super_status -H localhost
    
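    The pattern generalizes to any supervisorctl action; here is a sketch that only builds the command strings (the fabric run() call is omitted, and the credentials are the answer's example values):

```python
# Builds the non-interactive command strings only; running them (via
# fabric's run() or subprocess) is left out.
def supervisorctl_cmd(action, program, uname="zen", pswd="then"):
    return "supervisorctl -u {u} -p {p} {a} {prog}".format(
        u=uname, p=pswd, a=action, prog=program)

print(supervisorctl_cmd("status", "logproxy"))
print(supervisorctl_cmd("stop", "logproxy"))  # one-shot stop, no interactive session to exit
```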
    qid & accept id: (23635576, 23636398) query: Send data from c program to python program using pipe? soup:


    Consider:

    fifo.c

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    
    int main (void)
    {
        // Array to send
        int arr[] = {2,4,6,8};
        int len = 4;
    
        // Create FIFO
        char filename[] = "fifo.tmp";
    
        int s_fifo = mkfifo(filename, S_IRWXU);
        if (s_fifo != 0)
        {
            printf("mkfifo() error: %d\n", s_fifo);
            return -1;
        }
    
        FILE * wfd = fopen(filename, "w");
        if (wfd == NULL)
        {
            printf("fopen() error\n");
            return -1;
        }
    
        // Write to FIFO (space-delimited, matching data.split() on the Python side)
        for (int i = 0; i < len; i++)
        {
            fprintf(wfd, "%d ", arr[i]);
        }

        fclose(wfd);
        return 0;
    }

    fifo.py

    filename = "fifo.tmp"
    
    # Block until writer finishes...
    with open(filename, 'r') as f:
        data = f.read()
    
    # Split data into an array
    array = [int(x) for x in data.split()]
    
    print array
    

    You'd run the writer (C) first; it blocks until the reader (Python) opens the FIFO and reads the data. Then run the reader, and both processes will terminate.

    $ python fifo.py
    [2, 4, 6, 8]
    

    Notes:

    • Some better error handling will probably be beneficial (eg. if the named fifo exists because the c program didn't exit cleanly).
    • This is kind of inefficient, because you're converting the integer values to their string representation and sending that. I used it because the space delimiter is easy to work with, but you may consider sending the integer values themselves, and using a fixed-width parsing on the reader side.
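    The fixed-width alternative mentioned in the second note could look like this on the Python side; struct is used here to simulate both ends, so this is an illustration rather than a drop-in replacement:

```python
import struct

# Writer side, simulated: the C program would fwrite() the raw ints
# instead of printing text (little-endian 4-byte ints assumed).
payload = struct.pack('<4i', 2, 4, 6, 8)

# Reader side: fixed-width parsing instead of data.split()
count = len(payload) // 4
array = list(struct.unpack('<%di' % count, payload))
print(array)  # [2, 4, 6, 8]
```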
    qid & accept id: (23642406, 23643433) query: Sum grouped Pandas dataframe by single column soup:


    Well, you can include SampleMeta as part of the groupby:

    print test.groupby(['GroupID','Sample','SampleMeta']).sum()
    
                               Value
    GroupID Sample SampleMeta       
    1       S1     S1_meta         2
    2       S2     S2_meta         1
    

    If you don't want SampleMeta as part of the index when done you could modify it as follows:

    print test.groupby(['GroupID','Sample','SampleMeta']).sum().reset_index(level=2)
    
                   SampleMeta  Value
    GroupID Sample                  
    1       S1        S1_meta      2
    2       S2        S2_meta      1
    

    This will only work correctly if there is no variation in SampleMeta within each ['GroupID','Sample'] group. Of course, if there were such variation, then you probably want to exclude SampleMeta from the groupby/sum entirely:

    print test.groupby(['GroupID','Sample'])['Value'].sum()
    
    GroupID  Sample
    1        S1        2
    2        S2        1
    
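    If SampleMeta did vary within a group and you still wanted to keep one representative value per group, an alternative (my addition, not from the answer) is to aggregate it explicitly:

```python
import pandas as pd

# Toy frame matching the answer's example
test = pd.DataFrame({'GroupID':    [1, 1, 2],
                     'Sample':     ['S1', 'S1', 'S2'],
                     'SampleMeta': ['S1_meta', 'S1_meta', 'S2_meta'],
                     'Value':      [1, 1, 1]})

# Sum Value, keep the first SampleMeta seen per group
out = (test.groupby(['GroupID', 'Sample'])
           .agg({'Value': 'sum', 'SampleMeta': 'first'})
           .reset_index())
print(out)
```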
    qid & accept id: (23648826, 23656985) query: How to print available tags while using Robot Framework soup:


    There is nothing provided by robot to give you this information. However, it's pretty easy to write a python script that uses the robot parser to get all of the tag information. Here's a quick hack that I think is correct (though I only tested it very briefly):

    from robot.parsing import TestData
    import sys
    
    def main(path):
        suite = TestData(parent=None, source=path)
        tags = get_tags(suite)
        print ", ".join(sorted(set(tags)))
    
    def get_tags(suite):
        tags = []
    
        if suite.setting_table.force_tags:
            tags.extend(suite.setting_table.force_tags.value)
    
        if suite.setting_table.default_tags:
            tags.extend(suite.setting_table.default_tags.value)
    
        for testcase in suite.testcase_table.tests:
            if testcase.tags:
                tags.extend(testcase.tags.value)
    
        for child_suite in suite.children:
            tags.extend(get_tags(child_suite))
    
        return tags
    
    if __name__ == "__main__":
        main(sys.argv[1])
    

    Note that this will not get any tags created by the Set Tags keyword, nor does it take into account tags removed by Remove Tags.

    Save the code to a file, eg get_tags.py, and run it like this:

    $ python /tmp/get_tags.py /tmp/tests/
    a tag, another force tag, another tag, default tag, force tag, tag-1, tag-2
    
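    Stripped of the robot API, the recursion in get_tags is just a depth-first walk over the suite tree; here it is on a toy dict-based tree of my own construction:

```python
def get_tags(suite):
    """Depth-first tag collection over a dict-based suite tree."""
    tags = list(suite.get('tags', []))
    for child in suite.get('children', []):
        tags.extend(get_tags(child))
    return tags

tree = {'tags': ['force tag'],
        'children': [{'tags': ['tag-1']},
                     {'tags': ['tag-2'], 'children': []}]}
print(", ".join(sorted(set(get_tags(tree)))))  # force tag, tag-1, tag-2
```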
    qid & accept id: (23669024, 23669051) query: How to strip a specific word from a string? soup:


    Use str.replace.

    >>> papa.replace('papa', '')
    ' is a good man'
    >>> app.replace('papa', '')
    'app is important'
    

    Alternatively, use the re module with regular expressions; this also allows removing the spaces around the word.

    >>> import re
    >>> papa = 'papa is a good man'
    >>> app = 'app is important'
    >>> papa3 = 'papa is a papa, and papa'
    >>>
    >>> patt = re.compile('(\s*)papa(\s*)')
    >>> patt.sub('\\1mama\\2', papa)
    'mama is a good man'
    >>> patt.sub('\\1mama\\2', papa3)
    'mama is a mama, and mama'
    >>> patt.sub('', papa3)
    'is a, and'
    
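    One caveat (my addition): both approaches above also hit 'papa' inside longer words. A word-boundary pattern avoids that:

```python
import re

s = 'papa is here, and so is grandpapa'
# \b keeps the pattern from matching inside 'grandpapa'
print(re.sub(r'\bpapa\b', 'mama', s))  # 'mama is here, and so is grandpapa'
```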
    qid & accept id: (23677258, 23686864) query: numpy multidimensional indexing and diagonal symmetries soup:

    \n

    Comparison on lower triangles\nIn order to compare back, we set all the upper triangles to 0

    \n
    ux, uy = np.triu_indices(10)\np[:, ux, uy] = 0\nq[:, ux, uy] = 0\np[:, :, :, :, :, ux, uy] = 0\nq[:, :, :, :, :, ux, uy] = 0\n\nprint ((p - q) ** 2).sum()  # euclidean distance is 0, so p and q are equal\n\nprint ((p ** 2).sum(), (q ** 2).sum())  # prove that not all entries are 0 ;) - This has a negative result due to an overflow\n
    \n soup wrap:

    You are on the right path. Using np.tril_indices you can indeed smartly index these lower triangles. What remains to be improved is the actual indexing/slicing of the data.

    Please try this (copy and pasteable):

    import numpy as np
    shape = (3, 10, 10, 19, 75, 10, 10)
    p = np.arange(np.prod(shape)).reshape(shape)  # this is not symmetric, but not important
    
    ix, iy = np.tril_indices(10)
    # In order to index properly, we need to add axes. This can be done by hand or with this
    ix1, ix2 = np.ix_(ix, ix)
    iy1, iy2 = np.ix_(iy, iy)
    
    p_ltriag = p[:, ix1, iy1, :, :, ix2, iy2]
    print p_ltriag.shape  # yields (55, 55, 3, 19, 75), axis order can be changed if needed
    
    q = np.zeros_like(p)
    q[:, ix1, iy1, :, :, ix2, iy2] = p_ltriag  # fills the lower triangles on both sides
    q[:, ix1, iy1, :, :, iy2, ix2] = p_ltriag  # fills the lower on left, upper on right
    q[:, iy1, ix1, :, :, ix2, iy2] = p_ltriag  # fills the upper on left, lower on right
    q[:, iy1, ix1, :, :, iy2, ix2] = p_ltriag  # fills the upper triangles on both sides
    

    The array q now contains a symmetrized version of p (where the upper triangles were replaced with the content of the lower triangles). Note that the last line contains iy and ix indices in reversed order, essentially creating a transpose of the lower triangular matrix.
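    As a smaller illustration of the same mirroring trick, here is a minimal 2-D sketch (my addition, not part of the original answer; the 7-D case above works the same way, just with more axes):

```python
import numpy as np

# Symmetrize a single 4x4 matrix by mirroring its lower triangle,
# using the same tril_indices idea in two dimensions.
m = np.arange(16).reshape(4, 4)
ix, iy = np.tril_indices(4)

s = np.zeros_like(m)
s[ix, iy] = m[ix, iy]  # copy the lower triangle (including the diagonal)
s[iy, ix] = m[ix, iy]  # mirror it into the upper triangle

print((s == s.T).all())  # True: s is now symmetric
```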

    Comparison on lower triangles

    In order to compare back, we set all the upper triangles to 0:

    ux, uy = np.triu_indices(10)
    p[:, ux, uy] = 0
    q[:, ux, uy] = 0
    p[:, :, :, :, :, ux, uy] = 0
    q[:, :, :, :, :, ux, uy] = 0
    
    print ((p - q) ** 2).sum()  # euclidean distance is 0, so p and q are equal
    
    print ((p ** 2).sum(), (q ** 2).sum())  # prove that not all entries are 0 ;) - This has a negative result due to an overflow
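    The overflow remark deserves a quick aside (my addition): with NumPy's default fixed-width integer dtypes, squaring large values silently wraps around, which is why the sum of squares can come out negative. A minimal demonstration:

```python
import numpy as np

# Squares of values above ~46341 no longer fit in int32 and wrap around,
# so the sum of squares differs from the exact (wider-dtype) result.
x = np.arange(10**6, dtype=np.int32)
wrapped = (x ** 2).sum()
exact = (x.astype(np.int64) ** 2).sum()
print(wrapped != exact)  # True: the int32 result is corrupted by overflow
```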
    
    qid & accept id: (23699378, 23728306) query: Get permutation with specified degree by index number soup:

    soup wrap:

    This answer is less elegant/efficient than my other one, but it describes a polynomial-time algorithm that copes with the additional constraints on the ordering of permutations. I'm going to describe a subroutine that, given a prefix of an n-element permutation and a set of degrees, counts how many permutations have that prefix and a degree belonging to the set. Given this subroutine, we can do an n-ary search for the permutation of a specified rank in the specified subset, extending the known prefix one element at a time.

    We can visualize an n-element permutation p as an n-vertex, n-arc directed graph where, for each vertex v, there is an arc from v to p(v). This digraph consists of a collection of vertex-disjoint cycles. For example, the permutation 31024 looks like

     _______
    /       \
    \->2->0->3
     __     __
    /  |   /  |
    1<-/   4<-/ .
    

    Given a prefix of a permutation, we can visualize the subgraph corresponding to that prefix, which will be a collection of vertex-disjoint paths and cycles. For example, the prefix 310 looks like

    2->0->3
     __
    /  |
    1<-/ .
    

    I'm going to describe a bijection between (1) extensions of this prefix that are permutations and (2) complete permutations on a related set of elements. This bijection preserves up to a constant term the number of cycles (which is the number of elements minus the degree). The constant term is the number of cycles in the prefix.

    The permutations mentioned in (2) are on the following set of elements. Start with the original set, delete all elements involved in cycles that are complete in the prefix, and introduce a new element for each path. For example, if the prefix is 310, then we delete the complete cycle 1 and introduce a new element A for the path 2->0->3, resulting in the set {4, A}. Now, given a permutation in set (1), we obtain a permutation in set (2) by deleting the known cycles and replacing each path by its new element. For example, the permutation 31024 corresponds to the permutation 4->4, A->A, and the permutation 31042 corresponds to the permutation 4->A, A->4. I claim (1) that this map is a bijection and (2) that it preserves degrees as described before.

    The definition, more or less, of the (n,k)-th Stirling number of the first kind, written

    [n]
    [ ]
    [k]
    

    (ASCII art square brackets), is the number of n-element permutations of degree n - k. To compute the number of extensions of an r-element prefix of an n-element permutation, count c, the number of complete cycles in the prefix. Sum, for each degree d in the specified set, the Stirling number

    [  n - r  ]
    [         ]
    [n - d - c]
    

    of the first kind, taking the terms with "impossible" indices to be zero (some analytically motivated definitions of the Stirling numbers are nonzero in unexpected places).
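    Since the whole search rests on these numbers, here is a quick sanity check (my addition) using the standard recurrence: unsigned Stirling numbers of the first kind count n-element permutations with exactly k cycles, so for fixed n they must sum to n!:

```python
from math import factorial

def c(n, k):
    """Unsigned Stirling number of the first kind via the recurrence
    c(n, k) = (n-1)*c(n-1, k) + c(n-1, k-1)."""
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    return (n - 1) * c(n - 1, k) + c(n - 1, k - 1)

# Permutations of 5 elements, grouped by cycle count, total 5! = 120.
print(sum(c(5, k) for k in range(6)))  # 120
```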

    To get a rank from a permutation, we do n-ary search again, except this time, we use the permutation rather than the rank to guide the search.

    Here's some Python code for both (including a test function).

    import itertools
    
    memostirling1 = {(0, 0): 1}
    def stirling1(n, k):
        ans = memostirling1.get((n, k))
        if ans is None:
            if not 1 <= k <= n: return 0
            ans = (n - 1) * stirling1(n - 1, k) + stirling1(n - 1, k - 1)
            memostirling1[(n, k)] = ans
        return ans
    
    def cyclecount(prefix):
        c = 0
        visited = [False] * len(prefix)
        for (i, j) in enumerate(prefix):
            while j < len(prefix) and not visited[j]:
                visited[j] = True
                if j == i:
                    c += 1
                    break
                j = prefix[j]
        return c
    
    def extcount(n, dset, prefix):
        c = cyclecount(prefix)
        return sum(stirling1(n - len(prefix), n - d - c) for d in dset)
    
    def unrank(n, dset, rnk):
        assert rnk >= 0
        choices = set(range(n))
        prefix = []
        while choices:
            for i in sorted(choices):
                prefix.append(i)
                count = extcount(n, dset, prefix)
                if rnk < count:
                    choices.remove(i)
                    break
                del prefix[-1]
                rnk -= count
            else:
                assert False
        return tuple(prefix)
    
    def rank(n, dset, perm):
        assert n == len(perm)
        rnk = 0
        prefix = []
        choices = set(range(n))
        for j in perm:
            choices.remove(j)
            for i in sorted(choices):
                if i < j:
                    prefix.append(i)
                    rnk += extcount(n, dset, prefix)
                    del prefix[-1]
            prefix.append(j)
        return rnk
    
    def degree(perm):
        return len(perm) - cyclecount(perm)
    
    def test(n, dset):
        for (rnk, perm) in enumerate(perm for perm in itertools.permutations(range(n)) if degree(perm) in dset):
            assert unrank(n, dset, rnk) == perm
            assert rank(n, dset, perm) == rnk
    
    test(7, {2, 3, 5})
    
    qid & accept id: (23718340, 23718948) query: pandas dataframe: return column that is a compression of other columns soup:

    soup wrap:

    Well, I'd probably do it as follows (an example dataframe that hopefully captures your situation well enough):

    >>> df
    
       A  B abc1 abc2 abc3 abc4
    0  1  4    x    r    a    d
    1  1  3    y    d    b    e
    2  2  4    z    e    c    r
    3  3  5    r    g    d    f
    4  4  8    z    z    z    z
    

    Get the columns of interest:

    >>> cols = [x for x in df.columns if 'abc' in x]
    >>> cols
    ['abc1', 'abc2', 'abc3', 'abc4']
    
    >>> df['newcol'] = (df[cols] == 'r').any(axis=1).map({True:'r',False:'np.nan'})
    >>> df
    
      A  B abc1 abc2 abc3 abc4  newcol
    0  1  4    x    r    a    d       r
    1  1  3    y    d    b    e  np.nan
    2  2  4    z    e    c    r       r
    3  3  5    r    g    d    f       r
    4  4  8    z    z    z    z  np.nan
    

    This should be pretty fast; I think even the use of map here will be a Cythonized call. If a Boolean vector is sufficient for newcol, you could just simplify it to the following:

    >>> df['newcol'] = (df[cols] == 'r').any(axis=1)
    >>> df
    
       A  B abc1 abc2 abc3 abc4 newcol
    0  1  4    x    r    a    d   True
    1  1  3    y    d    b    e  False
    2  2  4    z    e    c    r   True
    3  3  5    r    g    d    f   True
    4  4  8    z    z    z    z  False
    

    Now, if you need to check if the strings contain 'r' instead of equalling 'r', you could do as follows:

    >>> df
    
      A  B abc1  abc2 abc3 abc4
    0  1  4    x  root    a    d
    1  1  3    y     d    b    e
    2  2  4    z     e    c  bar
    3  3  5    r     g    d    f
    4  4  8    z     z    z    z
    
    >>> cols = [x for x in df.columns if 'abc' in x]
    >>> df['newcol'] = df[cols].apply(lambda x: x.str.contains('r'),axis=0).any(axis=1)
    >>> df['newcol'] = df['newcol'].map({True:'r',False:'np.nan'}) 
    >>> df
    
       A  B abc1  abc2 abc3 abc4  newcol
    0  1  4    x  root    a    d       r
    1  1  3    y     d    b    e  np.nan
    2  2  4    z     e    c  bar       r
    3  3  5    r     g    d    f       r
    4  4  8    z     z    z    z  np.nan
    

    This should still be pretty fast because it uses pandas' vectorized string methods for each of the columns (the apply is across the columns, not an iteration over the rows).
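    One small aside (my addition, not part of the original answer): the map above writes the literal string 'np.nan' into the column. If you want a real missing value instead, map to np.nan itself:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the frame above: flag rows whose 'abc' columns
# contain an 'r', using a real NaN (not the string 'np.nan') for misses.
df = pd.DataFrame({'abc1': ['x', 'y'], 'abc2': ['root', 'd']})
cols = [c for c in df.columns if 'abc' in c]

mask = df[cols].apply(lambda s: s.str.contains('r')).any(axis=1)
df['newcol'] = mask.map({True: 'r', False: np.nan})
print(df['newcol'].tolist())  # ['r', nan]
```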

    qid & accept id: (23732057, 23735758) query: Remove word extension in python soup:

    soup wrap:

    Instead of a stemmer, you can use a lemmatizer. Here's an example with Python NLTK:

    from nltk.stem import WordNetLemmatizer
    
    s = """
     You all are so beautiful soooo beautiful
     Thought that was a really awesome quote
     Beautiful things don't ask for attention
     """
    
    wnl = WordNetLemmatizer()
    print " ".join([wnl.lemmatize(i) for i in s.split()]) #You all are so beautiful soooo beautiful Thought that wa a really awesome quote Beautiful thing don't ask for attention
    

    In some cases, it may not do what you expect:

    print wnl.lemmatize('going') #going
    

    Then you can combine both approaches: stemming and lemmatization.

    qid & accept id: (23733922, 23734024) query: How to Parse an orderedDict? soup:

    soup wrap:

    It is still a dictionary, just use the key:

    your_ordered_dict['clicks__c']
    

    Demo:

    >>> from collections import OrderedDict
    >>> od = OrderedDict([(u'attributes', OrderedDict([(u'type', u'Campaign__c'), (u'url', u'/services/data/v29.0/sobjects/Campaign__c/a0B9000000I6CDUEA3')])), (u'clicks__c', 0.0)])
    >>> od.keys()
    [u'attributes', u'clicks__c']
    >>> od['clicks__c']
    0.0
    

    If you parsed this from a JSON object, in the vast majority of cases order won't matter. It certainly doesn't with your values here. You could just have parsed it to regular dictionaries and not lost functionality.
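    For reference, this is how the two parse modes differ (a small sketch; the JSON string is a trimmed-down version of the data above):

```python
import json
from collections import OrderedDict

doc = '{"attributes": {"type": "Campaign__c"}, "clicks__c": 0.0}'

plain = json.loads(doc)                                   # regular dicts
ordered = json.loads(doc, object_pairs_hook=OrderedDict)  # OrderedDicts

# Key access is identical either way.
print(plain['clicks__c'], ordered['clicks__c'])  # 0.0 0.0
```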

    qid & accept id: (23795777, 23797060) query: python tkinter calender, placing the numbers soup:

    soup wrap:

    Could you post a code snippet? I'm not sure I get the question, but my first thought would be to generate a grid with tk.Labels assigned to each spot, and keep references to them in a list (or 2-D list). Then, when you need to update, just call

    self.labels[i][j].config(text='foo')
    

    Then you'd loop over i and j to set the values depending on how the calendar is laid out.

    Edit: No, I wouldn't hard-code it; it should be straightforward to do with a loop, e.g.:

    import math
    
    # Initialize the calendar matrix
    cal = []
    for i in range(5): # 5 weeks
        cal.append([])
        for j in range(7): # 7 days per week
            cal[i].append('')
    
    # Set the calendar for some month
    start = 3 # Wed
    for day in range(31):
        row = math.floor( (day+start) / 7)
        col = (day+start) - 7*row
        cal[row][col] = str(day+1)
    
    print(cal)
    

    which spits out

    [['', '', '', '1', '2', '3', '4'], ['5', '6', '7', '8', '9', '10', '11'], ['12', '13', '14', '15', '16', '17', '18'], ['19', '20', '21', '22', '23', '24', '25'], ['26', '27', '28', '29', '30', '31', '']]
    

    though in the GUI of course, you'd want to have a matrix of tk.Label or whatever.

    qid & accept id: (23832259, 23832284) query: Python - diff-like order comparision of 2 lists with unequal sizes, soup:

    soup wrap:

    You can use itertools.izip_longest for this (note that this is renamed zip_longest in Python 3.x):

     [i for i, j in izip_longest(a, b) if i == j]
    

    Once the shorter iterable is exhausted, izip_longest will use whatever is passed as the fillvalue keyword argument (in this case, None, since we didn't provide the parameter).
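    To make the fillvalue behaviour concrete (Python 3 spelling, hence zip_longest):

```python
from itertools import zip_longest  # izip_longest in Python 2

a = ['e', 'f', 'g']
b = ['e', 'f', 'h', 'i']

print(list(zip_longest(a, b)))                    # missing slots become None
print(list(zip_longest(a, b, fillvalue='null')))  # ...or the given filler
```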

    Edit:

    If you want output that matches what you have in your original question, you can do this:

    >>> print("\n".join(["{}, {} {}OK".format(i, j, "" if i == j else "N") for i, j in izip_longest(a, b, fillvalue="null")]))
    e, e OK
    f, f OK
    g, h NOK
    h, i NOK
    i, j NOK
    null, g NOK
    
    qid & accept id: (23833763, 23833925) query: Pandas count number of elements in each column less than x soup:
    soup wrap:
    In [96]:
    
    df = pd.DataFrame({'a':randn(10), 'b':randn(10), 'c':randn(10)})
    df
    Out[96]:
              a         b         c
    0 -0.849903  0.944912  1.285790
    1 -1.038706  1.445381  0.251002
    2  0.683135 -0.539052 -0.622439
    3 -1.224699 -0.358541  1.361618
    4 -0.087021  0.041524  0.151286
    5 -0.114031 -0.201018 -0.030050
    6  0.001891  1.601687 -0.040442
    7  0.024954 -1.839793  0.917328
    8 -1.480281  0.079342 -0.405370
    9  0.167295 -1.723555 -0.033937
    
    [10 rows x 3 columns]
    In [97]:
    
    df[df > 1.0].count()
    
    Out[97]:
    a    0
    b    2
    c    2
    dtype: int64
    

    So in your case:

    df[df < 2.0 ].count() 
    

    should work

    EDIT

    some timings

    In [3]:
    
    %timeit df[df < 1.0 ].count() 
    %timeit (df < 1.0).sum()
    %timeit (df < 1.0).apply(np.count_nonzero)
    1000 loops, best of 3: 1.47 ms per loop
    1000 loops, best of 3: 560 us per loop
    1000 loops, best of 3: 529 us per loop
    

    So @DSM's suggestions are correct and much faster than my suggestion.
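    A small reproducible version of the faster form (the column values here are made up for the sketch):

```python
import pandas as pd

# (df < x).sum() counts, per column, how many entries are below x:
# the comparison yields a Boolean frame and True sums as 1.
df = pd.DataFrame({'a': [0.5, 1.5, 2.5], 'b': [3.0, -1.0, 1.9]})
counts = (df < 2.0).sum()
print(counts['a'], counts['b'])  # 2 2
```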

    qid & accept id: (23837696, 23837715) query: Using regex to find a string starting with /team/ and ending with /Euro_2012 soup:

    soup wrap:

    Simple enough:

    re.findall(r'/team/.*?/Euro_2012', inputtext)
    

    You may want to limit the permissible characters between /team/ and /Euro_2012 to reduce the chances of false positives in larger text:

    re.findall(r'/team/[\w\d%.~+-/]*?/Euro_2012', inputtext)
    

    which only allows for valid URI characters.

    Demo:

    >>> import re
    >>> sample = '''\
    ... /team/Croatia/Euro_2012
    ... /team/Netherlands/Euro_2012
    ... /team/Netherlands/WC2014
    ... '''
    >>> re.findall(r'/team/.*?/Euro_2012', sample)
    ['/team/Croatia/Euro_2012', '/team/Netherlands/Euro_2012']
    >>> re.findall(r'/team/[\w\d%.~+-/]*?/Euro_2012', sample)
    ['/team/Croatia/Euro_2012', '/team/Netherlands/Euro_2012']
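    One caveat worth flagging (my note, not part of the original answer): inside the character class, +-/ is parsed as a range from + to /, which quietly admits , and friends. Placing the hyphen last keeps it literal:

```python
import re

# '-' at the end of the class is a literal hyphen rather than a range
# (and \d is already covered by \w, so it can be dropped).
patt = r'/team/[\w%.~+/-]*?/Euro_2012'
print(re.findall(patt, '/team/Croatia/Euro_2012'))
```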
    
    qid & accept id: (23866378, 23866689) query: Compare unequal lists soup:

    soup wrap:

    If you are looking for the elements which exist in both lists, the following list comprehension should work:

    c = [item for item in b if item in a]
    

    Like so:

    >>> a = [6]
    >>> b = [6,7,8]
    >>> c = [item for item in b if item in a]
    >>> c
    [6]
    >>> 
    

    If you want to, say print something every time the values match, use the following for loop:

    for i in b:
        if i in a:
            print '%d in both sets!' %(i)
        else:
            print '%d does not match!' %(i)
    

    This runs as:

    >>> a = [6, 7]
    >>> b = [6, 7, 8]
    >>> for i in b:
    ...     if i in a:
    ...             print '%d in both sets!' %(i)
    ...     else:
    ...             print '%d does not match!' %(i)
    ... 
    6 in both sets!
    7 in both sets!
    8 does not match!
    >>> 
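    A hedged performance note (my addition): for long lists, convert a to a set first so each membership test is O(1) on average instead of O(len(a)):

```python
a = [6, 7]
b = [6, 7, 8]

# Same comprehension as above, but testing membership against a set.
a_set = set(a)
c = [item for item in b if item in a_set]
print(c)  # [6, 7]
```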
    
    qid & accept id: (23918947, 23919298) query: Using Random Module to Administer DNA Mutations soup:

    soup wrap:

    It's quite simple: go through the string, and whenever you find a possible mutation point, mutate if the random number says to:

    import random
    def mutate(string, mutation, threshold):
        dna = list(string)
        for index, char in enumerate(dna):
            if char in mutation:
                if random.random() < threshold:
                    dna[index] = mutation[char]
    
        return ''.join(dna)
    

    If you wanted to be fancier, you could use a list comprehension:

    import random
    def mutate(string, mutation, threshold):
        return ''.join([mutation[char] if random.random() < threshold 
                                           and char in mutation else char
                                           for char in string])
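    A quick usage sketch (my addition; with threshold 1.0 every eligible character mutates and with 0.0 none do, so both results are deterministic):

```python
import random

def mutate(string, mutation, threshold):
    # Same list-comprehension version as above.
    return ''.join([mutation[char] if random.random() < threshold
                    and char in mutation else char
                    for char in string])

print(mutate('ACGT', {'A': 'G'}, 1.0))  # GCGT: every 'A' becomes 'G'
print(mutate('ACGT', {'A': 'G'}, 0.0))  # ACGT: nothing mutates
```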
    
    qid & accept id: (23923658, 23923764) query: Exposing python daemon as a service soup:

    soup wrap:

    Just because I like zmq and gevent, I would probably do something like this:

    server.py

    import gevent
    import gevent.monkey
    gevent.monkey.patch_all()
    import zmq.green as zmq
    import json
    
    context = zmq.Context()
    socket = context.socket(zmq.ROUTER)
    socket.bind("ipc:///tmp/myapp.ipc")
    
    def do_something(parsed):
        return sum(parsed.get("values"))
    
    def handle(msg):
        data = msg[1]
        parsed = json.loads(data)
        total = do_something(parsed)
        msg[1] = json.dumps({"response": total})
        socket.send_multipart(msg)
    
    def handle_zmq():
        while True:
            msg = socket.recv_multipart()
            gevent.spawn(handle, msg)
    
    if __name__ == "__main__":
        handle_zmq()
    

    And then you would have a client.py for your command line tool, like

    import json
    import zmq
    
    request_data = {
            "values": [10, 20, 30 , 40],
            }
    
    context = zmq.Context()
    socket = context.socket(zmq.DEALER)
    socket.connect("ipc:///tmp/myapp.ipc")
    socket.send(json.dumps(request_data))
    print socket.recv()
    

    Obviously this is a contrived example, but you should get the idea. Alternatively you could use something like xmlrpc or jsonrpc for this as well.
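
    The framing logic inside handle() can be exercised without zmq or gevent; this sketch isolates the JSON request/response round trip that the server performs on each message:

```python
import json

def do_something(parsed):
    # same aggregation as in server.py
    return sum(parsed.get("values"))

def handle_frame(data):
    # mimics handle(): parse the JSON payload, compute, re-serialize
    parsed = json.loads(data)
    total = do_something(parsed)
    return json.dumps({"response": total})

reply = json.loads(handle_frame(json.dumps({"values": [10, 20, 30, 40]})))
```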

    qid & accept id: (23961648, 23963088) query: openCV Thresholding negative values soup:

    soup wrap:

    Numpy offers very powerful indexing capabilities. One of these is indexing using Boolean arrays. You can assign elements matching a condition to a certain value, which appears to be what you want. For example,

    threshold = 2
    flow[np.abs(flow) < threshold] = 0
    

    will assign zero to each element in flow whose absolute value is below the threshold. Supposing that flow looked like this:

    [ 1  2  3]
    [-1 -2 -3]
    

    The result of applying this operation would be:

    [0  2  3]
    [0 -2 -3]
    

    This correctly zeroes out the positive and negative elements with a small magnitude, while retaining the sign of the remaining negative elements.
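
    Reproducing the example end to end:

```python
import numpy as np

flow = np.array([[1, 2, 3],
                 [-1, -2, -3]])

threshold = 2
# Boolean indexing: zero every element whose magnitude is below the threshold
flow[np.abs(flow) < threshold] = 0
```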

    qid & accept id: (24002551, 24005333) query: wait() on a group of Popen objects soup:
    soup wrap:
    1. The "restart crashed server" task is really common, and probably shouldn't be handled by custom code unless there's a concrete reason. See upstart and systemd and monit.

    2. The multiprocessing.Pool object sounds like a win -- it automatically starts processes, and even restarts them if needed. Unfortunately it's not very configurable.

    Here's one solution with good old Popen:

    import random, time
    from subprocess import Popen
    
    
    def work_diligently():
        cmd = ["/bin/sleep", str(random.randrange(2,4))]
        proc = Popen(cmd)
        print '\t{}\t{}'.format(proc.pid, cmd) # pylint: disable=E1101
        return proc
    
    
    def spawn(num):
        return [ work_diligently() for _ in xrange(num) ]
    
    
    NUM_PROCS = 3
    procs = spawn(NUM_PROCS)
    while True:
        print time.ctime(), 'scan'
        procs = [ 
            proc for proc in procs
            if proc.poll() is None
        ]
        num_exited = NUM_PROCS - len(procs)
        if num_exited:
            print 'Uhoh! Restarting {} procs'.format(num_exited)
            procs.extend( spawn(num_exited) )
        time.sleep(1)
    

    Output:

        2340    ['/bin/sleep', '2']
        2341    ['/bin/sleep', '2']
        2342    ['/bin/sleep', '3']
    Mon Jun  2 18:01:42 2014 scan
    Mon Jun  2 18:01:43 2014 scan
    Mon Jun  2 18:01:44 2014 scan
    Uhoh! Restarting 2 procs
        2343    ['/bin/sleep', '3']
        2344    ['/bin/sleep', '2']
    Mon Jun  2 18:01:45 2014 scan
    Uhoh! Restarting 1 procs
        2345    ['/bin/sleep', '2']
    Mon Jun  2 18:01:46 2014 scan
    Uhoh! Restarting 1 procs
        2346    ['/bin/sleep', '2']
    Mon Jun  2 18:01:47 2014 scan
    Uhoh! Restarting 2 procs
        2347    ['/bin/sleep', '3']
        2349    ['/bin/sleep', '2']
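
    The scan loop relies on Popen.poll(), which returns None while a child is still running and its exit status once it has finished. A minimal, portable illustration (using the current interpreter rather than /bin/sleep):

```python
import sys
from subprocess import Popen

# Spawn a child that exits immediately with status 0
proc = Popen([sys.executable, "-c", "pass"])

maybe_running = proc.poll()   # None if the child hasn't exited yet
proc.wait()                   # block until it finishes
exit_code = proc.poll()       # now guaranteed to be the exit status
```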
    
    qid & accept id: (24004106, 24004249) query: Summarize a list of dictionaries based on common key values soup:

    soup wrap:

    If you don't need the exact format that you provided, you could use a defaultdict:

    dictlist = [{'day': 0, 'start': '8:00am', 'end': '5:00pm'},
                {'day': 1, 'start': '10:00am', 'end': '7:00pm'},
                {'day': 2, 'start': '8:00am', 'end': '5:00pm'},
                {'day': 3, 'start': '10:00am', 'end': '7:00pm'},
                {'day': 4, 'start': '8:00am', 'end': '5:00pm'},
                {'day': 5, 'start': '11:00am', 'end': '1:00pm'}]
    
    from collections import defaultdict
    
    dd = defaultdict(list)
    
    for d in dictlist:
        dd[(d['start'],d['end'])].append(d['day'])
    

    Result:

    >>> dd
    defaultdict(<type 'list'>, {('11:00am', '1:00pm'): [5], ('10:00am', '7:00pm'): [1, 3], ('8:00am', '5:00pm'): [0, 2, 4]})
    

    And if the format is important to you, you could do:

    >>> my_list = [(v, k[0], k[1]) for k,v in dd.iteritems()]
    >>> my_list
    [([5], '11:00am', '1:00pm'), ([1, 3], '10:00am', '7:00pm'), ([0, 2, 4], '8:00am', '5:00pm')]
    >>> # If you need the output sorted:  
    >>> sorted_my_list = sorted(my_list, key = lambda k : len(k[0]), reverse=True)
    >>> sorted_my_list
    [([0, 2, 4], '8:00am', '5:00pm'), ([1, 3], '10:00am', '7:00pm'), ([5], '11:00am', '1:00pm')]
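
    The same grouping and reformatting, runnable on Python 3 (where iteritems() becomes items()):

```python
from collections import defaultdict

dictlist = [{'day': 0, 'start': '8:00am', 'end': '5:00pm'},
            {'day': 1, 'start': '10:00am', 'end': '7:00pm'},
            {'day': 2, 'start': '8:00am', 'end': '5:00pm'}]

dd = defaultdict(list)
for d in dictlist:
    # key on the (start, end) pair, collect the matching days
    dd[(d['start'], d['end'])].append(d['day'])

# reshape to (days, start, end) tuples, largest group first
my_list = sorted(((v, k[0], k[1]) for k, v in dd.items()),
                 key=lambda t: len(t[0]), reverse=True)
```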
    
    qid & accept id: (24018965, 24019005) query: How to add a another value to a key in python soup:

    soup wrap:

    append isn't working because the dictionary's values are not lists. If you make them lists by placing each value in [...]:

    phonebook = {}
    phonebook ['ana'] = ['12345']
    phonebook ['maria'] = ['23456' , 'maria@gmail.com']
    

    append will now work:

    def add_contact():
       name = raw_input("Please enter a name:")
       number = raw_input("Please enter a number:")
       phonebook[name].append(number)
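
    Alternatively, collections.defaultdict(list) creates the list automatically on first access, so you don't have to remember to wrap initial values; this is a variation on the answer's approach, not the OP's original code:

```python
from collections import defaultdict

phonebook = defaultdict(list)
phonebook['ana'].append('12345')             # list is created on first access
phonebook['maria'].append('23456')
phonebook['maria'].append('maria@gmail.com')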
    
    qid & accept id: (24024966, 24025175) query: Try/Except Every Method in Class? soup:

    soup wrap:

    Decorators are perfect for this. You can decorate each relevant method with a decorator like this one:

    (Note: using recursion for retries is probably not a great idea ...)

    import functools
    from time import sleep
    import apiclient.errors

    def Http500Resistant(func):
        num_retries = 5
        @functools.wraps(func)
        def wrapper(*a, **kw):
            sleep_interval = 2
            for i in range(num_retries):
                try:
                    return func(*a, **kw)
                except apiclient.errors.HttpError, e:
                    if e.resp.status == 500 and i < num_retries-1:
                        sleep(sleep_interval)
                        sleep_interval = min(2*sleep_interval, 60)
                    else:
                        raise e    
        return wrapper
    
    class A(object):
    
        @Http500Resistant
        def f1(self): ...
    
        @Http500Resistant
        def f2(self): ...
    

    To apply the decorator to all methods automatically, you can use yet-another-decorator, this time, decorating the class:

    import inspect
    def decorate_all_methods(decorator):
        def apply_decorator(cls):
            for k, f in cls.__dict__.items():
                if inspect.isfunction(f):
                    setattr(cls, k, decorator(f))
            return cls
        return apply_decorator
    

    and apply like this:

    @decorate_all_methods(Http500Resistant)
    class A(object):
        ...
    

    Or like:

    class A(object): ...
    A = decorate_all_methods(Http500Resistant)(A)
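
    decorate_all_methods can be verified with a trivial stand-in decorator; here a hypothetical call-recording wrapper takes the place of Http500Resistant (Python 3 shown, where methods on a class are plain functions, so inspect.isfunction matches them):

```python
import functools
import inspect

def decorate_all_methods(decorator):
    def apply_decorator(cls):
        # wrap every plain function defined on the class
        for k, f in list(cls.__dict__.items()):
            if inspect.isfunction(f):
                setattr(cls, k, decorator(f))
        return cls
    return apply_decorator

calls = []

def record(func):
    @functools.wraps(func)
    def wrapper(*a, **kw):
        calls.append(func.__name__)   # side effect proving the wrap happened
        return func(*a, **kw)
    return wrapper

@decorate_all_methods(record)
class A(object):
    def f1(self):
        return 1
    def f2(self):
        return 2

a = A()
result = a.f1() + a.f2()
```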
    
    qid & accept id: (24042596, 24042948) query: Transform string in a list with elements separated on Python soup:

    soup wrap:

    A more-useful transformation is likely to turn it into bytes:

    rem_spaces = str.maketrans({' ':None})
    
    from binascii import unhexlify
    
    unhexlify(u.translate(rem_spaces))
    Out[13]: b'\x01\xa02\x00\x00\x00\x00\xfe\x12o\x04'
    

    Or alternately a bytearray:

    bytearray(int(x,16) for x in u.split())
    Out[14]: bytearray(b'\x01\xa02\x00\x00\x00\x00\xfe\x12o\x04')
    

    If you really want a list of ints:

    [int(x, 16) for x in u.split()]
    Out[22]: [1, 160, 50, 0, 0, 0, 0, 254, 18, 111, 4]
    

    But in my experience you will very likely want to just work with sending/receiving bytes.
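
    With the OP's string u spelled out (reconstructed from the outputs above), all three conversions run as-is on Python 3:

```python
from binascii import unhexlify

# Example input matching the answer's outputs
u = '01 a0 32 00 00 00 00 fe 12 6f 04'

rem_spaces = str.maketrans({' ': None})        # translation table dropping spaces
as_bytes = unhexlify(u.translate(rem_spaces))  # bytes object
as_bytearray = bytearray(int(x, 16) for x in u.split())
as_ints = [int(x, 16) for x in u.split()]
```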

    qid & accept id: (24055067, 24060347) query: Looking for a concise way to check for point collision in a list of Rects soup:

    soup wrap:

    You could use a simple generator expression and collidepoint(), like

    >>> rects = [pygame.Rect(0,0,100,100), pygame.Rect(30,30,30,30)]
    >>> next((r for r in rects if r.collidepoint(10, 10)), None)
    <rect(0, 0, 100, 100)>
    >>> next((r for r in rects if r.collidepoint(200, 200)), None)
    >>>
    

    or, if you really want the index instead of the Rect itself:

    >>> rects = [pygame.Rect(0,0,100,100), pygame.Rect(30,30,30,30)]
    >>> next((i for (i, r) in enumerate(rects) if r.collidepoint(10, 10)), -1)
    0
    >>> next((i for (i, r) in enumerate(rects) if r.collidepoint(100, 200)), -1)
    -1
    >>>
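
    The next(generator, default) idiom is independent of pygame; the same pattern works with a stand-in hit test on plain (x, y, w, h) tuples (collidepoint here is a hypothetical helper, not pygame's):

```python
def collidepoint(rect, px, py):
    # stand-in for pygame.Rect.collidepoint on an (x, y, w, h) tuple
    x, y, w, h = rect
    return x <= px < x + w and y <= py < y + h

rects = [(0, 0, 100, 100), (30, 30, 30, 30)]

hit = next((r for r in rects if collidepoint(r, 10, 10)), None)
miss = next((r for r in rects if collidepoint(r, 200, 200)), None)
first_index = next((i for i, r in enumerate(rects) if collidepoint(r, 10, 10)), -1)
```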
    
    qid & accept id: (24065575, 24065677) query: Byte formatting in python 3 soup:

    soup wrap:

    You can't format a bytes literal. You also can't concatenate bytes objects with str objects. Instead, put the whole thing together as a str, and then convert it to bytes using the proper encoding.

    msg = 'hi there'
    prefix = '{:0>5d}'.format(len(msg)) # No b at the front--this is a str
    str_message = prefix + msg # still a str
    encoded_message = str_message.encode('utf-8') # or whatever encoding
    
    print(encoded_message) # prints: b'00008hi there'
    

    Or if you're a fan of one-liners:

    encoded_message = bytes('{:0>5d}{:1}'.format(len(msg), msg), 'utf-8')
    

    According to your comment on @Jan-Philip's answer, you need to specify how many bytes you're about to transfer. Given that, you'll need to encode the message first, so you can properly determine how many bytes it will be when you send it. The len function produces a proper byte count when called on bytes, so something like this should work for arbitrary text:

    msg = 'ü' # len(msg) is 1 character
    encoded_msg = msg.encode('utf-8') # len(encoded_msg) is 2 bytes
    encoded_prefix = '{:0>5d}'.format(len(encoded_msg)).encode('utf-8')
    full_message = encoded_prefix + encoded_msg # both are bytes, so we can concat
    
    print(full_message) # prints: b'00002\xc3\xbc'
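
    Wrapping the two snippets into one helper (the 5-digit byte-count prefix is the framing used above):

```python
def frame(msg, encoding='utf-8'):
    # Encode first, then prefix with the encoded length, so multi-byte
    # characters are counted as bytes rather than characters.
    encoded_msg = msg.encode(encoding)
    encoded_prefix = '{:0>5d}'.format(len(encoded_msg)).encode(encoding)
    return encoded_prefix + encoded_msg

ascii_frame = frame('hi there')
unicode_frame = frame('\u00fc')   # 'ü' encodes to 2 bytes in UTF-8
```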
    
    qid & accept id: (24072567, 24072653) query: Remove focus from Entry widget soup:

    soup wrap:

    By default, Frames do not take keyboard focus. However, if you want to give them keyboard focus when clicked on, you can do so by binding the focus_set method to a mouse click event:

    Option 1

    from tkinter import *
    
    top = Tk()
    
    Entry(top, width="20").pack()
    b = Frame(top, width=200, height=200, bg='blue')
    g = Frame(top, width=200, height=200, bg='green')
    y = Frame(top, width=200, height=200, bg='yellow')
    
    b.pack()
    g.pack()
    y.pack()
    
    b.bind("<1>", lambda event: b.focus_set())
    g.bind("<1>", lambda event: g.focus_set())
    y.bind("<1>", lambda event: y.focus_set())
    
    top.mainloop()
    

    Note that to do this you'll need to keep references to your widgets, as I did above with the variables b, g, and y.


    Option 2

    Here is another solution, accomplished by creating a subclass of Frame that is able to take keyboard focus:

    from tkinter import *
    
    class FocusFrame(Frame):
        def __init__(self, *args, **kwargs):
            Frame.__init__(self, *args, **kwargs)
            self.bind("<1>", lambda event: self.focus_set())
    
    top = Tk()
    
    Entry(top, width="20").pack()
    FocusFrame(top, width=200, height=200, bg='blue').pack()
    FocusFrame(top, width=200, height=200, bg='green').pack()
    FocusFrame(top, width=200, height=200, bg='yellow').pack()    
    
    top.mainloop()
    

    Option 3

    A third option is to just use bind_all to make every single widget take the keyboard focus when clicked (or you can use bind_class if you only want certain types of widgets to do this).

    Just add this line:

    top.bind_all("<1>", lambda event:event.widget.focus_set())
    
    qid & accept id: (24084817, 24095803) query: Log-sum-exp trick on a sparse matrix in scipy soup:

    soup wrap:

    The non-zero entries of a CSR matrix X are obtained by

    X[i].data
    

    and (a permutation of) the values of the actual row would be obtained by appending X.shape[1] - len(X[i].data) zeros to that.

    logsumexp(a) = max(a) + log(∑ exp[a - max(a)])
    

    for a vector a. Let's set b = X[i].data and k = X.shape[1] - len(X[i].data) and denote our earlier permuted row of X as

    (b, 0ₖ)
    

    using 0ₖ to denote a zero vector of length k and (⋅, ⋅) for concatenation. Then

    logsumexp((b, 0ₖ))
     = max((b, 0ₖ)) + log(∑ exp[(b, 0ₖ) - max((b, 0ₖ))])
     = max(max(b), 0) + log(∑ exp[(b, 0ₖ) - max(max(b), 0)])
     = max(max(b), 0) + log(∑ exp[b - max(max(b), 0)] + ∑ exp[0ₖ - max(max(b), 0)])
     = max(max(b), 0) + log(∑ exp[b - max(max(b), 0)] + k × exp[-max(max(b), 0)])
    

    So we get the algorithm

    def logsumexp_csr_row(x):
        data = x.data
        mx = max(np.max(data), 0)
        tmp = data - mx
        r = np.exp(tmp, out=tmp).sum()
        k = X.shape[1] - len(data)
        return mx + np.log(r + k * np.exp(-mx))
    

    for a CSR row vector. Extending this algorithm to the full matrix is easily done by a list comprehension, although a more efficient form would loop over the rows using the indptr:

    def logsumexp_csr_rows(X):
        result = np.empty(X.shape[0])
        for i in range(X.shape[0]):
            data = X.data[X.indptr[i]:X.indptr[i+1]]
            # fill in from logsumexp_csr_row
            result[i] = mx + np.log(r + k * np.exp(-mx))
        return result
    

    A column-wise version is much trickier; it's probably easiest to transpose the matrix and convert back to CSR.


    UPDATE: OK, I misunderstood the question: the OP is not interested in handling the zeros at all, so the above derivation is unnecessary and the algorithm should be

    def logsumexp_row_nonzeros(X):
        result = np.empty(X.shape[0])
        for i in range(X.shape[0]):
            result[i] = logsumexp(X.data[X.indptr[i]:X.indptr[i+1]])
        return result
    

    This is just filling in the general scheme of row-wise operations on a CSR matrix. For column-wise, transpose, convert back to CSR and apply the above.
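
    The shifted computation in logsumexp_csr_row follows the standard log-sum-exp identity; a pure-NumPy check against the naive formula, on a small row plus k implicit zeros:

```python
import numpy as np

def logsumexp_dense(a):
    # naive reference: log(sum(exp(a))), numerically unsafe for large a
    return np.log(np.sum(np.exp(a)))

def logsumexp_with_zeros(data, k):
    # the answer's shifted computation for a row `data` plus k implicit zeros
    mx = max(np.max(data), 0)
    r = np.exp(data - mx).sum()
    return mx + np.log(r + k * np.exp(-mx))

data = np.array([1.0, -2.0, 3.0])
k = 4  # four implicit zeros
stable = logsumexp_with_zeros(data, k)
naive = logsumexp_dense(np.concatenate([data, np.zeros(k)]))
```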

    qid & accept id: (24097930, 24098323) query: Skip/pass over view function so the next can execute in Flask soup:

    soup wrap:

    You want to use conditional logic for routes - routing is probably not the best place for it.

    However:

    1. You can call another route on condition, but in some cases request fields can be different for this call and direct route call:

      @app.route('/<slug>')
      def feature(slug):
          if slug_in_database(slug):
              return "feature: " + slug
          return catch(slug)
      
      @app.route('/<path:url>')
      def catch(url):
          return "catch: " + url
      
    2. Solution with right request object:

      @app.route('/<slug>')
      def feature(slug):
          if slug_in_database(slug):
              return "feature: " + slug
          with app.test_request_context(url_for('catch', url=slug)):
              return catch(slug)
      
      @app.route('/<path:url>')
      def catch(url):
          return "catch: " + url
      
    3. But I like just condition:

      @app.route('/<path:url>')
      def feature_or_catch(url):
          slug  = url
          if '/' not in slug and slug_in_database(slug):
              return "feature: " + slug
          return "catch: " + url
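
    Option 3's dispatch condition can be tested without Flask; slug_in_database is stubbed as a hypothetical set lookup:

```python
KNOWN_SLUGS = {'pricing', 'about'}   # stand-in for the database check

def slug_in_database(slug):
    return slug in KNOWN_SLUGS

def feature_or_catch(url):
    # same branch as the Flask view: single segment + known slug -> feature
    slug = url
    if '/' not in slug and slug_in_database(slug):
        return "feature: " + slug
    return "catch: " + url

a = feature_or_catch('pricing')
b = feature_or_catch('some/deep/path')
```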
      
    qid & accept id: (24103624, 24103817) query: NumPy map calculation depending on the indices soup:

    soup wrap:

    You could also look at meshgrid, mgrid, and/or indices:

    >>> H, W = 4,5
    >>> x, y = np.indices([H, W])
    >>> m = (x**2 + y**2) / 2.
    >>> m
    array([[  0. ,   0.5,   2. ,   4.5,   8. ],
           [  0.5,   1. ,   2.5,   5. ,   8.5],
           [  2. ,   2.5,   4. ,   6.5,  10. ],
           [  4.5,   5. ,   6.5,   9. ,  12.5]])
    

    This works because x and y are arrays with the appropriate x and y coordinates:

    >>> x
    array([[0, 0, 0, 0, 0],
           [1, 1, 1, 1, 1],
           [2, 2, 2, 2, 2],
           [3, 3, 3, 3, 3]])
    >>> y
    array([[0, 1, 2, 3, 4],
           [0, 1, 2, 3, 4],
           [0, 1, 2, 3, 4],
           [0, 1, 2, 3, 4]])
    

    meshgrid and mgrid allow for finer control, e.g.

    >>> x, y = np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 10, 3))
    >>> x
    array([[ 0.  ,  0.25,  0.5 ,  0.75,  1.  ],
           [ 0.  ,  0.25,  0.5 ,  0.75,  1.  ],
           [ 0.  ,  0.25,  0.5 ,  0.75,  1.  ]])
    >>> y
    array([[  0.,   0.,   0.,   0.,   0.],
           [  5.,   5.,   5.,   5.,   5.],
           [ 10.,  10.,  10.,  10.,  10.]])
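
    As a runnable version of the np.indices transcript above (the mapping m = (x**2 + y**2) / 2 matches the printed values):

```python
import numpy as np

H, W = 4, 5
x, y = np.indices([H, W])     # x[i, j] == i, y[i, j] == j
m = (x**2 + y**2) / 2.0       # vectorized function of the indices

corner = m[3, 4]              # (3**2 + 4**2) / 2 == 12.5
```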
    
    qid & accept id: (24108842, 24108947) query: Converting and reshaping a list into a DataFrame in Pandas soup:

    soup wrap:

    You should first convert the list into a more appropriate format.

    One option is to convert it to a list of sublists (a sublist for each row) with a list comprehension:

    In [10]: x_sublists = [x[i:i+3] for i in range(0, len(x), 3)]
    
    In [11]: pd.DataFrame(x_sublists [1:], columns=x_sublists [0])
    Out[11]: 
       Phase            Formula                Sat Indx
    0  Calcite          CaCO3            0.840931478691
    1  Aragonite        CaCO3            0.697161631298
    2  H2O(g)           H2O              -1.51011433303
    3  CO2(g)           CO2              -1.55228705787
    4  Gypsum           CaSO4:2H2O        -2.9936491424
    5  Anhydrite        CaSO4            -3.21352846684
    6  Portlandite      Ca(OH)2          -10.7380672515
    7  H2(g)            H2                        -22.6
    8  O2(g)            O2                -37.987869775
    9  CH4(g)           CH4              -66.1697168119
    

    Another option is reshaping the list as a numpy array (but this has the disadvantage of leading to a column with object dtype, as noted by @DSM, so to end up with the same result as above, the column has to be cast to float manually):

    In [67]: x_reshaped = np.array(x[3:], dtype=object).reshape((-1, 3))
    
    In [68]: df = pd.DataFrame(x_reshaped, columns=x[:3])
    
    In [69]: df['Sat Indx'] = df['Sat Indx'].astype(float)
    
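
    To see the chunking comprehension in isolation, here is a minimal sketch with made-up data of the same shape (three header strings followed by rows of three values):

```python
# hypothetical flat list: header row first, then one row per three items
x = ['Phase', 'Formula', 'Sat Indx',
     'Calcite', 'CaCO3', 0.84,
     'Aragonite', 'CaCO3', 0.70]

# same comprehension as above: step through x three items at a time
x_sublists = [x[i:i+3] for i in range(0, len(x), 3)]
print(x_sublists[0])  # ['Phase', 'Formula', 'Sat Indx']
print(x_sublists[1])  # ['Calcite', 'CaCO3', 0.84]
```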
    qid & accept id: (24127569, 24131698) query: Sort individual components of a list in python soup:

    soup wrap:

    If you have a 2d numpy array, you can use the sort method on arrays, and specify axis=0:

    >>> d
    array([[ 0.17 ,  0.045,  0.01 ],
           [ 0.28 ,  0.1  ,  0.19 ],
           [ 0.31 ,  0.19 ,  0.09 ],
           [ 0.36 ,  0.42 ,  0.38 ],
           [ 0.62 ,  0.02 ,  0.03 ],
           [ 0.32 ,  0.12 ,  0.26 ]])
    >>> d2 = d.copy()
    >>> d2.sort(axis=0)
    >>> d2
    array([[ 0.17 ,  0.02 ,  0.01 ],
           [ 0.28 ,  0.045,  0.03 ],
           [ 0.31 ,  0.1  ,  0.09 ],
           [ 0.32 ,  0.12 ,  0.19 ],
           [ 0.36 ,  0.19 ,  0.26 ],
           [ 0.62 ,  0.42 ,  0.38 ]])
    

    And then you can transpose this if you want to work with the rows and not the columns:

    >>> d2.T
    array([[ 0.17 ,  0.28 ,  0.31 ,  0.32 ,  0.36 ,  0.62 ],
           [ 0.02 ,  0.045,  0.1  ,  0.12 ,  0.19 ,  0.42 ],
           [ 0.01 ,  0.03 ,  0.09 ,  0.19 ,  0.26 ,  0.38 ]])
    >>> for row in d2.T:
    ...     print(row)
    ...     
    [ 0.17  0.28  0.31  0.32  0.36  0.62]
    [ 0.02   0.045  0.1    0.12   0.19   0.42 ]
    [ 0.01  0.03  0.09  0.19  0.26  0.38]
    

    Et cetera.
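
    If you'd rather keep the original array untouched, the free function np.sort returns a sorted copy instead of sorting in place (a small sketch with made-up numbers):

```python
import numpy as np

d = np.array([[0.17, 0.045],
              [0.62, 0.02]])
d2 = np.sort(d, axis=0)  # sorted copy; d itself is unchanged
print(d2)  # [[0.17  0.02 ], [0.62  0.045]]
```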

    qid & accept id: (24129445, 24129594) query: Clean way to manage parse-dictionaries that contain function names soup:

    soup wrap:

    Store the dictionary in JSON format in a file, with the function names as ordinary strings. A demo of how to load a JSON file:

    Content of sample file:

    {"somestring":"myfunction"}
    

    Code:

    import json
    d = json.load(open('very_small_dic.txt', 'r'))
    print(d) # {'somestring': 'myfunction'}
    

    How to get the string:function mapping:

    First you load the dictionary from a file as illustrated in the code above. After that, you build a new dictionary where the strings of the function names are replaced by the actual functions. Demo:

    def myfunction(x):
        return 2*x
    
    d = {'somestring': 'myfunction'} # in the real code this came from json.load
    d = {k:globals()[v] for k,v in d.items()}
    print(d) # {'somestring': <function myfunction at 0x...>}
    print(d['somestring'](42)) # 84    
    

    You could also store your functions in a separate file myfunctions.py and use getattr. This is probably a cleaner way than using globals.

    import myfunctions # for this demo, this module only contains the function myfunction
    
    d = {'somestring': 'myfunction'} # in the real code this came from json.load
    d = {k:getattr(myfunctions,v) for k,v in d.items()}
    print(d) # {'somestring': <function myfunction at 0x...>}
    print(d['somestring'](42)) # 84    
    
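
    Putting the two steps together, here is a self-contained sketch of the whole round trip, with the file I/O replaced by a literal JSON string for brevity:

```python
import json

def myfunction(x):
    return 2 * x

# JSON text -> dict of name strings -> dict of actual functions
raw = json.loads('{"somestring": "myfunction"}')
d = {k: globals()[v] for k, v in raw.items()}
print(d['somestring'](21))  # 42
```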
    qid & accept id: (24144766, 24144832) query: What does a class need to implement in order to be used as an argument tuple? soup:

    soup wrap:

    There are two ways to control the behavior of the * operator when it is used like that:

    1. Overload the __iter__ special method:

      >>> class C(object):
      ...     def __init__(self, lst):
      ...         self.lst = lst
      ...     def __iter__(self):
      ...         return iter(self.lst)
      ...
      >>> def f(a, b, c):
      ...     print "Arguments: ", a, b, c
      ...
      >>> c = C([1, 2, 3])
      >>> f(*c)
      Arguments:  1 2 3
      >>>
      
    2. Overload the __getitem__ special method:

      >>> class C(object):
      ...     def __init__(self, lst):
      ...         self.lst = lst
      ...     def __getitem__(self, key):
      ...         return self.lst[key]
      ...
      >>> def f(a, b, c):
      ...     print "Arguments: ", a, b, c
      ...
      >>> c = C([1, 2, 3])
      >>> f(*c)
      Arguments:  1 2 3
      >>>
      
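
    The sessions above use Python 2 print statements; as a sketch, option 1 restated for Python 3 also shows that __iter__ buys you iteration in general, not just * unpacking:

```python
class C:
    def __init__(self, lst):
        self.lst = lst
    def __iter__(self):
        return iter(self.lst)

def f(a, b, c):
    return (a, b, c)

c = C([1, 2, 3])
print(f(*c))     # * consumes the iterator from __iter__ -> (1, 2, 3)
print(list(c))   # __iter__ also enables list(), for loops, etc.
```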
    qid & accept id: (24145957, 24146047) query: Python regular expression: get result without the search string used soup:

    soup wrap:

    You can include a capture group (using parentheses) to select the part you want:

     StartTime = re.findall(r"StartTime (\d.\d.)", text)
                                          # ^ capture this part
    

    but your regex seems odd - it matches a digit ('\d'), then any character ('.' matches any character, not just a full stop), then a digit, then any character again. You may be better off with:

    StartTime = re.findall(r"StartTime (\d\.\d{2})", text)
    

    which is a digit followed by a full stop ('\.' - note backslash to escape) followed by two digits.

    >>> import re
    >>> s = """[AC 2 StartTime 3.29 s   32912KB -> 27720KB   24.54 ms]
    [AC 3 StartTime 3.35 s   39404KB -> 36252KB   11.05 ms]
    [AC 4 StartTime 3.55 s   44592KB -> 39316KB   1.91 ms]"""
    >>> re.findall(r"StartTime (\d\.\d{2})", s)
    ['3.29', '3.35', '3.55']
    
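
    An alternative to the capture group, if you prefer, is a lookbehind, so the pattern itself matches only the number (a sketch on similar sample text):

```python
import re

text = ("[AC 2 StartTime 3.29 s] "
        "[AC 3 StartTime 3.35 s] "
        "[AC 4 StartTime 3.55 s]")

# fixed-width lookbehind: match only where preceded by "StartTime "
print(re.findall(r"(?<=StartTime )\d\.\d{2}", text))
# ['3.29', '3.35', '3.55']
```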
    qid & accept id: (24162868, 24165901) query: Use a class method as an integrand to GSL QAGS soup:

    soup wrap:

    In Python you can use functools.partial to pass the method as a function, pre-defining the self argument:

    from functools import partial
    
    foo = Foo()
    integrand_func = partial(Foo._integrand, foo)
    

    But you cannot define such a partial function in C. For this reason I would externalize the integrand and define it as a parameter of your class instead of an extra method. Note that this is like using a static method, but Cython does not support static methods yet. Another important tip is to use log from math.h. See the prototype below:

    #cython: wraparound=False
    #cython: boundscheck=False
    #cython: cdivision=True
    #cython: nonecheck=False
    cdef extern from "math.h":
        double log(double x) nogil
    
    ctypedef double (*function)(double x, void *params)
    
    cdef struct integrandFoo_p:
        double *a
        double *b
    
    cdef struct gsl_function:
        function function
        void *params
    
    cdef double integrandFoo(double x, void *params):
        cdef integrandFoo_p *p=params
        cdef double a, b
        a = p.a[0]
        b = p.b[0]
        return a*log(x)+b/(x*x*x)
    
    cdef void trapzd(gsl_function *F, double lower_limit, double upper_limit,
                     int num, double *result, double *error):
        cdef int i
        cdef double x
        f = F[0].function
        p = F[0].params
        result[0] = 0.
        for i in range(num+1):
            x = lower_limit + (upper_limit - lower_limit)*i/num
            if i==0 or i==num:
                result[0] += f(x, p)*0.5
            else:
                result[0] += f(x, p)
    
        error[0] = 0.
    
    cdef class Foo(object):
        cdef double a
        cdef double b
        cdef double result
        cdef double error
        cdef function f
        def __init__(self):
            self.a = 1.2
            self.b = 0.6
            self.f = integrandFoo
        cdef void integrate(self, double lower_limit=1, double upper_limit=10):
            #cdef gsl_integration_workspace * w
            cdef double result, error, expected, alpha
            cdef integrandFoo_p p
            cdef gsl_function F
    
            #w = gsl_integration_workspace_alloc(1000)
            p.a = &self.a
            p.b = &self.b
            expected = -4.0
    
            F.function = self.f
            F.params = &p
    
            trapzd(&F, lower_limit, upper_limit, 1000, &self.result, &self.error)
            #gsl_integration_qags(&F, lower_limit, upper_limit, 0, 1e-7,
                                 #1000, w, &result, &error)
    def main():
        foo = Foo()
        print 'HERE', foo.result, foo.error
        foo.integrate()
        print 'HERE', foo.result, foo.error
    
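
    In isolation, the functools.partial trick looks like this (Foo here is a stand-in class, not the asker's):

```python
from functools import partial

class Foo(object):
    def __init__(self):
        self.a = 2.0
    def _integrand(self, x):
        return self.a * x

foo = Foo()
integrand_func = partial(Foo._integrand, foo)  # self pre-bound to foo
print(integrand_func(3.0))  # 6.0, same as foo._integrand(3.0)
```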
    qid & accept id: (24172896, 24173883) query: Pandas, to_csv () to a specific format soup:

    soup wrap:

    First off, just to demonstrate that reading this in is fine:

    In [11]: df = pd.read_clipboard(sep=',', index_col=0)
    
    In [12]: df
    Out[12]:
       pgtime  pgstat  age  eet     g2  grade  gleason      ploidy
    1     6.1       0   64    2  10.26      2        4     diploid
    2     9.4       0   62    1    NaN      3        8   aneuploid
    3     5.2       1   59    2   9.99      3        7     diploid
    4     3.2       1   62    2   3.57      2        4     diploid
    5     1.9       1   64    2  22.56      4        8  tetraploid
    6     4.8       0   69    1   6.14      3        7     diploid
    7     5.8       0   75    2  13.69      2      NaN  tetraploid
    8     7.3       0   71    2    NaN      3        7   aneuploid
    9     3.7       1   73    2  11.77      3        6     diploid
    

    You have to use quoting=csv.QUOTE_NONNUMERIC* when outputting the csv:

    In [21]: s = StringIO()
    
    In [22]: df.to_csv(s, quoting=2)  # or output to file instead
    
    In [23]: s.getvalue()
    Out[23]: '"","pgtime","pgstat","age","eet","g2","grade","gleason","ploidy"\n1,6.1,0,64,2,10.26,2,4.0,"diploid"\n2,9.4,0,62,1,"",3,8.0,"aneuploid"\n3,5.2,1,59,2,9.99,3,7.0,"diploid"\n4,3.2,1,62,2,3.57,2,4.0,"diploid"\n5,1.9,1,64,2,22.56,4,8.0,"tetraploid"\n6,4.8,0,69,1,6.14,3,7.0,"diploid"\n7,5.8,0,75,2,13.69,2,"","tetraploid"\n8,7.3,0,71,2,"",3,7.0,"aneuploid"\n9,3.7,1,73,2,11.77,3,6.0,"diploid"\n'
    

    * csv.QUOTE_NONNUMERIC is 2.

    Now, this isn't quite what you want, since the index column is not quoted; I would just modify the index:

    In [24]: df.index = df.index.astype(str)  # unicode in python 3?
    
    In [25]: s = StringIO()
    
    In [26]: df.to_csv(s, quoting=2)
    
    In [27]: s.getvalue()
    Out[27]: '"","pgtime","pgstat","age","eet","g2","grade","gleason","ploidy"\n"1",6.1,0,64,2,10.26,2,4.0,"diploid"\n"2",9.4,0,62,1,"",3,8.0,"aneuploid"\n"3",5.2,1,59,2,9.99,3,7.0,"diploid"\n"4",3.2,1,62,2,3.57,2,4.0,"diploid"\n"5",1.9,1,64,2,22.56,4,8.0,"tetraploid"\n"6",4.8,0,69,1,6.14,3,7.0,"diploid"\n"7",5.8,0,75,2,13.69,2,"","tetraploid"\n"8",7.3,0,71,2,"",3,7.0,"aneuploid"\n"9",3.7,1,73,2,11.77,3,6.0,"diploid"\n'
    

    As required.
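
    Since 2 is just the value of the constant, it reads better to pass it by name (a sketch with a toy frame, not the asker's data):

```python
import csv
import io

import pandas as pd

df = pd.DataFrame({'age': [64, 62], 'ploidy': ['diploid', 'aneuploid']})
buf = io.StringIO()
# named constant instead of the magic number 2
df.to_csv(buf, quoting=csv.QUOTE_NONNUMERIC, index=False)
print(buf.getvalue())
```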

    qid & accept id: (24191793, 24192441) query: Store input based on computer and change stored input on command soup:

    soup wrap:

    You can do this by first asking for the user's name and then checking whether the file names.txt exists. If it doesn't, create it and append the user's name. If it does exist, check whether it already contains the user's name: if it does, say 'Hi ' + name; otherwise append the name to the file.

    Here is a quick and dirty fix for your code (can be improved further!):

    import os
    #hard code the path to the external file
    external_file = 'names.txt'
    #Ask the user's name
    name = raw_input("What's your name?")
    #if file exists, use it to load name, else create a new file
    if not os.path.exists(external_file):
        with open(external_file, "a") as f: # using "a" will append to the file
            f.write(name)
            f.write("\n")
            f.close()
    else:
        #if file exists, use it to load name, else ask user
        with open(external_file, "r+") as f:# r+ open a file for reading & writing
            lines = f.read().split('\n') # split the names 
            #print lines
            if name in lines:
                print "Hi {}".format(name)
            else:
                f.seek(0,2) # Resolves an issue in Windows
                f.write(name)
                f.write("\n")
                f.close()
    

    Update: Modified version to check for a hardcoded name only:

    import os
    #hard code the path to the external file
    external_file = 'names.txt'
    username = 'testuser'# Our hardcoded name
    
    #if file doesn't exist, create a new file
    if not os.path.exists(external_file):
        #Ask the user's name
        name = raw_input("What's your name?")
        with open(external_file, "a") as f: # using "a" will append to the file
            f.write(name)# Write the name to names.txt
            f.write("\n")
            f.close()
    else:
        #if file exists, use it to load name, else ask user
        with open(external_file, "r+") as f:# r+ open a file for reading & writing
            lines = f.read().split('\n') # split the names 
            print lines
            if username in lines: #Check if the file has any username as 'testuser'
                print "Hi {}".format(username)
            else: # If there is no username as 'testuser' then ask for a name
                name = raw_input("What's your name?")
                f.seek(0,2) # Resolves an issue in Windows
                f.write(name)# Write the name to names.txt
                f.write("\n")
                f.close()
    

    Reason for using file.seek() is here.
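
    The same idea in a tighter, Python 3-friendly sketch (note that with closes the file for you, so the explicit f.close() calls above are redundant):

```python
import os
import tempfile

def greet_or_register(path, name):
    """Say hi if name is already stored in the file, otherwise append it."""
    names = set()
    if os.path.exists(path):
        with open(path) as f:  # 'with' closes the file automatically
            names = set(f.read().splitlines())
    if name in names:
        return "Hi {}".format(name)
    with open(path, "a") as f:
        f.write(name + "\n")
    return "Stored {}".format(name)

path = os.path.join(tempfile.mkdtemp(), "names.txt")
print(greet_or_register(path, "alice"))  # Stored alice
print(greet_or_register(path, "alice"))  # Hi alice
```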

    qid & accept id: (24195234, 24195282) query: Dynamically pass parameters to function soup:

    soup wrap:

    Write a decorator, and use a splat operator to handle arbitrary arguments.

    Example:

    import time

    def pause_wrapper(x, n):
        def decorator(f):
            config = [x, time.time() + n]  # [calls left, resume time]
            def wrapped(*args, **kwargs):
                config[0] -= 1
                if config[0] == 0:
                    time.sleep(max(0, config[1] - time.time()))
                    # mutate in place; rebinding config here would make it
                    # a local variable and break the closure
                    config[:] = [x, time.time() + n]

                return f(*args, **kwargs)
            return wrapped
        return decorator
    

    and usage:

    @pause_wrapper(x, n)
    def function(a, b, c):
        ...
    

    The *args and **kwargs are informally called "splat" arguments. A function that takes *args, **kwargs receives all positional parameters in the tuple args and all keyword arguments in the dictionary kwargs. (You can have other arguments besides the splats, in which case the splats soak up all arguments not sent to named arguments).

    Passing *args and **kwargs has the opposite effect, passing the contents of args as extra positional parameters, and kwargs as keyword parameters.

    Using both allows you to handle any set of arguments, in or out, letting you do transparent wrapping (like this example).
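
    A quick runnable illustration of that round trip:

```python
def show(*args, **kwargs):
    # args collects positional arguments, kwargs collects keyword arguments
    return args, kwargs

args, kwargs = show(1, 2, key="value")
print(args)    # (1, 2)
print(kwargs)  # {'key': 'value'}

def target(a, b, key=None):
    return (a, b, key)

# passing them back re-expands into the same call
print(target(*args, **kwargs))  # (1, 2, 'value')
```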

    qid & accept id: (24195898, 24195969) query: generate lists from 3 dimensional array soup:

    soup wrap:

    Since you are already using numpy:

    np.swapaxes(np.swapaxes(myArray,0,2),0,1)
    

    outputs:

    array([[[ 1,  0,  2],
            [ 2,  1,  3],
            [ 3,  2,  4]],
    
           [[ 0,  1,  5],
            [ 0,  1,  6],
            [ 0,  1,  7]],
    
           [[ 2,  3,  8],
            [ 2,  3,  9],
            [ 2,  3, 10]]])
    
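
    The two nested swapaxes calls are equivalent to a single np.transpose with an explicit axis permutation (a sketch, checked on a small dummy array rather than the asker's):

```python
import numpy as np

myArray = np.arange(24).reshape(2, 3, 4)
a = np.swapaxes(np.swapaxes(myArray, 0, 2), 0, 1)
b = np.transpose(myArray, (1, 2, 0))  # one call, explicit axis order
print(a.shape)  # (3, 4, 2)
```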
    qid & accept id: (24216198, 24216256) query: Storing a string and a set in a dictionary soup:

    soup wrap:

    Try the following code:

    def unique_words(input_file):
        file = open(input_file)
        wordlist = {}
        dups = []
        copy = []
        for index, value in enumerate(file):
            words = value.split()
            for word in words:
                wordlist[word] = index
                dups.append(word)
        for word in dups:
            if dups.count(word) != 1 and word not in copy:
                del(wordlist[word])
                copy.append(word)
        for item in wordlist:
            print 'The unique word '+item+' occurs on line '+str(wordlist[item])
    

    It adds all the values to a dict and to a list, and then runs through the list to make sure each value occurs only once. If a word occurs more than once, we delete it from the dict, leaving us with only the unique data.

    This runs as:

    >>> unique_words('test.txt')
    The unique word them occurs on line 2
    The unique word I occurs on line 1
    The unique word there occurs on line 0
    The unique word some occurs on line 2
    The unique word times occurs on line 3
    The unique word say occurs on line 2
    The unique word too occurs on line 3
    The unique word have occurs on line 1
    The unique word of occurs on line 2
    >>> 
    
    qid & accept id: (24243929, 24581675) query: Display menu bar items of IE using Python soup:

    soup wrap:

    You just need to know the target window title. Then you can simulate a mouse button click or use the Alt / Menu + F shortcut to open the menu.

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    
    from ctypes import windll
    
    # Resolve the API functions at module level so that show_menu()
    # (defined in the snippets below) can also call send_message.
    find_window = windll.user32.FindWindowW
    send_message = windll.user32.SendMessageW
    
    def main():
        window_name = u""  # target window title goes here
        window_handle = find_window(0, window_name)
    
        if window_handle > 0:
            show_menu(window_handle)
        else:
            print("Window handle not found")
    
    
    if __name__ == "__main__":
        main()
    

    If you want the first solution (mouse clicks), then add:

    WM_LBUTTONDOWN = 0x0201
    WM_LBUTTONUP = 0x0202
    
    def point_to_long(x, y):
        # Pack client coordinates into the lParam format (y in the high word).
        return (y * 0x10000) + x
    
    def show_menu(handle):
        target_pos = point_to_long(30, 40)
        send_message(handle, WM_LBUTTONDOWN, 0, target_pos)
        send_message(handle, WM_LBUTTONUP, 0, target_pos)
    

    Otherwise (key pressing), use:

    WM_KEYDOWN = 0x0100
    WM_KEYUP = 0x0101
    
    VK_MENU = 0x12
    VK_F = 0x46
    
    def show_menu(handle):
        for key in (VK_MENU, VK_F):
            send_message(handle, WM_KEYDOWN, key, 0)
            send_message(handle, WM_KEYUP, key, 0)
    

    Hope this helps.

    qid & accept id: (24269756, 24269930) query: Turning binary string into an image with PIL soup:

    soup wrap:

    You could use img.putdata:

    import Image
    
    value = "0110100001100101011011000110110001101111"
    
    cmap = {'0': (255,255,255),
            '1': (0,0,0)}
    
    data = [cmap[letter] for letter in value]
    img = Image.new('RGB', (8, len(value)//8), "white")
    img.putdata(data)
    img.show()        
    

    [image: the bit string rendered as a black-and-white bitmap, 8 pixels wide]


    If you have NumPy, you could instead use Image.fromarray:

    import Image
    import numpy as np
    
    value = "0110100001100101011011000110110001101111"
    
    carr = np.array([(255,255,255), (0,0,0)], dtype='uint8')
    data = carr[np.array(map(int, list(value)))].reshape(-1, 8, 3)
    img = Image.fromarray(data, 'RGB')
    img.save('/tmp/out.png', 'PNG')
    

    but this timeit test suggests using putdata is faster:

    value = "0110100001100101011011000110110001101111"*10**5
    
    def using_fromarray():
        carr = np.array([(255,255,255), (0,0,0)], dtype='uint8')
        data = carr[np.array(map(int, list(value)))].reshape(-1, 8, 3)
        img = Image.fromarray(data, 'RGB')
        return img
    
    def using_putdata():
        cmap = {'0': (255,255,255),
                '1': (0,0,0)}
    
        data = [cmap[letter] for letter in value]
        img = Image.new('RGB', (8, len(value)//8), "white")
        img.putdata(data)
        return img
    

    In [79]: %timeit using_fromarray()
    1 loops, best of 3: 1.67 s per loop
    
    In [80]: %timeit using_putdata()
    1 loops, best of 3: 632 ms per loop
    
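    The mapping step itself is plain Python and can be checked without PIL installed (same cmap as above, shorter bit string):

```python
# Build the list of RGB tuples that img.putdata() receives:
# one pixel per character of the bit string.
value = "01101000"
cmap = {'0': (255, 255, 255),
        '1': (0, 0, 0)}
data = [cmap[letter] for letter in value]
print(data[:2])  # → [(255, 255, 255), (0, 0, 0)]
```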
    qid & accept id: (24285311, 24285831) query: how to access nargs of optparse-add_action? soup:

    soup wrap:

    AFAIK optparse doesn't provide that value in the public API via the result of parse_args, but you don't need it. You can simply name the constant before using it:

    NUM_CATEGORIES = 4
    
    # ...
    
    parser.add_option('-c', '--categories', dest='categories', nargs=NUM_CATEGORIES)
    
    # later
    
    if not options.categories:
        options.categories = [raw_input('Enter input: ') for _ in range(NUM_CATEGORIES)]
    

    In fact the add_option method returns the Option object which does have the nargs field, so you could do:

    categories_opt = parser.add_option(..., nargs=4)
    
    # ...
    
    if not options.categories:
        options.categories = [raw_input('Enter input: ') for _ in range(categories_opt.nargs)]
    

    However, I really don't see how this is better than using a constant in the first place.
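    Still, for completeness, the nargs attribute really is readable on the returned Option object:

```python
from optparse import OptionParser

parser = OptionParser()
# add_option returns the Option instance it just created.
categories_opt = parser.add_option('-c', '--categories',
                                   dest='categories', nargs=4)
print(categories_opt.nargs)  # → 4
```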

    qid & accept id: (24314839, 24315659) query: How to string format OptionParser() help message? soup:

    soup wrap:

    Might I suggest argparse?

    I'm not sure whether this is supported in OptionParser, but I would suggest using a triple-quoted string, i.e.:

    parser = OptionParser()
    parser.add_option('--s',
                      dest='s',
                      type='string',
                      help='''
    With triple quotes I can directly put in anything including line spaces.
    \n will appear as a string rather than a newline.''')
    

    argparse example:

    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument('--s',
                      help='''first line
    second line''')
    args = parser.parse_args()
    print args.s
    
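    If the goal is to keep those line breaks in the rendered help text, argparse's RawTextHelpFormatter preserves them verbatim; a small sketch:

```python
import argparse

# RawTextHelpFormatter keeps newlines in help strings
# instead of re-wrapping the text.
parser = argparse.ArgumentParser(
    formatter_class=argparse.RawTextHelpFormatter)
parser.add_argument('--s', help='''first line
second line''')
help_text = parser.format_help()
print('second line' in help_text)  # → True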
    qid & accept id: (24319973, 24321434) query: How to handle multiple user type in Django soup:

    soup wrap:

    The better approach for achieving your requirement is to use the built-in Group and Permission models in Django. But since permissions can be a little tricky, an alternative approach is to create a UserProfile model like the one below:

    from django.contrib.auth.models import User
    class UserProfile(models.Model):
        user = models.ForeignKey(User)
        type = models.CharField(max_length=15)
    

    Then use decorators for controlling access to the views like this:

    from django.contrib.auth.decorators import user_passes_test
    @user_passes_test(lambda u: u.get_profile().type == 'client')
    def view_for_client(request):
        ...
    
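    Outside Django, the gating pattern behind user_passes_test can be sketched as a plain decorator; everything below is an illustrative stand-in, not Django's API:

```python
def passes_test(test_func):
    # Stand-in for django.contrib.auth.decorators.user_passes_test:
    # run the view only when test_func approves the request's user.
    def decorator(view):
        def wrapped(request):
            if test_func(request['user']):
                return view(request)
            return 'redirect: login'
        return wrapped
    return decorator

@passes_test(lambda u: u['type'] == 'client')
def view_for_client(request):
    return 'client page'

print(view_for_client({'user': {'type': 'client'}}))  # → client page
```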

    The UserProfile model will also be useful to save all of the preferences of your user. Also you would need to set the following setting:

    AUTH_PROFILE_MODULE = 'accounts.UserProfile'
    
    qid & accept id: (24325232, 24551140) query: Two corresponding y-axis soup:

    soup wrap:

    So I finally managed to do this by creating a new scale in matplotlib. It can be improved, but here is my class definition, based on http://matplotlib.org/examples/api/custom_scale_example.html:

    import numpy as np
    import matplotlib.pyplot as plt
    
    from matplotlib import scale as mscale
    from matplotlib import transforms as mtransforms
    
    class MagScale(mscale.ScaleBase):
        name = 'mag'
    
        def __init__(self, axis, **kwargs):
            mscale.ScaleBase.__init__(self)
            self.thresh = None #thresh
    
        def get_transform(self):
            return self.MagTransform(self.thresh)
    
        def set_default_locators_and_formatters(self, axis):
            pass
    
        class MagTransform(mtransforms.Transform):
            input_dims = 1
            output_dims = 1
            is_separable = True
    
            def __init__(self, thresh):
                mtransforms.Transform.__init__(self)
                self.thresh = thresh
    
            def transform_non_affine(self, mag):
                return 10**((np.array(mag) -1)/(-2.5))
    
            def inverted(self):
                return MagScale.InvertedMagTransform(self.thresh)
    
        class InvertedMagTransform(mtransforms.Transform):
            input_dims = 1
            output_dims = 1
            is_separable = True
    
            def __init__(self, thresh):
                mtransforms.Transform.__init__(self)
                self.thresh = thresh
    
            def transform_non_affine(self, flux):
                return -2.5 * np.log10(np.array(flux)) + 1.
    
            def inverted(self):
                return MagScale.MagTransform(self.thresh)
    
    
    
    def flux_to_mag(flux):
        return  -2.5 * np.log10(flux) + 1
    
    
    mscale.register_scale(MagScale)
    
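    The forward and inverted transforms above are exact inverses, which is easy to verify numerically:

```python
import numpy as np

def flux_to_mag(flux):
    # Same formula as InvertedMagTransform.transform_non_affine above.
    return -2.5 * np.log10(np.array(flux)) + 1.

def mag_to_flux(mag):
    # Same formula as MagTransform.transform_non_affine above.
    return 10 ** ((np.array(mag) - 1) / (-2.5))

flux = np.array([1.0, 10.0, 40.0])
print(np.allclose(mag_to_flux(flux_to_mag(flux)), flux))  # → True
```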

    And here is a working example:

    x    = np.arange(20.)
    flux = x * 2 + 1
    mag  = flux_to_mag(flux)
    
    MagTransform = MagScale.InvertedMagTransform(0)
    
    
    fig = plt.figure()
    ax_flux = fig.add_subplot(111)
    
    ax_flux.plot(x, flux,'-')
    ax_flux.set_ylim([1,40])
    ax_flux.set_ylabel('flux')
    
    ax_mag  = ax_flux.twinx()
    ax_mag.set_ylim(MagTransform.transform_non_affine(ax_flux.get_ylim()))  # There may be an easier way to do this.
    ax_mag.set_yscale('mag')
    
    ax_mag.plot(x,mag,'+')
    plt.show()
    
    qid & accept id: (24327683, 24328048) query: Pandas Dataframe row by row fill new column soup:

    soup wrap:

    You can directly use column indexing (http://pandas.pydata.org/pandas-docs/stable/indexing.html) to compare and filter ratios.

    buy_ratio = (abs(df["Buy"])  > abs(df["Sell"])) * df["Price"] / df["Buy"]
    sell_ratio = (abs(df["Buy"])  <= abs(df["Sell"])) * df["Price"] / df["Sell"]
    df["Ratio"] = buy_ratio + sell_ratio
    

    In this case,

    1. The condition (abs(df["Buy"]) > abs(df["Sell"])) gives a 0/1 valued column depending on whether buy or sell is greater. You multiply that column by Price/Buy. If Sell price is high, the multiplication will be zero.
    2. Perform a symmetric operation for Sell
    3. Finally, add them together and directly set the column named "Ratio" using indexing.
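    On a tiny made-up frame the mask arithmetic works out like this (the column values are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({'Buy':   [2.0, -1.0],
                   'Sell':  [1.0, -3.0],
                   'Price': [10.0, 9.0]})

# Boolean mask * Price/Buy: rows where |Sell| >= |Buy| contribute 0.
buy_ratio = (abs(df['Buy']) > abs(df['Sell'])) * df['Price'] / df['Buy']
sell_ratio = (abs(df['Buy']) <= abs(df['Sell'])) * df['Price'] / df['Sell']
df['Ratio'] = buy_ratio + sell_ratio
print(df['Ratio'].tolist())  # → [5.0, -3.0]
```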

    Edit

    Here is the solution using apply - First define a function operating in rows of the DataFrame.

    def f(row):
      if abs(row["Buy"]) > abs(row["Sell"]):
        return row["Price"] / row["Buy"]
      else:
        return row["Price"] / row["Sell"]
    

    Finally, set the Ratio column appropriately using apply.

    df["Ratio"] = df.apply(f, axis=1)

    qid & accept id: (24406677, 24407171) query: Python array from CSV file soup:

    soup wrap:

    Consider using another format, e.g. YAML.

    data.yaml

    MyHome:
    - "10.0.0.3"
    - "10.0.0.9"
    - "10.0.0.234"
    

    Which can be read as follows:

    >>> import yaml
    >>> fname = "data.yaml"
    >>> with open(fname) as f:
    ...     cfg = yaml.load(f)
    ...
    >>> cfg
    {'MyHome': ['10.0.0.3', '10.0.0.9', '10.0.0.234']}
    

    Before you run it, you have to install PyYaml

    $ pip install pyyaml
    

    Other formats

    Very similar approach works for JSON.
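    For JSON the standard library is enough; the same data reads as:

```python
import json

# Equivalent of the YAML example above, in JSON syntax.
text = '{"MyHome": ["10.0.0.3", "10.0.0.9", "10.0.0.234"]}'
cfg = json.loads(text)
print(cfg["MyHome"][0])  # → 10.0.0.3
```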

    Using the INI file format (see ConfigParser) is a bit more work, but still rather simple.

    With CSV you can read the data, but CSV fits tabular structures well, and you seem to have rather tree-like data.

    qid & accept id: (24418864, 24419127) query: Pygame How to use walking animations soup:

    soup wrap:

    You can create a circular buffer with the names.

    images = ['data/down1.png','data/down2.png','data/down3.png']
    

    and then change a counter

    ...
    
    if event.type == pygame.KEYDOWN and event.key == pygame.K_DOWN:
        player = pygame.image.load(images[counter])
        counter = (counter + 1) % len(images)
        playerY = playerY + 5
    ...
    

    It will change the image while the button is pressed.
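    The wrap-around behaviour of the counter can be checked without pygame:

```python
images = ['data/down1.png', 'data/down2.png', 'data/down3.png']
counter = 0
frames = []
for _ in range(5):  # five simulated key presses
    frames.append(images[counter])
    counter = (counter + 1) % len(images)  # wraps back to the first frame
print(frames[3])  # → data/down1.png
```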

    qid & accept id: (24427828, 24429798) query: Calculate point based on distance and direction soup:

    soup wrap:

    Edit 2

    Okay, there is an out-of-the-box solution with geopy; it is just not well documented:

    import geopy
    import geopy.distance
    
    # Define starting point.
    start = geopy.Point(48.853, 2.349)
    
    # Define a general distance object, initialized with a distance of 1 km.
    d = geopy.distance.VincentyDistance(kilometers = 1)
    
    # Use the `destination` method with a bearing of 0 degrees (which is north)
    # in order to go from point `start` 1 km to north.
    print d.destination(point=start, bearing=0)
    

    The output is 48° 52m 0.0s N, 2° 21m 0.0s E (or Point(48.861992239749355, 2.349, 0.0)).

    A bearing of 90 degrees corresponds to East, 180 degrees is South, and so on.

    Older answers:

    A simple solution would be:

    def get_new_point():
        # After going 1 km North, 1 km East, 1 km South and 1 km West
        # we are back where we were before.
        return (-24680.1613, 6708860.65389)
    

    However, I am not sure if that serves your purposes in all generality.

    Okay, seriously, you can get started using geopy. First of all, you need to define your starting point in a coordinate system known to geopy. At first glance, it seems that you cannot just "add" a certain distance in a certain direction. The reason, I think, is that calculating the distance is a problem without a simple inverse solution. Or how would we invert the measure function defined in https://code.google.com/p/geopy/source/browse/trunk/geopy/distance.py#217?

    Hence, you might want to take an iterative approach.

    As stated here: https://stackoverflow.com/a/9078861/145400 you can calculate the distance between two given points like that:

    pt1 = geopy.Point(48.853, 2.349)
    pt2 = geopy.Point(52.516, 13.378)
    # distance.distance() is the  VincentyDistance by default.
    dist = geopy.distance.distance(pt1, pt2).km
    

    For going north by one kilometer you would iteratively change the latitude into a positive direction and check against the distance. You can automate this approach using a simple iterative solver from e.g. SciPy: just find the root of geopy.distance.distance().km - 1 via one of the optimizers listed in http://docs.scipy.org/doc/scipy/reference/optimize.html#root-finding.
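    The root-finding idea can be sketched without geopy or SciPy by using the rough rule of thumb that one degree of latitude spans about 111.32 km; the bisect helper and the constant below are ours, purely for illustration:

```python
def bisect(f, lo, hi, tol=1e-12):
    # Minimal bisection root finder, standing in for scipy.optimize.bisect.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

KM_PER_DEG_LAT = 111.32  # rough spherical-earth approximation

def target(offset_deg):
    # Distance walked north minus the desired 1 km.
    return offset_deg * KM_PER_DEG_LAT - 1.0

offset = bisect(target, 0.0, 2.0)
print(abs(offset * KM_PER_DEG_LAT - 1.0) < 1e-6)  # → True
```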

    I think it is clear that you go south by changing the latitude into a negative direction, and west and east by changing the longitude.

    I have no experience with such geo calculations; this iterative approach only makes sense if there is no simple direct way of "going north" by a certain distance.

    Edit: an example implementation of my proposal:

    import geopy
    import geopy.distance
    import scipy.optimize
    
    
    def north(startpoint, distance_km):
        """Return target function whose argument is a positive latitude
        change (in degrees) relative to `startpoint`, and that has a root
        for a latitude offset that corresponds to a point that is 
        `distance_km` kilometers away from the start point.
        """
        def target(latitude_positive_offset):
            return geopy.distance.distance(
                startpoint, geopy.Point(
                    latitude=startpoint.latitude + latitude_positive_offset,
                    longitude=startpoint.longitude)
                ).km - distance_km
        return target
    
    
    start = geopy.Point(48.853, 2.349)
    print "Start: %s" % start
    
    # Find the root of the target function, varying the positive latitude offset between
    # 0 and 2 degrees (which is surely enough for finding a 1 km distance, but must
    # be adjusted for larger distances).
    latitude_positive_offset = scipy.optimize.bisect(north(start, 1),  0, 2)
    
    
    # Build Point object for identified point in space.
    end = geopy.Point(
        latitude=start.latitude + latitude_positive_offset,
        longitude=start.longitude
        )
    
    print "1 km north: %s" % end
    
    # Make the control.
    print "Control distance between both points: %.4f km." % (
         geopy.distance.distance(start, end).km)
    

    Output:

    $ python test.py 
    Start: 48° 51m 0.0s N, 2° 21m 0.0s E
    1 km north: 48° 52m 0.0s N, 2° 21m 0.0s E
    Control distance between both points: 1.0000 km.
    
    qid & accept id: (24438325, 24438855) query: parsing single text items from xml with Python soup:
    soup wrap:
    xmlstr = """
    <PropertySet>
      <PropertySetProperty>
        <Key>KeyNotOfInterest</Key>
        <Value>ValueNotOfInterest</Value>
      </PropertySetProperty>
      <PropertySetProperty>
        <Key>ConnectionFile</Key>
        <Value>THE TEXT I WANT, IN THIS CASE A FILE PATH</Value>
      </PropertySetProperty>
      <PropertySetProperty>
        <Key>KeyAlsoNotOfInterest</Key>
        <Value>ValueAlsoNotOfInterest</Value>
      </PropertySetProperty>
    </PropertySet>
    """  # element tags restored from the XPath below; the root element name is illustrative
    from lxml import etree
    
    doc = etree.fromstring(xmlstr.strip())
    #doc = etree.parse("xmlfilename.xml")
    
    xp = "//PropertySetProperty[Key/text()='ConnectionFile']/Value/text()"
    wanted = doc.xpath(xp)[0]
    print wanted
    

    or possibly using xpath with a parameter:

    xp = "//PropertySetProperty[Key/text()=$key]/Value/text()"
    wanted = doc.xpath(xp, key="ConnectionFile")[0]
    

    The XPath translates to:
    "find the element PropertySetProperty anywhere in the document that has a subelement Key whose text value is 'ConnectionFile', and get the text value of its Value subelement."
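    The same lookup also works with the standard library's xml.etree.ElementTree if lxml is unavailable; in this sketch only the element names from the XPath are taken from the answer, and the root element name is illustrative:

```python
import xml.etree.ElementTree as ET

xmlstr = """<PropertySet>
  <PropertySetProperty>
    <Key>ConnectionFile</Key>
    <Value>THE TEXT I WANT, IN THIS CASE A FILE PATH</Value>
  </PropertySetProperty>
</PropertySet>"""

doc = ET.fromstring(xmlstr)
# ElementTree supports a limited XPath subset, including [tag='text'] predicates.
wanted = doc.find(".//PropertySetProperty[Key='ConnectionFile']/Value").text
print(wanted)
```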

    This assumes you have lxml installed:

    $ pip install lxml
    

    On Windows, it is better to use:

    $ easy_install lxml
    

    as it installs from a downloaded exe installer and will not try to compile from source.

    qid & accept id: (24445991, 24447253) query: Getting Variable from Applescript and using in Python soup:

    soup wrap:

    You can also run a python script with command line input from AppleScript:

    --make sure to escape properly if needed
    set pythonvar to "whatever"
    set outputvar to (do shell script "python '/path/to/script' '" & pythonvar & "'")
    

    Ned's example has Python calling AppleScript and then returning control to Python; this is the other way around. Then, in Python, access the list of parameters:

    import sys
    var_from_as = sys.argv[1] # first parameter, since argv[0] is the script name
    print 'this gets returned to AppleScript' # this gets set to outputvar
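
    Going the other direction, the string concatenation in the do shell script line above is fragile to quoting. From Python you can invoke osascript with an argument list instead, which sidesteps shell quoting entirely; extra arguments after the script are delivered to the AppleScript's on run argv handler. A sketch (osascript_command and run_applescript are names invented here):

    ```python
    import subprocess

    def osascript_command(script, *args):
        # Build the argv list directly; no manual shell quoting of the
        # Python-side values is needed.
        return ['osascript', '-e', script] + [str(a) for a in args]

    def run_applescript(script, *args):
        # Requires macOS; returns the script's result as a stripped string.
        return subprocess.check_output(osascript_command(script, *args)).decode().strip()
    ```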
    
    qid & accept id: (24458430, 24538374) query: make python wait for stored procedure to finish executing soup:

    Here's my workaround:

    \n

    In the database, I make a table called RunningStatus with just one field, status, which is a bit, and just one row, initially set to 0.

    \n

    At the beginning of my stored procedure, I execute the line

    \n
    update RunningStatus set status = 1;\n
    \n

    And at the end of the stored procedure,

    \n
    update RunningStatus set status = 0;\n
    \n

    In my Python script, I open a new connection and cursor to the same database. After my execute line, I simply add

    \n
    while 1:\n    q = status_check_cursor.execute('select status from RunningStatus').fetchone()\n    if q[0] == 0:\n        break\n
    \n

    You need to make a new connection and cursor, because any calls from the old connection will interrupt the stored procedure and potentially cause status to never go back to 0.

    \n

    It's a little janky but it's working great for me!

    \n soup wrap:

    Here's my workaround:

    In the database, I make a table called RunningStatus with just one field, status, which is a bit, and just one row, initially set to 0.

    At the beginning of my stored procedure, I execute the line

    update RunningStatus set status = 1;
    

    And at the end of the stored procedure,

    update RunningStatus set status = 0;
    

    In my Python script, I open a new connection and cursor to the same database. After my execute line, I simply add

    while 1:
        q = status_check_cursor.execute('select status from RunningStatus').fetchone()
        if q[0] == 0:
            break
    

    You need to make a new connection and cursor, because any calls from the old connection will interrupt the stored procedure and potentially cause status to never go back to 0.

    It's a little janky but it's working great for me!
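
    The bare while 1 loop above polls the database as fast as it can. A generic sketch of the same idea with a sleep between checks and a timeout (wait_for_status and its parameters are invented names, not part of any database API):

    ```python
    import time

    def wait_for_status(check, interval=1.0, timeout=60.0):
        """Poll check() until it returns 0, sleeping between attempts.

        Returns True once the status reaches 0, False on timeout.
        """
        deadline = time.time() + timeout
        while time.time() < deadline:
            if check() == 0:
                return True
            time.sleep(interval)
        return False
    ```

    Here check would be a small function running the select status from RunningStatus query on the separate connection.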

    qid & accept id: (24465953, 24466053) query: python read files and stop when a condition satisfies soup:

    You're looking for the break statement.

    \n
    ...\n    if "color=brown" in part:\n        print part\n        # set some variable to check at the last thing before your other for loops\n        # turnover.\n        br = True\n        break\n
    \n

    and then use that to break out of every other two for loops you've initiated.

    \n
        if br == True:\n        break\n    else:\n        pass\nif br == True:\n    break\nelse:\n    pass\n
    \n soup wrap:

    You're looking for the break statement.

    ...
        if "color=brown" in part:
            print part
            # set some variable to check at the last thing before your other for loops
            # turnover.
            br = True
            break
    

    and then use that to break out of every other two for loops you've initiated.

        if br == True:
            break
        else:
            pass
    if br == True:
        break
    else:
        pass
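
    An alternative to threading a br flag through every loop level is to put the loops in a function and return, which exits all of them at once. A sketch with made-up input handling:

    ```python
    def find_part(lines):
        # A `return` exits every enclosing loop at once, so no flag is needed.
        for line in lines:
            for part in line.split(','):
                if "color=brown" in part:
                    return part
        return None
    ```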
    
    qid & accept id: (24481614, 24481882) query: SqlAlchemy Dynamic Where soup:

    Rather than accessing the table.c by attribute, use the get method.

    \n
    >>> from sqlalchemy import MetaData, Table, Integer, Column, create_engine\n>>> engine = create_engine('sqlite://')\n>>> metadata = MetaData(bind=engine)\n>>> table = Table("Bookings", metadata,\n...     Column('id', Integer(), primary_key=True,),\n...     Column('value', Integer(),),\n... )\n>>> field = 'value'\n>>> table.c.get(field)\nColumn('value', Integer(), table=)\n>>> table.c[field]\nColumn('value', Integer(), table=)\n
    \n

    So for your example, the code would be something like booking_table.c[field].

    \n

    Remember to sanitize your inputs; you can probably check for whether field in table.c

    \n
    >>> field in table.c\nTrue\n>>> 'id' in table.c\nTrue\n>>> 'nothere' in table.c\nFalse\n
    \n

    Looks like this isn't officially documented, but it works.

    \n soup wrap:

    Rather than accessing the table.c by attribute, use the get method.

    >>> from sqlalchemy import MetaData, Table, Integer, Column, create_engine
    >>> engine = create_engine('sqlite://')
    >>> metadata = MetaData(bind=engine)
    >>> table = Table("Bookings", metadata,
    ...     Column('id', Integer(), primary_key=True,),
    ...     Column('value', Integer(),),
    ... )
    >>> field = 'value'
    >>> table.c.get(field)
    Column('value', Integer(), table=<Bookings>)
    >>> table.c[field]
    Column('value', Integer(), table=<Bookings>)
    

    So for your example, the code would be something like booking_table.c[field].

    Remember to sanitize your inputs; you can probably check for whether field in table.c

    >>> field in table.c
    True
    >>> 'id' in table.c
    True
    >>> 'nothere' in table.c
    False
    

    Looks like this isn't officially documented, but it works.
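
    Putting the membership test and the lookup together, a small sketch of a guarded dynamic column lookup (safe_column is a name invented here):

    ```python
    from sqlalchemy import Column, Integer, MetaData, Table

    bookings = Table(
        "Bookings", MetaData(),
        Column("id", Integer, primary_key=True),
        Column("value", Integer),
    )

    def safe_column(table, field):
        # Reject anything that is not an actual column before building SQL.
        if field not in table.c:
            raise KeyError("unknown column: %r" % field)
        return table.c[field]
    ```

    The returned Column can then be used in a filter, e.g. safe_column(bookings, field) == some_value.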

    qid & accept id: (24495059, 24517823) query: Comparing items in large list - finding items differing in 1 letter by length - Python soup:

    Update: Modified to handle a variable number of "id" sub-fields and print the results as a single string. Note several test cases were added to the end of the input to have some with a different number of leading fields making up the id (i.e. 2 instead of 3).

    \n

    I also renamed thenum_mismatches()functionhamming_distance()because that's what it is.

    \n

    Using the following input:

    \n
    IGHV3-23-IGHJ4-CAKDRGYTGYGVYFDYW\nIGHV4-39-IGHJ4-CARHDILTGYSYYFDYW\nIGHV3-23-IGHJ3-CAKSGGWYLSDAFDIW\nIGHV4-39-IGHJ4-CARTGFGELGFDYW\nIGHV1-2-IGHJ2-CARDSDYDWYFDLW\nIGHV1-8-IGHJ3-CARGQTYYDILTGPSDAFDIW\nIGHV4-39-IGHJ5-CARSTGDWFDPW\nIGHV3-9-IGHJ3-CANVPIYSSSYDAFDIW\nIGHV3-23-IGHJ4-CAKDWELYYFDYW\nIGHV3-23-IGHJ4-CAKDRGYTGFGVYFDYW\nIGHV4-39-IGHJ4-CARHLGYNNSWYPFDYW\nIGHV1-2-IGHJ4-CAREGYNWNDEGRFDYW\nIGHV3-23-IGHJ3-CAKSSGWYLSDAFDIW\nIGHV4-39-IGHJ4-CARYLGYNSNWYPFDYW\nIGHV3-23-IGHJ6-CAKEGCSSGCPYYYYGMDVW\nIGHV3-23-IGHJ3-CAKWGPDAFDIW\nIGHV3-11-IGHJ-CATSGGSP\nIGHV3-11-IGHJ4-CARDGDGYNDYW\nIGHV1-2-IGHJ4-CARRIGYSSGSEDYW\nIGHV1-2-IGHJ4-CARDIAVPGHGDYW\nIGHV6-1-IGHJ4-CASGGAVPGYYFDYW\nIGHV1-2-CAREGYNWNDEGRFDYW\nIGHV4-39-CARSTGDWFDPW\nIGHV1-2-CARDSDYDWYFDLW\n
    \n

    and this script:

    \n
    from collections import defaultdict\nfrom itertools import izip, tee\nimport os\nimport sys\n\n# http://en.wikipedia.org/wiki/Hamming_distance#Algorithm_example\ndef hamming_distance(s1, s2):\n    """ Count number of mismatched characters in equal length strings. """\n    if not isinstance(s1, basestring): raise ValueError('s1 is not a string')\n    if not isinstance(s2, basestring): raise ValueError('s2 is not a string')\n    if len(s1) != len(s2): raise ValueError('string lengths do not match')\n    return sum(a != b for a, b in izip(s1, s2))\n\ndef pairwise(iterable):  # itertools recipe\n    "s -> (s0,s1), (s1,s2), (s2, s3), ..."\n    a, b = tee(iterable)\n    next(b, None)\n    return izip(a, b)\n\ninp = sys.argv[1]  # Input file\n\nunique = defaultdict(list)\nwith open(inp, 'rb') as file:\n    for fields in (line.strip().split('-') for line in file):\n        id = '-'.join(fields[:-1])  # recombine all but last field into an id\n        unique[id].append(fields[-1])  # accumulate ending fields with same id\n\nfor id in sorted(unique):\n    final_fields = unique[id]\n    final_fields.sort(key=lambda field: len(field))  # sort by length\n    print id + ':' + '-'.join(final_fields)\n    if len(final_fields) > 1:  # at least one pair to compare for mismatches?\n        for a, b in pairwise(final_fields):\n            if len(a) == len(b) and hamming_distance(a, b) < 2:\n                print '  {!r} and {!r} differ by < 2 characters'.format(a, b)\n
    \n

    Output:

    \n
    IGHV1-2:CARDSDYDWYFDLW-CAREGYNWNDEGRFDYW\nIGHV1-2-IGHJ2:CARDSDYDWYFDLW\nIGHV1-2-IGHJ4:CARDIAVPGHGDYW-CARRIGYSSGSEDYW-CAREGYNWNDEGRFDYW\nIGHV1-8-IGHJ3:CARGQTYYDILTGPSDAFDIW\nIGHV3-11-IGHJ:CATSGGSP\nIGHV3-11-IGHJ4:CARDGDGYNDYW\nIGHV3-23-IGHJ3:CAKWGPDAFDIW-CAKSGGWYLSDAFDIW-CAKSSGWYLSDAFDIW\n  'CAKSGGWYLSDAFDIW' and 'CAKSSGWYLSDAFDIW' differ by < 2 characters\nIGHV3-23-IGHJ4:CAKDWELYYFDYW-CAKDRGYTGYGVYFDYW-CAKDRGYTGFGVYFDYW\n  'CAKDRGYTGYGVYFDYW' and 'CAKDRGYTGFGVYFDYW' differ by < 2 characters\nIGHV3-23-IGHJ6:CAKEGCSSGCPYYYYGMDVW\nIGHV3-9-IGHJ3:CANVPIYSSSYDAFDIW\nIGHV4-39:CARSTGDWFDPW\nIGHV4-39-IGHJ4:CARTGFGELGFDYW-CARHDILTGYSYYFDYW-CARHLGYNNSWYPFDYW-CARYLGYNSNWYPFDYW\nIGHV4-39-IGHJ5:CARSTGDWFDPW\nIGHV6-1-IGHJ4:CASGGAVPGYYFDYW\n
    \n

    Hope this update is also helpful...

    \n soup wrap:

    Update: Modified to handle a variable number of "id" sub-fields and print the results as a single string. Note that several test cases were added at the end of the input with a different number of leading fields making up the id (i.e. 2 instead of 3).

    I also renamed the num_mismatches() function to hamming_distance() because that's what it is.

    Using the following input:

    IGHV3-23-IGHJ4-CAKDRGYTGYGVYFDYW
    IGHV4-39-IGHJ4-CARHDILTGYSYYFDYW
    IGHV3-23-IGHJ3-CAKSGGWYLSDAFDIW
    IGHV4-39-IGHJ4-CARTGFGELGFDYW
    IGHV1-2-IGHJ2-CARDSDYDWYFDLW
    IGHV1-8-IGHJ3-CARGQTYYDILTGPSDAFDIW
    IGHV4-39-IGHJ5-CARSTGDWFDPW
    IGHV3-9-IGHJ3-CANVPIYSSSYDAFDIW
    IGHV3-23-IGHJ4-CAKDWELYYFDYW
    IGHV3-23-IGHJ4-CAKDRGYTGFGVYFDYW
    IGHV4-39-IGHJ4-CARHLGYNNSWYPFDYW
    IGHV1-2-IGHJ4-CAREGYNWNDEGRFDYW
    IGHV3-23-IGHJ3-CAKSSGWYLSDAFDIW
    IGHV4-39-IGHJ4-CARYLGYNSNWYPFDYW
    IGHV3-23-IGHJ6-CAKEGCSSGCPYYYYGMDVW
    IGHV3-23-IGHJ3-CAKWGPDAFDIW
    IGHV3-11-IGHJ-CATSGGSP
    IGHV3-11-IGHJ4-CARDGDGYNDYW
    IGHV1-2-IGHJ4-CARRIGYSSGSEDYW
    IGHV1-2-IGHJ4-CARDIAVPGHGDYW
    IGHV6-1-IGHJ4-CASGGAVPGYYFDYW
    IGHV1-2-CAREGYNWNDEGRFDYW
    IGHV4-39-CARSTGDWFDPW
    IGHV1-2-CARDSDYDWYFDLW
    

    and this script:

    from collections import defaultdict
    from itertools import izip, tee
    import os
    import sys
    
    # http://en.wikipedia.org/wiki/Hamming_distance#Algorithm_example
    def hamming_distance(s1, s2):
        """ Count number of mismatched characters in equal length strings. """
        if not isinstance(s1, basestring): raise ValueError('s1 is not a string')
        if not isinstance(s2, basestring): raise ValueError('s2 is not a string')
        if len(s1) != len(s2): raise ValueError('string lengths do not match')
        return sum(a != b for a, b in izip(s1, s2))
    
    def pairwise(iterable):  # itertools recipe
        "s -> (s0,s1), (s1,s2), (s2, s3), ..."
        a, b = tee(iterable)
        next(b, None)
        return izip(a, b)
    
    inp = sys.argv[1]  # Input file
    
    unique = defaultdict(list)
    with open(inp, 'rb') as file:
        for fields in (line.strip().split('-') for line in file):
            id = '-'.join(fields[:-1])  # recombine all but last field into an id
            unique[id].append(fields[-1])  # accumulate ending fields with same id
    
    for id in sorted(unique):
        final_fields = unique[id]
        final_fields.sort(key=lambda field: len(field))  # sort by length
        print id + ':' + '-'.join(final_fields)
        if len(final_fields) > 1:  # at least one pair to compare for mismatches?
            for a, b in pairwise(final_fields):
                if len(a) == len(b) and hamming_distance(a, b) < 2:
                    print '  {!r} and {!r} differ by < 2 characters'.format(a, b)
    

    Output:

    IGHV1-2:CARDSDYDWYFDLW-CAREGYNWNDEGRFDYW
    IGHV1-2-IGHJ2:CARDSDYDWYFDLW
    IGHV1-2-IGHJ4:CARDIAVPGHGDYW-CARRIGYSSGSEDYW-CAREGYNWNDEGRFDYW
    IGHV1-8-IGHJ3:CARGQTYYDILTGPSDAFDIW
    IGHV3-11-IGHJ:CATSGGSP
    IGHV3-11-IGHJ4:CARDGDGYNDYW
    IGHV3-23-IGHJ3:CAKWGPDAFDIW-CAKSGGWYLSDAFDIW-CAKSSGWYLSDAFDIW
      'CAKSGGWYLSDAFDIW' and 'CAKSSGWYLSDAFDIW' differ by < 2 characters
    IGHV3-23-IGHJ4:CAKDWELYYFDYW-CAKDRGYTGYGVYFDYW-CAKDRGYTGFGVYFDYW
      'CAKDRGYTGYGVYFDYW' and 'CAKDRGYTGFGVYFDYW' differ by < 2 characters
    IGHV3-23-IGHJ6:CAKEGCSSGCPYYYYGMDVW
    IGHV3-9-IGHJ3:CANVPIYSSSYDAFDIW
    IGHV4-39:CARSTGDWFDPW
    IGHV4-39-IGHJ4:CARTGFGELGFDYW-CARHDILTGYSYYFDYW-CARHLGYNNSWYPFDYW-CARYLGYNSNWYPFDYW
    IGHV4-39-IGHJ5:CARSTGDWFDPW
    IGHV6-1-IGHJ4:CASGGAVPGYYFDYW
    

    Hope this update is also helpful...
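
    The script above is Python 2 (izip, basestring, print statements). For reference, a sketch of the core function in Python 3:

    ```python
    def hamming_distance(s1, s2):
        # Python 3 version: zip replaces izip; the basestring checks are dropped.
        if len(s1) != len(s2):
            raise ValueError('string lengths do not match')
        return sum(a != b for a, b in zip(s1, s2))
    ```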

    qid & accept id: (24495452, 24498335) query: How can I find the right gaussian curve given some data? soup:

    To avoid guessing the amplitude, call hist() with normed=True, then the amplitude corresponds to normpdf().

    \n

    For doing a curve fit, I suggest to use not the density but the cumulative distribution: Each sample has a height of 1/N, which successively sum up to 1. This has the advantage that you don't need to group samples in bins.

    \n
    import numpy as np\nfrom scipy.stats import norm\nfrom scipy.optimize import curve_fit\nimport matplotlib.pyplot as plt\n\n# Beginning in one dimension:\nmean = 0; Var = 1; N = 100\nscatter = np.random.normal(mean,np.sqrt(Var),N)\nscatter = np.sort(scatter)\nmu1,sigma1 = norm.fit(scatter) # classical fit\n\nscat_sum = np.cumsum(np.ones(scatter.shape))/N # cumulative samples\n[mu2,sigma2],Cx = curve_fit(norm.cdf, scatter, scat_sum, p0=[0,1]) # curve fit\nprint(u"norm.fit():  µ1= {:+.4f}, σ1={:.4f}".format(mu1, sigma1))\nprint(u"curve_fit(): µ2= {:+.4f}, σ2={:.4f}".format(mu2, sigma2))\n\nfg = plt.figure(1); fg.clf()\nax = fg.add_subplot(1, 1, 1)\nt = np.linspace(-4,4, 1000)\nax.plot(t, norm.cdf(t, mu1, sigma1), alpha=.5, label="norm.fit()")\nax.plot(t, norm.cdf(t, mu2, sigma2), alpha=.5, label="curve_fit()")\nax.step(scatter, scat_sum, 'x-', where='post', alpha=.5, label="Samples")\nax.legend(loc="best")\nax.grid(True)\nax.set_xlabel("$x$")\nax.set_ylabel("Cumulative Probability Density")\nax.set_title("Fit to Normal Distribution")\n\nfg.canvas.draw()\nplt.show()\n
    \n

    prints

    \n
    norm.fit():  µ1= +0.1534, σ1=1.0203\ncurve_fit(): µ2= +0.1135, σ2=1.0444\n
    \n

    and plots

    \n

    enter image description here

    \n soup wrap:

    To avoid guessing the amplitude, call hist() with normed=True (density=True in newer Matplotlib); then the amplitude corresponds to normpdf().

    For the curve fit, I suggest using not the density but the cumulative distribution: each sample has a height of 1/N, and they successively sum to 1. This has the advantage that you don't need to group the samples into bins.
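
    The cumulative heights described here can be sketched as a small helper (ecdf is an invented name; the full script below builds the same quantity inline with np.cumsum):

    ```python
    import numpy as np

    def ecdf(samples):
        """Empirical CDF: sorted samples on x, cumulative fractions 1/N..1 on y."""
        x = np.sort(np.asarray(samples, dtype=float))
        y = np.arange(1, len(x) + 1) / len(x)
        return x, y
    ```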

    import numpy as np
    from scipy.stats import norm
    from scipy.optimize import curve_fit
    import matplotlib.pyplot as plt
    
    # Beginning in one dimension:
    mean = 0; Var = 1; N = 100
    scatter = np.random.normal(mean,np.sqrt(Var),N)
    scatter = np.sort(scatter)
    mu1,sigma1 = norm.fit(scatter) # classical fit
    
    scat_sum = np.cumsum(np.ones(scatter.shape))/N # cumulative samples
    [mu2,sigma2],Cx = curve_fit(norm.cdf, scatter, scat_sum, p0=[0,1]) # curve fit
    print(u"norm.fit():  µ1= {:+.4f}, σ1={:.4f}".format(mu1, sigma1))
    print(u"curve_fit(): µ2= {:+.4f}, σ2={:.4f}".format(mu2, sigma2))
    
    fg = plt.figure(1); fg.clf()
    ax = fg.add_subplot(1, 1, 1)
    t = np.linspace(-4,4, 1000)
    ax.plot(t, norm.cdf(t, mu1, sigma1), alpha=.5, label="norm.fit()")
    ax.plot(t, norm.cdf(t, mu2, sigma2), alpha=.5, label="curve_fit()")
    ax.step(scatter, scat_sum, 'x-', where='post', alpha=.5, label="Samples")
    ax.legend(loc="best")
    ax.grid(True)
    ax.set_xlabel("$x$")
    ax.set_ylabel("Cumulative Probability Density")
    ax.set_title("Fit to Normal Distribution")
    
    fg.canvas.draw()
    plt.show()
    

    prints

    norm.fit():  µ1= +0.1534, σ1=1.0203
    curve_fit(): µ2= +0.1135, σ2=1.0444
    

    and plots the fitted cumulative distributions together with the sample step function (image not included here).

    qid & accept id: (24496320, 24496659) query: Combine dict with same keys into one dict with list soup:

    You could write a merge function like this, but you really should consider rewriting your SQL query. Here is a simple solution:

    \n
    def merge_books(books):\n    merged = {}\n\n    for book in books:\n        authorId = book['authorId']\n\n        # Create article attribute\n        book['articles'] = [{\n            'articles.id': book['articles.id'],\n            'authorId':    book['authorId'],\n            'Title':       book['Title'],\n        }]\n\n        # Remove redundant information\n        del book['articles.id']\n        del book['authorId']\n        del book['Title']\n\n        if authorId in merged:\n            merged[authorId]['articles'].append(book['articles'][0])\n        else:\n            merged[authorId] = book\n\n    # Convert dict into a tuple, but why not a list?\n    return tuple(merged.values())\n
    \n

    A better way would be to use two select statements and merge their results together:

    \n
    import MySQLdb\n\ndef get_authors_with_articles(connection):\n    cursor = connection.cursor()\n\n    authors = {}\n    for author in cursor.execute('SELECT * FROM Authors'):\n        # Initialize empty article list that will be popluated with the next select\n        author['articles'] = []\n        authors[author['id']] = author\n\n    for article in cursor.execute('SELECT * FROM Articles').fetchall():\n        # Fetch and delete redundant information\n        author_id = article['authorId']\n        del article['authorId']\n\n        authors[author_id]['articles'].append(article)\n\n    return list(authors.values())\n\n\nif __name__ == '__main__':\n    connection = MySQLdb.connect(\n        mysql_host,\n        mysql_user,\n        mysql_pass,\n        mysql_base,\n        cursorclass=MySQLdb.cursors.DictCursor\n    )\n    print(get_authors_with_articles(connection))\n
    \n soup wrap:

    You could write a merge function like this, but you really should consider rewriting your SQL query. Here is a simple solution:

    def merge_books(books):
        merged = {}
    
        for book in books:
            authorId = book['authorId']
    
            # Create article attribute
            book['articles'] = [{
                'articles.id': book['articles.id'],
                'authorId':    book['authorId'],
                'Title':       book['Title'],
            }]
    
            # Remove redundant information
            del book['articles.id']
            del book['authorId']
            del book['Title']
    
            if authorId in merged:
                merged[authorId]['articles'].append(book['articles'][0])
            else:
                merged[authorId] = book
    
        # Convert dict into a tuple, but why not a list?
        return tuple(merged.values())
    

    A better way would be to use two select statements and merge their results together:

    import MySQLdb
    
    def get_authors_with_articles(connection):
        cursor = connection.cursor()
    
        authors = {}
        cursor.execute('SELECT * FROM Authors')
        for author in cursor.fetchall():
            # Initialize empty article list that will be populated with the next select
            author['articles'] = []
            authors[author['id']] = author
    
        cursor.execute('SELECT * FROM Articles')
        for article in cursor.fetchall():
            # Fetch and delete redundant information
            author_id = article['authorId']
            del article['authorId']
    
            authors[author_id]['articles'].append(article)
    
        return list(authors.values())
    
    
    if __name__ == '__main__':
        connection = MySQLdb.connect(
            mysql_host,
            mysql_user,
            mysql_pass,
            mysql_base,
            cursorclass=MySQLdb.cursors.DictCursor
        )
        print(get_authors_with_articles(connection))
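
    The merging pattern in both snippets reduces to grouping rows by a key; a minimal generic sketch (group_by_key is an invented name):

    ```python
    from collections import defaultdict

    def group_by_key(rows, key):
        """Group a list of dicts into {key_value: [rows with that key_value]}."""
        grouped = defaultdict(list)
        for row in rows:
            grouped[row[key]].append(row)
        return dict(grouped)
    ```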
    
    qid & accept id: (24518769, 24520345) query: Average inter signout time in pandas dataframe soup:

    You can do this for example :

    \n

    First I a create your dataframe:

    \n
    import pandas as pd\nfrom StringIO import StringIO\ntext = """site date time\n1   Google.com 2012-05-01 19:16:08.070000\n2   Google.com 2012-05-01 19:20:07.880000\n3   Google.com 2012-05-01 19:33:02.200000\n4   Google.com 2012-05-01 19:35:09.173000\n5   Google.com 2012-05-01 20:18:55.610000\n6   Google.com 2012-05-01 20:26:27.577000\n8   Google.com 2012-05-02 12:51:12.013000\n9   Google.com 2012-05-02 12:56:52.013000\n10  Google.com 2012-05-02 12:59:55.167000\n11  Google.com 2012-05-02 13:04:25.687000\n12  Google.com 2012-05-02 13:16:36.263000\n"""\ntab = pd.read_table(StringIO(text),index_col=0,sep='\s+')\n
    \n

    Then split data by date , and compute the mean of time lag for each date.

    \n
    for group,value in tab.groupby('date'):\n    print group\n    print pd.to_datetime(value.time).diff().mean()\n\n## 2012-05-01\n## 0   00:14:03.901400\n## dtype: timedelta64[ns]\n## 2012-05-02\n## 0   00:06:21.062500\n## dtype: timedelta64[ns]\n
    \n soup wrap:

    You can do it like this, for example:

    First I create your dataframe:

    import pandas as pd
    from StringIO import StringIO
    text = """site date time
    1   Google.com 2012-05-01 19:16:08.070000
    2   Google.com 2012-05-01 19:20:07.880000
    3   Google.com 2012-05-01 19:33:02.200000
    4   Google.com 2012-05-01 19:35:09.173000
    5   Google.com 2012-05-01 20:18:55.610000
    6   Google.com 2012-05-01 20:26:27.577000
    8   Google.com 2012-05-02 12:51:12.013000
    9   Google.com 2012-05-02 12:56:52.013000
    10  Google.com 2012-05-02 12:59:55.167000
    11  Google.com 2012-05-02 13:04:25.687000
    12  Google.com 2012-05-02 13:16:36.263000
    """
    tab = pd.read_table(StringIO(text),index_col=0,sep='\s+')
    

    Then split the data by date and compute the mean time lag for each date.

    for group,value in tab.groupby('date'):
        print group
        print pd.to_datetime(value.time).diff().mean()
    
    ## 2012-05-01
    ## 0   00:14:03.901400
    ## dtype: timedelta64[ns]
    ## 2012-05-02
    ## 0   00:06:21.062500
    ## dtype: timedelta64[ns]
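
    The same diff().mean() step in isolation, with made-up timestamps (Python 3 syntax; the snippet above uses Python 2's StringIO and print):

    ```python
    import pandas as pd

    # Made-up sign-out times for a single day.
    times = pd.to_datetime(pd.Series([
        "2012-05-01 19:16:08",
        "2012-05-01 19:20:07",
        "2012-05-01 19:33:02",
    ]))

    mean_gap = times.diff().mean()  # average time between consecutive rows
    ```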
    
    qid & accept id: (24584784, 24584876) query: Use argparse to call different functions soup:

    Default subcommands

    \n

    If you want to have subcommands, and make one of them default if no subcommand is specified, then you can't use the typical subparser method.

    \n

    You need to do your argparse in two passes:

    \n
    parser = ArgumentParser()\nparser.add_argument("function", \n                    nargs="?",\n                    choices=['function1', 'function2', 'function2'],\n                    default='function1',\n                    )\nargs, sub_args = parser.parse_known_args()\n\nif args.function == "function1":\n    parser = ArgumentParser()\n    parser.add_argument('-a','--a')\n    parser.add_argument('-b','--b')\n    parser.add_argument('-c','--c')\n    args = parser.parse_args(sub_args)\n    function1(args.a, args.b, args.c)\nelif args.function == "function2":\n    ...\nelif args.function == "function3":\n    ...\n
    \n

    Handling --help

    \n

    If you want --help option to be useful, you need to do a bit more work:

    \n
      \n
    • We need to manually handle help because sometimes we want the overall help, and sometimes we want the subcommand's help
    • \n
    • We can't give a default subcommand straight away because we need to be able to tell if it was specified or not
    • \n
    \n

    This should do the trick:

    \n
    # Parse the subcommand argument first\nparser = ArgumentParser(add_help=False)\nparser.add_argument("function", \n                    nargs="?",\n                    choices=['function1', 'function2', 'function2'],\n                    )\nparser.add_argument('--help', action='store_true')\nargs, sub_args = parser.parse_known_args(['--help'])\n\n# Manually handle help\nif args.help:\n    # If no subcommand was specified, give general help\n    if args.function is None: \n        print parser.format_help()\n        sys.exit(1)\n    # Otherwise pass the help option on to the subcommand\n    sub_args.append('--help')\n\n# Manually handle the default for "function"\nfunction = "function1" if args.function is None else args.function\n\n# Parse the remaining args as per the selected subcommand\nparser = ArgumentParser(prog="%s %s" % (os.path.basename(sys.argv[0]), function))\nif function == "function1":\n    parser.add_argument('-a','--a')\n    parser.add_argument('-b','--b')\n    parser.add_argument('-c','--c')\n    args = parser.parse_args(sub_args)\n    function1(args.a, args.b, args.c)\nelif function == "function2":\n    ...\nelif function == "function3":\n    ...\n
    \n soup wrap:

    Default subcommands

    If you want subcommands, and want one of them to be the default when no subcommand is specified, then you can't use the typical subparser method.

    You need to do your argparse in two passes:

    parser = ArgumentParser()
    parser.add_argument("function", 
                        nargs="?",
                        choices=['function1', 'function2', 'function3'],
                        default='function1',
                        )
    args, sub_args = parser.parse_known_args()
    
    if args.function == "function1":
        parser = ArgumentParser()
        parser.add_argument('-a','--a')
        parser.add_argument('-b','--b')
        parser.add_argument('-c','--c')
        args = parser.parse_args(sub_args)
        function1(args.a, args.b, args.c)
    elif args.function == "function2":
        ...
    elif args.function == "function3":
        ...
    

    Handling --help

    If you want the --help option to be useful, you need to do a bit more work:

    • We need to manually handle help because sometimes we want the overall help, and sometimes we want the subcommand's help
    • We can't give a default subcommand straight away because we need to be able to tell if it was specified or not

    This should do the trick:

    # Parse the subcommand argument first
    parser = ArgumentParser(add_help=False)
    parser.add_argument("function", 
                        nargs="?",
                        choices=['function1', 'function2', 'function3'],
                        )
    parser.add_argument('--help', action='store_true')
    args, sub_args = parser.parse_known_args()
    
    # Manually handle help
    if args.help:
        # If no subcommand was specified, give general help
        if args.function is None: 
            print parser.format_help()
            sys.exit(1)
        # Otherwise pass the help option on to the subcommand
        sub_args.append('--help')
    
    # Manually handle the default for "function"
    function = "function1" if args.function is None else args.function
    
    # Parse the remaining args as per the selected subcommand
    parser = ArgumentParser(prog="%s %s" % (os.path.basename(sys.argv[0]), function))
    if function == "function1":
        parser.add_argument('-a','--a')
        parser.add_argument('-b','--b')
        parser.add_argument('-c','--c')
        args = parser.parse_args(sub_args)
        function1(args.a, args.b, args.c)
    elif function == "function2":
        ...
    elif function == "function3":
        ...
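
    For comparison, when no default subcommand is needed, the usual route is add_subparsers with set_defaults to dispatch; a sketch with placeholder function names:

    ```python
    from argparse import ArgumentParser

    def build_parser():
        parser = ArgumentParser()
        sub = parser.add_subparsers(dest="command")

        p1 = sub.add_parser("function1")
        p1.add_argument("-a")
        p1.add_argument("-b")
        # set_defaults attaches the handler to the chosen subparser.
        p1.set_defaults(func=lambda a: ("function1", a.a, a.b))

        p2 = sub.add_parser("function2")
        p2.set_defaults(func=lambda a: ("function2",))

        return parser

    args = build_parser().parse_args(["function1", "-a", "x"])
    result = args.func(args)  # dispatch to the selected subcommand
    ```

    This also gives each subcommand its own --help for free.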
    
    qid & accept id: (24593478, 24593488) query: Python and appending items to text and excel file soup:

    The correct way is as follows:

    \n
    Yvalues = [1, 2, 3, 4, 5]\nfile_out = open('file.csv','wb')\nmywriter=csv.writer(file_out, delimiter = '\n')\nmywriter.writerow(Yvalues)\nfile_out.close()\n
    \n

    This will give you:

    \n
    1\n\n2\n\n3\n\n4\n\n5\n
    \n soup wrap:

    The correct way is as follows:

    Yvalues = [1, 2, 3, 4, 5]
    file_out = open('file.csv','wb')
    mywriter=csv.writer(file_out, delimiter = '\n')
    mywriter.writerow(Yvalues)
    file_out.close()
    

    This will give you:

    1
    
    2
    
    3
    
    4
    
    5
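
    Note that delimiter='\n' is an unusual way to get one value per line (and the 'wb' mode is Python 2 specific). Writing each value as its own row does the same thing without the blank lines; a Python 3 sketch:

    ```python
    import csv
    import io

    yvalues = [1, 2, 3, 4, 5]

    # One value per row instead of a '\n' delimiter inside a single row.
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerows([[v] for v in yvalues])

    print(buf.getvalue())
    ```

    With a real file, open it as open('file.csv', 'w', newline='') in Python 3.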
    
    qid & accept id: (24638781, 24639154) query: How to align and compare two elements (sequence) in a list using python soup:

    You can break this up into three distinct parts:

    \n
      \n
    1. parse the input;
    2. \n
    3. construct the new disorder string;
    4. \n
    5. output the new file.
    6. \n
    \n

    (1) and (3) are pretty simple, so I'll focus on (2). The main thing you need to do is iterate through your "disorder string" where you can access the character at each position, as well as the position itself. One way to do this is to use enumerate:

    \n
    for i, x in enumerate(S)\n
    \n

    which gives you a generator for each position (stored in i) and character (stored in x) in string S. Once you have that, all you need to do is record the position and the character in seq whenever the disorder string has an "X". In Python, this could look like:

    \n
    if (x == 'X'):\n    new_disorder.append( "{} {}".format(i, seq[i]) )\n
    \n

    where we are formatting the result as a string, e.g. "34 R".

    \n

    Here's a complete example:

    \n
    # Parse the file which was already split into split_list\nsplit_list = ['>103L', 'Sequence:', 'MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNSLDAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL', 'Disorder:', '----------------------------------XXXXXX-----------------------------------------------------------------------------------------------------------------------------XX']\nheader   = split_list[0] + " " + split_list[1]\nseq      = split_list[2]\ndisorder = split_list[4]\n\n# Create the new disorder string\nnew_disorder = ["Disorder: Posi R"]\nfor i, x in enumerate(disorder):\n    if x == "X":\n        # Appends of the form: "AminoAcid Position"\n        new_disorder.append( "{} {}".format(i, seq[i]) )\n\nnew_disorder = " ".join(new_disorder)\n\n# Output the modified file\nopen("seq2.txt", "w").write( "\n".join([header, seq, new_disorder]))\n
    \n

    Note that I get slightly different output than the example you gave:

    \n
    103L Sequence:\nMNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNSLDAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL\nDisorder: Posi R 34 K 35 S 36 P 37 S 38 L 39 N 165 N 166 L\n
    \n soup wrap:

    You can break this up into three distinct parts:

    1. parse the input;
    2. construct the new disorder string;
    3. output the new file.

    (1) and (3) are pretty simple, so I'll focus on (2). The main thing you need to do is iterate through your "disorder string" where you can access the character at each position, as well as the position itself. One way to do this is to use enumerate:

    for i, x in enumerate(S)
    

    which yields each position (stored in i) and character (stored in x) in string S. Once you have that, all you need to do is record the position and the character in seq whenever the disorder string has an "X". In Python, this could look like:

    if (x == 'X'):
        new_disorder.append( "{} {}".format(i, seq[i]) )
    

    where we are formatting the result as a string, e.g. "34 R".

    Here's a complete example:

    # Parse the file which was already split into split_list
    split_list = ['>103L', 'Sequence:', 'MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNSLDAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL', 'Disorder:', '----------------------------------XXXXXX-----------------------------------------------------------------------------------------------------------------------------XX']
    header   = split_list[0] + " " + split_list[1]
    seq      = split_list[2]
    disorder = split_list[4]
    
    # Create the new disorder string
    new_disorder = ["Disorder: Posi R"]
    for i, x in enumerate(disorder):
        if x == "X":
            # Appends entries of the form "Position AminoAcid", e.g. "34 K"
            new_disorder.append( "{} {}".format(i, seq[i]) )
    
    new_disorder = " ".join(new_disorder)
    
    # Output the modified file
    with open("seq2.txt", "w") as out:
        out.write("\n".join([header, seq, new_disorder]))
    

    Note that I get slightly different output than the example you gave:

    103L Sequence:
    MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNSLDAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL
    Disorder: Posi R 34 K 35 S 36 P 37 S 38 L 39 N 165 N 166 L
    
    qid & accept id: (24642669, 24642866) query: Python Quickest way to round every float in nested list of tuples soup:

    soup wrap:

    I don't know about "quickest" (quickest to write? read? runtime?), but this is how I'd write it recursively:

    def re_round(li, _prec=5):
         try:
             return round(li, _prec)
         except TypeError:
             return type(li)(re_round(x, _prec) for x in li)
    

    demo:

    x = [[(-88.99716274669669, 45.13003508233472), (-88.46889143213836, 45.12912220841379), (-88.47075415770517, 44.84090409706577), (-88.75033424251002, 44.84231949526811), (-88.75283245650954, 44.897062864942406), (-88.76794136151051, 44.898020801741716), (-88.77994787408718, 44.93415662283567), (-88.99624763048942, 44.93474749747682), (-88.99716274669669, 45.13003508233472)]]
    
    re_round(x)
    Out[6]: 
    [[(-88.99716, 45.13004),
      (-88.46889, 45.12912),
      (-88.47075, 44.8409),
      (-88.75033, 44.84232),
      (-88.75283, 44.89706),
      (-88.76794, 44.89802),
      (-88.77995, 44.93416),
      (-88.99625, 44.93475),
      (-88.99716, 45.13004)]]
    

    (old generator version of the function, for posterity:)

    def re_round(li, _prec=5):
        for x in li:
            try:
                yield round(x, _prec)
            except TypeError:
                yield type(x)(re_round(x, _prec))
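    The recursive version preserves container types at every level of nesting; a quick check with a small made-up mixed structure:

```python
def re_round(li, _prec=5):
    try:
        return round(li, _prec)
    except TypeError:
        # not a number: rebuild the container, rounding its elements recursively
        return type(li)(re_round(x, _prec) for x in li)

data = [(1.2345678, 2.3456789), [3.14159265]]
print(re_round(data, 3))  # [(1.235, 2.346), [3.142]]
```

    Note how the outer list stays a list and the inner tuple stays a tuple.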
    
    qid & accept id: (24660923, 24661010) query: How to apply parameters/for loop soup:
    soup wrap:
    import operator
    
    vector1 = (1, 2, 3)
    
    # get a list of vectors
    vectors = [
        (4, 5, 6),
        (7, 8, 9)
    ]
    
    # for loop through the vectors,
    # assigning the current vector to vector2 in every iteration
    for vector2 in vectors:
        dotProduct = reduce(operator.add, map(operator.mul, vector1, vector2))
        print dotProduct
    

    Using your l, nat and a variables:

    vector1 = (int(l[0][0]), int(l[0][1]), int(l[0][2]))
    
    for a in range(1, nat):
        vector2 = (int(l[a][0]), int(l[a][1]), int(l[a][2]))
        dotProduct = reduce(operator.add, map(operator.mul, vector1, vector2))
        print(dotProduct)
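    In modern Python (where reduce moved to functools), the same dot product can be sketched without reduce at all, using zip and sum:

```python
vector1 = (1, 2, 3)
vectors = [(4, 5, 6), (7, 8, 9)]

for vector2 in vectors:
    # pairwise multiply, then sum the products
    dot = sum(a * b for a, b in zip(vector1, vector2))
    print(dot)  # 32, then 50
```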
    
    qid & accept id: (24674723, 24674754) query: get function names from a list python soup:

    soup wrap:

    You can use getattr (note that you are referring to attributes, or possibly methods, not functions):

    for alarm in alarms:
        for attr in whitelist:
            print getattr(alarm, attr)
    

    getattr takes an optional third argument, the default value in case attr isn't found, so you could do e.g.:

    for attr in whitelist:
        print "{0}: {1}".format(attr, getattr(alarm, attr, ""))
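    A minimal self-contained sketch of the same idea, with a made-up Alarm class and whitelist (the attribute names are purely illustrative):

```python
class Alarm(object):
    def __init__(self):
        self.severity = "high"
        self.source = "sensor-1"

whitelist = ["severity", "source", "missing_attr"]
alarm = Alarm()
for attr in whitelist:
    # the third argument is the default returned when the attribute is absent
    print("{0}: {1}".format(attr, getattr(alarm, attr, "")))
```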
    
    qid & accept id: (24697061, 24697091) query: How to append the second return value, directly to a list, in Python soup:

    soup wrap:

    You can index a tuple using the same indexing you would for a list []. So if you want the list, which is the second element, you can just index the element [1] from the return of the function call.

    def get_stuff():
        return 'a string', [1,2,3,5]
    
    all_stuff = [6,7]
    all_stuff.extend(get_stuff()[1])
    

    Output

    [6, 7, 1, 2, 3, 5]
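    An alternative that avoids the magic index is to unpack the returned tuple first; a small sketch:

```python
def get_stuff():
    return 'a string', [1, 2, 3, 5]

all_stuff = [6, 7]
_, numbers = get_stuff()   # ignore the string, keep the list
all_stuff.extend(numbers)
print(all_stuff)  # [6, 7, 1, 2, 3, 5]
```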
    
    qid & accept id: (24717673, 24717721) query: google app engine - ndb query to only get a few columns in python soup:

    soup wrap:

    What you need is called a projection query.

    Example:

    qry = Article.query()
    articles = qry.fetch(20, projection=[Article.author, Article.tags])
    for article in articles:
      # code here can use article.author, article.tags
      # but cannot use article.title
    

    Your code:

    class userData(ndb.Model):
        id = ndb.StringProperty()
        name = ndb.StringProperty()
        emailAddress = ndb.StringProperty()
    
    user = userData.query().filter(ndb.GenericProperty('id') ==  "requiredId")\
                           .fetch(projection=[userData.id, userData.name])
    

    Though I need to quote from the docs:

    Projection can be useful; if you only need two small properties each from several large entities, the fetch is more efficient since it gets and deserializes less data.

    Think about the above when you use projection queries.

    P.S.

    Also use the CapWords convention for class names in Python if you want to follow PEP 8.

    qid & accept id: (24728933, 24728952) query: Sort dictionary alphabetically when the key is a string (name) soup:

    soup wrap:

    A simple algorithm to sort dictionary keys in alphabetical order: first, sort the keys using sorted:

    sortednames=sorted(dictUsers.keys(), key=lambda x:x.lower())
    

    Then, for each key, retrieve the value from the dict:

    for i in sortednames:
       values=dictUsers[i]
       print("Name= " + i)
       print ("   Age= " + values.age)
       print ("   Address= " + values.address)
       print ("   Phone Number= " + values.phone)
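    A self-contained sketch with a plain dict (no user objects), which also shows why the key=lambda x: x.lower() matters — without it, all uppercase letters would sort before lowercase ones:

```python
dictUsers = {"bob": 25, "Alice": 30, "carol": 22}
sortednames = sorted(dictUsers.keys(), key=lambda x: x.lower())
print(sortednames)  # ['Alice', 'bob', 'carol']
for name in sortednames:
    print("Name= {}  Age= {}".format(name, dictUsers[name]))
```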
    
    qid & accept id: (24746231, 24746399) query: Matplotlib Half color axis soup:

    soup wrap:

    If you have a specific set of colors that you want to use for your colormap, you can build it based on those. For example:

    import numpy as np
    import matplotlib.pyplot as plt
    from matplotlib.colors import LinearSegmentedColormap
    
    cmap = LinearSegmentedColormap.from_list('name', ['green', 'yellow', 'red'])
    
    # Generate some data similar to yours
    y, x = np.mgrid[-200:1900, -300:2000]
    z = np.cos(np.hypot(x, y) / 100) + 1
    
    fig, ax = plt.subplots()
    
    cax = ax.contourf(x, y, z, cmap=cmap)
    cbar = fig.colorbar(cax)
    cbar.set_label('Z-Values')
    
    plt.show()
    



    However, if you did just want the top half of some particularly complex colormap, you can copy a portion of it by evaluating the colormap over the range you're interested in. For example, if you wanted the "top" half, you'd evaluate it from 0.5 to 1:

    import numpy as np
    import matplotlib.pyplot as plt
    from matplotlib.colors import LinearSegmentedColormap
    
    # Evaluate an existing colormap from 0.5 (midpoint) to 1 (upper end)
    cmap = plt.get_cmap('gist_earth')
    colors = cmap(np.linspace(0.5, 1, cmap.N // 2))
    
    # Create a new colormap from those colors
    cmap2 = LinearSegmentedColormap.from_list('Upper Half', colors)
    
    y, x = np.mgrid[-200:1900, -300:2000]
    z = np.cos(np.hypot(x, y) / 100) + 1
    
    fig, axes = plt.subplots(ncols=2)
    for ax, cmap in zip(axes.flat, [cmap, cmap2]):
        cax = ax.imshow(z, cmap=cmap, origin='lower',
                        extent=[x.min(), x.max(), y.min(), y.max()])
        cbar = fig.colorbar(cax, ax=ax, orientation='horizontal')
        cbar.set_label(cmap.name)
    
    plt.show()
    


    qid & accept id: (24747191, 24748413) query: Save matches on array soup:

    soup wrap:

    If you want to use regexes, try this:

    s = 'module hi(a, b, c)'
    regex = re.compile(r'\s(\w+)\(([^\)]+)\)')
    try:
        module_name, parameters = regex.search(s).groups()
    except AttributeError as e:
        print 'No match for: {}'.format(s)
        raise
    parameters = parameters.split(',')
    print module_name, parameters
    d = {'module_name':module_name,
         'module_params':parameters[:-1],
         'module_last_param':parameters[-1]}
    print d
    # {'module_last_param': ' c', 'module_name': 'hi', 'module_params': ['a', ' b']}
    

    If you are confident that all of your data conforms to the pattern, you can also do this without regexes:

    name, params = s.split('(')
    name = name.split()[1]
    params = params[:-1].split(',')
    d = {'module_name':name,
         'module_params':params[:-1],
         'module_last_param':params[-1]}
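    A Python 3 sketch of the regex approach, additionally stripping the whitespace that the raw comma split leaves on each parameter:

```python
import re

s = 'module hi(a, b, c)'
m = re.search(r'\s(\w+)\(([^)]+)\)', s)
module_name = m.group(1)
parameters = [p.strip() for p in m.group(2).split(',')]
d = {'module_name': module_name,
     'module_params': parameters[:-1],
     'module_last_param': parameters[-1]}
print(d)  # {'module_name': 'hi', 'module_params': ['a', 'b'], 'module_last_param': 'c'}
```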
    
    qid & accept id: (24747996, 24828768) query: How do I make pip available from command line mac? soup:

    soup wrap:

    First you need to find the path to your Python installation. You can do this by typing:

    which python
    

    In that directory there should be a directory called Scripts.

    You can then add the full path of this directory to PATH by typing

    export PATH=$PATH:""
    

    So for example mine was:

    export PATH=$PATH:"/cygdrive/c/Python27/Scripts"
    

    You are only appending the new path to PATH, not replacing the old one. Be very careful to add the $PATH: after the equals. This is what keeps the old PATH intact. The : is the path separator.
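    A sketch of the append with a hypothetical directory (substitute the one which python points you at); the case check just confirms the colon-separator logic:

```shell
# Hypothetical scripts directory; replace with your own
NEW_DIR="/usr/local/bin"
export PATH="$PATH:$NEW_DIR"
# ':' separates PATH entries; confirm the directory is now on PATH
case ":$PATH:" in
  *":$NEW_DIR:"*) echo "on PATH" ;;
esac
```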

    qid & accept id: (24748179, 24769641) query: py2exe: excluding parts of a package that wants to import all its parts soup:

    soup wrap:

    All solutions seem hackish at best (see comments on question). But rather than post-fixing __init__.py I finally settled on the following strategy of going in and extricating the specific things I need, and copying them to temporary files:

    import sys, os, shutil
    
    # awkward kludge to include BigPackage.SmallSubset by hand without having to include the rest of BigPackage
    src = [ os.path.join( x, 'BigPackage', 'SmallSubset.py' ) for x in sys.path ]
    src = [ x for x in src if os.path.isfile( x ) ][ 0 ]
    shutil.copyfile( src, 'BigPackageSmallSubset.py' )
    import BigPackageSmallSubset
    
    
    options = { 'py2exe': { 'excludes' : [ 'BigPackage' ], 'includes' : [ 'BigPackageSmallSubset' ], 'compressed' : True, }, }
    
    # setup( ..., options=options, ... )
    
    os.remove( 'BigPackageSmallSubset.py' )
    os.remove( 'BigPackageSmallSubset.pyc' )
    

    Then I make myscript.py sensitive to the possible difference:

    try: from BigPackage.SmallSubset import TheOnlyFunctionIReallyNeed
    except ImportError: from BigPackageSmallSubset import TheOnlyFunctionIReallyNeed
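    The core trick — copy one module file out of a package to a flat file, then import the copy — can be exercised standalone with a throwaway module. Everything here is made up for illustration:

```python
import os
import sys
import tempfile

# Create a throwaway 'SmallSubset'-style module file in a temp directory
tmpdir = tempfile.mkdtemp()
module_path = os.path.join(tmpdir, 'BigPackageSmallSubset.py')
with open(module_path, 'w') as f:
    f.write('def the_only_function_i_really_need():\n    return 42\n')

# Import the flat copy, just like the py2exe build script does
sys.path.insert(0, tmpdir)
import BigPackageSmallSubset
print(BigPackageSmallSubset.the_only_function_i_really_need())  # 42
```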
    
    qid & accept id: (24752712, 24752997) query: How to address a dictionary in a list of ordered dicts by unique key value? soup:
    soup wrap:

    The use case: reading a file of records (ordered dicts). Different key values from records with a re-occurring ID must be merged with the record with the first occurrence of that ID.

    You need to use a defaultdict for this:

    >>> from collections import defaultdict
    >>> d = defaultdict(list)
    >>> d['a'].append(1)
    >>> d['a'].append(2)
    >>> d['b'].append(3)
    >>> d['c'].append(4)
    >>> d['b'].append(5)
    >>> print(d['a'])
    [1, 2]
    >>> print(d)
    defaultdict(<class 'list'>, {'a': [1, 2], 'c': [4], 'b': [3, 5]})
    

    If you want to store other objects, for example a dictionary, just pass that as the callable:

    >>> d = defaultdict(dict)
    >>> d['a']['values'] = []
    >>> d['b']['values'] = []
    >>> d['a']['values'].append('a')
    >>> d['a']['values'].append('b')
    >>> print(d)
    defaultdict(<class 'dict'>, {'a': {'values': ['a', 'b']}, 'b': {'values': []}})
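    Applied to the stated use case — merging records that share an ID — a minimal sketch with made-up (id, value) records:

```python
from collections import defaultdict

# made-up records; values for repeated IDs get merged under the first occurrence
records = [('id1', 'a'), ('id2', 'b'), ('id1', 'c')]
merged = defaultdict(list)
for rid, value in records:
    merged[rid].append(value)
print(dict(merged))  # {'id1': ['a', 'c'], 'id2': ['b']}
```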
    
    qid & accept id: (24772845, 24772946) query: Python: Extract hrefs inside a div soup:

    soup wrap:

    Finding all hrefs inside the div with class tab-pane fade in active:

    soup = BeautifulSoup(st)                                             
    for a in soup.findAll('div', {"class":"tab-pane fade in active"}):   
        for b in a.findAll('a'):                                         
            print b.get('href')
    

    output

    /accounting?id=265
    /downloadpdf?id=265&type=pdf
    /downloadpdf?id=265&type=file
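    The same idea can be sketched with only the standard library's html.parser (no BeautifulSoup dependency). The HTML snippet is made up, and nested divs inside the target are handled only naively:

```python
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.depth = 0      # >0 while inside the target div
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'div':
            if self.depth or attrs.get('class') == 'tab-pane fade in active':
                self.depth += 1
        elif tag == 'a' and self.depth and 'href' in attrs:
            self.hrefs.append(attrs['href'])

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth:
            self.depth -= 1

parser = HrefCollector()
parser.feed('<div class="tab-pane fade in active">'
            '<a href="/accounting?id=265">x</a></div>')
print(parser.hrefs)  # ['/accounting?id=265']
```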
    
    qid & accept id: (24780464, 24780515) query: Iterate through a list of numpy arrays soup:

    soup wrap:

    You define the function elsewhere, then call it within the loop. You don't define the function over and over again within the loop.

    def do_something(np_array):
        # work on the array here
        pass
    
    for i in list_of_arrays:
        do_something(i)
    

    As a working example, let's just say I call the sum function on each array:

    def total(np_array):
        return sum(np_array)
    

    Now I can call it in the for loop

    for i in list_of_arrays:
        print total(i)
    

    Output

    [ 0.  0.]
    [ 1.13075762  0.87658186]
    [ 2.34610724  0.77485066]
    [ 1.08704527  2.59122417]
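    The same pattern works with any per-array function. A small sketch (assuming numpy is available) using fixed arrays so the sums are predictable:

```python
import numpy as np

def total(np_array):
    # column-wise sum of one array
    return np_array.sum(axis=0)

list_of_arrays = [np.zeros((2, 2)), np.ones((2, 2))]
for a in list_of_arrays:
    print(total(a))  # [0. 0.] then [2. 2.]
```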
    
    qid & accept id: (24787224, 24787287) query: How to convert a python string soup:

    soup wrap:

    Don't reinvent the wheel here. Python has your back. Besides, handling escape syntax correctly is harder than it looks.

    The correct way to handle this

    In Python 2, use the str-to-str string_escape codec:

    string.decode('string_escape')
    

    This interprets any Python-recognized string escape sequences for you, including \n and \t.

    Demo:

    >>> string = '\\n    this is a docstring for\\n    the main function.\\n    a,\\n    b,\\n    c\\n    '
    >>> string.decode('string_escape')
    '\n    this is a docstring for\n    the main function.\n    a,\n    b,\n    c\n    '
    >>> print string.decode('string_escape')
    
        this is a docstring for
        the main function.
        a,
        b,
        c
    
    >>> '\\t\\n\\r\\xa0\\040'.decode('string_escape')
    '\t\n\r\xa0 '
    

    In Python 3, you'd have to use the codecs.decode() and the unicode_escape codec:

    codecs.decode(string, 'unicode_escape')
    

    as there is no str.decode() method and this is not a str -> bytes conversion.

    Demo:

    >>> import codecs
    >>> string = '\\n    this is a docstring for\\n    the main function.\\n    a,\\n    b,\\n    c\\n    '
    >>> codecs.decode(string, 'unicode_escape')
    '\n    this is a docstring for\n    the main function.\n    a,\n    b,\n    c\n    '
    >>> print(codecs.decode(string, 'unicode_escape'))
    
        this is a docstring for
        the main function.
        a,
        b,
        c
    
    >>> codecs.decode('\\t\\n\\r\\xa0\\040', 'unicode_escape')
    '\t\n\r\xa0 '
    

    Why straightforward str.replace() won't cut it

    You could try to do this yourself with str.replace(), but then you also need to implement proper escape parsing; take \\\\n for example; this is \\n, escaped. If you naively apply str.replace() in sequence, you end up with \n or \\\n instead:

    >>> '\\\\n'.decode('string_escape')
    '\\n'
    >>> '\\\\n'.replace('\\n', '\n').replace('\\\\', '\\')
    '\\\n'
    >>> '\\\\n'.replace('\\\\', '\\').replace('\\n', '\n')
    '\n'
    

    The \\ pair should be replaced by just one \ character, leaving the n uninterpreted. But the replace option either will end up replacing the trailing \ together with the n with a newline character, or you end up with \\ replaced by \, and then the \ and the n are replaced by a newline. Either way, you end up with the wrong output.

    The slow way to handle this, manually

    You'll have to process the characters one by one instead, pulling in more characters as needed:

    _map = {
        '\\\\': '\\',
        "\\'": "'",
        '\\"': '"',
        '\\a': '\a',
        '\\b': '\b',
        '\\f': '\f',
        '\\n': '\n',
        '\\r': '\r',
        '\\t': '\t',
    }
    
    def unescape_string(s):
        output = []
        i = 0
        while i < len(s):
            c = s[i]
            i += 1
            if c != '\\':
                output.append(c)
                continue
            c += s[i]
            i += 1
            if c in _map:
                output.append(_map[c])
                continue
            if c == '\\x' and i < len(s) - 2:  # hex escape
                point = int(s[i] + s[i + 1], 16)
                i += 2
                output.append(chr(point))
                continue
            if c == '\\0':  # octal escape
                while len(c) < 4 and i < len(s) and s[i].isdigit():
                    c += s[i]
                    i += 1
                point = int(c[1:], 8)
                output.append(chr(point))
        return ''.join(output)
    

    This now can handle the \xhh and the standard 1-letter escapes, but not the \0.. octal escape sequences, or \uhhhh Unicode code points, or \N{name} unicode name references, nor does it handle malformed escapes in quite the same way as Python would.

    But it does handle the escaped escape properly:

    >>> unescape_string(string)
    '\n    this is a docstring for\n    the main function.\n    a,\n    b,\n    c\n    '
    >>> unescape_string('\\\\n')
    '\\n'
    

    Do note that this is far slower than using the built-in codec.
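    For quick reference, the Python 3 route in a couple of lines, with a short made-up input:

```python
import codecs

raw = '\\tindented\\nline two'          # literal backslash sequences
decoded = codecs.decode(raw, 'unicode_escape')
print(repr(decoded))  # the \t and \n become a real tab and newline
```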

    qid & accept id: (24833045, 24833173) query: Get specific data from a .json file and save them to a 2D matrix/dictionary in python soup:

    You need to ceate a list outside the loop and append the values.

    \n
    final = [] # add values you want saved to final\nuniq_ident = 1\nfor name in glob.glob('/Users/jorjis/Desktop/test/*'):\n     jfile = open(name, 'r')\n     values = json.load(jfile)\n     jfile.close()\n     body1 = values['article']['description']\n     tokens = nltk.wordpunct_tokenize(body1)\n     tokens = [w.lower() for w in tokens]\n     vocab = [word for word in tokens if word not in stop]\n     final.append([uniq_ident,vocab]) # append vocab or whatever values you want to keep\n     uniq_ident += 1\n     print body1\n
    \n

    You can also make final a dict with final = {} and use final[uniq_ident] = vocab

    \n

    If you want to keep final a list and append a dict each time use:

    \n
     final.append({uniq_ident:vocab})\n
    \n soup wrap:

    You need to create a list outside the loop and append the values to it.

    final = [] # add values you want saved to final
    uniq_ident = 1
    for name in glob.glob('/Users/jorjis/Desktop/test/*'):
         jfile = open(name, 'r')
         values = json.load(jfile)
         jfile.close()
         body1 = values['article']['description']
         tokens = nltk.wordpunct_tokenize(body1)
         tokens = [w.lower() for w in tokens]
         vocab = [word for word in tokens if word not in stop]
         final.append([uniq_ident,vocab]) # append vocab or whatever values you want to keep
         uniq_ident += 1
         print body1
    

    You can also make final a dict with final = {} and use final[uniq_ident] = vocab

    If you want to keep final a list and append a dict each time use:

     final.append({uniq_ident:vocab})
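    A minimal, self-contained sketch of the dict variant (Python 3 syntax; the stop set, the documents list, and the whitespace tokenizer below are stand-ins for the nltk pipeline and JSON files in the answer):

```python
stop = {"the", "a", "is"}                       # stand-in stop list
documents = ["The cat is here", "A dog barks"]  # stand-in for the parsed descriptions

final = {}
uniq_ident = 1
for body in documents:
    tokens = [w.lower() for w in body.split()]  # crude tokenizer instead of nltk
    vocab = [w for w in tokens if w not in stop]
    final[uniq_ident] = vocab                   # dict keyed by the running id
    uniq_ident += 1

print(final)  # {1: ['cat', 'here'], 2: ['dog', 'barks']}
```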
    
    qid & accept id: (24833817, 24834757) query: Aligning two combined plots - Matplotlib soup:

    As tcaswell says, your problem may be easiest to solve by defining the extent keyword for imshow.

    \n

    If you give the extent keyword, the outermost pixel edges will be at the extents. For example:

    \n
    import matplotlib.pyplot as plt\nimport numpy as np\n\nfig = plt.figure()\nax = fig.add_subplot(111)\nax.imshow(np.random.random((8, 10)), extent=(2, 6, -1, 1), interpolation='nearest', aspect='auto')\n
    \n

    enter image description here

    \n

    Now it is easy to calculate the center of each pixel. In X direction:

    \n
      \n
    • inter-pixel distance is (6-2) / 10 = 0.4 units
    • \n
    • center of the leftmost pixel is half a pixel away from the left edge, 2 + .4/2 = 2.2
    • \n
    \n

    Similarly, the Y centers are at -.875 + n * 0.25.

    \n

    So, by tuning the extent you can get your pixel centers wherever you want them.

    \n
    \n

    An example with 20x20 data:

    \n
    import matplotlib.pyplot as plt\nimport numpy as np\n\n# create the data to be shown with "scatter"\nyvec, xvec = np.meshgrid(np.linspace(-4.75, 4.75, 20), np.linspace(-4.75, 4.75, 20))\nsc_data = np.random.random((20,20))\n\n# create the data to be shown with "imshow" (20 pixels)\nim_data = np.random.random((20,20))\n\nfig = plt.figure()\nax = fig.add_subplot(111)\nax.imshow(im_data, extent=[-5,5,-5,5], interpolation='nearest', cmap=plt.cm.gray)\nax.scatter(xvec, yvec, 100*sc_data)\n
    \n

    enter image description here

    \n

    Notice that here the inter-pixel distance is the same for both scatter (if you have a look at xvec, all pixels are 0.5 units apart) and imshow (as the image is stretched from -5 to +5 and has 20 pixels, the pixels are .5 units apart).

    \n soup wrap:

    As tcaswell says, your problem may be easiest to solve by defining the extent keyword for imshow.

    If you give the extent keyword, the outermost pixel edges will be at the extents. For example:

    import matplotlib.pyplot as plt
    import numpy as np
    
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.imshow(np.random.random((8, 10)), extent=(2, 6, -1, 1), interpolation='nearest', aspect='auto')
    

    enter image description here

    Now it is easy to calculate the center of each pixel. In X direction:

    • inter-pixel distance is (6-2) / 10 = 0.4 units
    • center of the leftmost pixel is half a pixel away from the left edge, 2 + .4/2 = 2.2

    Similarly, the Y centers are at -.875 + n * 0.25.

    So, by tuning the extent you can get your pixel centers wherever you want them.
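    The center calculation above generalizes; a small helper (an addition here, not part of the original answer) that reproduces both the X and Y numbers quoted above:

```python
import numpy as np

def pixel_centers(lo, hi, n):
    """Centers of n pixels stretched over [lo, hi] by imshow's extent."""
    step = (hi - lo) / n
    return lo + step / 2 + step * np.arange(n)

print(pixel_centers(2, 6, 10))   # X centers: 2.2, 2.6, ... (0.4 units apart)
print(pixel_centers(-1, 1, 8))   # Y centers: -0.875 + n * 0.25
```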


    An example with 20x20 data:

    import matplotlib.pyplot as plt
    import numpy as np
    
    # create the data to be shown with "scatter"
    yvec, xvec = np.meshgrid(np.linspace(-4.75, 4.75, 20), np.linspace(-4.75, 4.75, 20))
    sc_data = np.random.random((20,20))
    
    # create the data to be shown with "imshow" (20 pixels)
    im_data = np.random.random((20,20))
    
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.imshow(im_data, extent=[-5,5,-5,5], interpolation='nearest', cmap=plt.cm.gray)
    ax.scatter(xvec, yvec, 100*sc_data)
    

    enter image description here

    Notice that here the inter-pixel distance is the same for both scatter (if you have a look at xvec, all pixels are 0.5 units apart) and imshow (as the image is stretched from -5 to +5 and has 20 pixels, the pixels are .5 units apart).

    qid & accept id: (24834560, 24834658) query: Matplotlib: force aspect ratio in series of plots soup:

    At least one rather robust way is to do the image scaling by using the extent keyword.

    \n
    ax.imshow(image, extent=[0, 10.4, 0, 4.2], aspect=1)\n
    \n

    This should handle everything you need. (Of course, change 10.4 and 4.2 to whatever the physical dimensions are.) It should be noted that in this case aspect=1 does not necessarily make square pixels; instead it forces the units on both axes to be equal in scale - which seems to be what you want.

    \n

    Just as a small example:

    \n
    import matplotlib.pyplot as plt\nimport numpy as np\n\nfig = plt.figure()\nax = fig.add_subplot(111)\nax.imshow(np.random.random((20, 20)), extent=(0, 10.5, 2, 4.7), aspect=1, interpolation='nearest')\n
    \n

    This gives:

    \n

    enter image description here

    \n soup wrap:

    At least one rather robust way is to do the image scaling by using the extent keyword.

    ax.imshow(image, extent=[0, 10.4, 0, 4.2], aspect=1)
    

    This should handle everything you need. (Of course, change 10.4 and 4.2 to whatever the physical dimensions are.) It should be noted that in this case aspect=1 does not necessarily make square pixels; instead it forces the units on both axes to be equal in scale - which seems to be what you want.

    Just as a small example:

    import matplotlib.pyplot as plt
    import numpy as np
    
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.imshow(np.random.random((20, 20)), extent=(0, 10.5, 2, 4.7), aspect=1, interpolation='nearest')
    

    This gives:

    enter image description here

    qid & accept id: (24886310, 24886674) query: Encoding in Python - non-English characters into a URL soup:

    You'll have to encode special characters properly, as e.g. urlencode does:

    \n
    In[16]: urllib.urlencode([('postnr',4320),('vejnavn', 'Bispegårdsvej'), ('husnr',2)])\nOut[16]: 'postnr=4320&vejnavn=Bispeg%C3%A5rdsvej&husnr=2'\n
    \n

    If you then prepend the base url to this string, this should work (I at least tried it in the browser).

    \n

    If you're open to getting a third-party package, requests would be a popular choice.\nIt would simplify things to:

    \n
    import requests\nresponse = requests.get('http://geo.oiorest.dk/adresser.json',\n                        params = dict(postnr=4320,\n                                      vejnavn='Bispegårdsvej',\n                                      husnr=2))\n
    \n soup wrap:

    You'll have to encode special characters properly, as e.g. urlencode does:

    In[16]: urllib.urlencode([('postnr',4320),('vejnavn', 'Bispegårdsvej'), ('husnr',2)])
    Out[16]: 'postnr=4320&vejnavn=Bispeg%C3%A5rdsvej&husnr=2'
    

    If you then prepend the base url to this string, this should work (I at least tried it in the browser).
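    In Python 3 the same function lives at urllib.parse.urlencode; a sketch of building the full URL that way (the base URL is taken from the question):

```python
from urllib.parse import urlencode

base = 'http://geo.oiorest.dk/adresser.json'
# Non-ASCII values are percent-encoded as UTF-8 by default.
qs = urlencode([('postnr', 4320), ('vejnavn', 'Bispegårdsvej'), ('husnr', 2)])
print(base + '?' + qs)
```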

    If you're open to getting a third-party package, requests would be a popular choice. It would simplify things to:

    import requests
    response = requests.get('http://geo.oiorest.dk/adresser.json',
                            params = dict(postnr=4320,
                                          vejnavn='Bispegårdsvej',
                                          husnr=2))
    
    qid & accept id: (24896943, 24897057) query: How to insert a python program into a bash script? soup:
    #!/bin/bash\n\npython - 1 2 3 << 'EOF'\nimport sys\n\nprint 'Argument List:', str(sys.argv)\nEOF\n
    \n

    Output:

    \n
    Argument List: ['-', '1', '2', '3']\n
    \n soup wrap:
    #!/bin/bash
    
    python - 1 2 3 << 'EOF'
    import sys
    
    print 'Argument List:', str(sys.argv)
    EOF
    

    Output:

    Argument List: ['-', '1', '2', '3']
    
    qid & accept id: (24900064, 24900131) query: Scrapy:newbie attempts to pass the null value soup:

    I'm not a scrapy expert, but it seems that it's an empty list rather than a 'null' value (which, in Python, is called None).

    \n

    You can check whether it's empty with

    \n
    if ranking_list:\n    print ranking_list \n
    \n

    or

    \n
    if len(ranking_list) > 0:\n    print ranking_list \n
    \n soup wrap:

    I'm not a scrapy expert, but it seems that it's an empty list rather than a 'null' value (which, in Python, is called None).
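    To see the distinction concretely (Python 3 print syntax; the empty list stands in for what an extract with no matches typically returns):

```python
ranking_list = []                 # stand-in: selector matched nothing
assert ranking_list is not None   # an empty list is not None...
assert not ranking_list           # ...but, like None, it is falsy
print("nothing scraped" if not ranking_list else ranking_list)
```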

    You can check whether it's empty with

    if ranking_list:
        print ranking_list 
    

    or

    if len(ranking_list) > 0:
        print ranking_list 
    
    qid & accept id: (24915181, 24916208) query: RQ - Empty & Delete Queues soup:

    Cleanup using rq

    \n

    RQ offers methods to make any queue empty:

    \n
    >>> from redis import Redis\n>>> from rq import Queue\n>>> qfail = Queue("failed", connection=Redis())\n>>> qfail.count\n8\n>>> qfail.empty()\n8L\n>>> qfail.count\n0\n
    \n

    You can do the same for the test queue, if you still have it.

    \n

    Cleanup using rq-dashboard

    \n

    Install rq-dashboard:

    \n
    $ pip install rq-dashboard\n
    \n

    Start it:

    \n
    $ rq-dashboard\nRQ Dashboard, version 0.3.4\n * Running on http://0.0.0.0:9181/\n
    \n

    Open in browser.

    \n

    Select the queue

    \n

    Click the red button "Empty"

    \n

    And you are done.

    \n

    Python function Purge jobs

    \n

    If you run a Redis version too old for the command RQ uses, you can still succeed in deleting\njobs with Python code:

    \n

    The code takes the name of the queue that holds the job ids.

    \n

    Using LPOP, we ask for the job ids one by one.

    \n

    Adding the prefix (by default "rq:job:") to a job id gives the key under which the job is stored.

    \n

    Using DEL on each key we purge our database job by job.

    \n
    >>> import redis\n>>> r = redis.StrictRedis()\n>>> qname = "rq:queue:failed"\n>>> def purgeq(r, qname):\n...     while True:\n...         jid = r.lpop(qname)\n...         if jid is None:\n...             break\n...         r.delete("rq:job:" + jid)\n...         print jid\n...\n>>> purgeq(r, qname)\na0be3624-86c1-4dc4-bb2e-2043d2734b7b\n3796c312-9b02-4a77-be89-249aa7325c25\nca65f2b8-044c-41b5-b5ac-cefd56699758\n896f70a7-9a35-4f6b-b122-a08513022bc5\n
    \n soup wrap:

    Cleanup using rq

    RQ offers methods to make any queue empty:

    >>> from redis import Redis
    >>> from rq import Queue
    >>> qfail = Queue("failed", connection=Redis())
    >>> qfail.count
    8
    >>> qfail.empty()
    8L
    >>> qfail.count
    0
    

    You can do the same for the test queue, if you still have it.

    Cleanup using rq-dashboard

    Install rq-dashboard:

    $ pip install rq-dashboard
    

    Start it:

    $ rq-dashboard
    RQ Dashboard, version 0.3.4
     * Running on http://0.0.0.0:9181/
    

    Open in browser.

    Select the queue

    Click the red button "Empty"

    And you are done.

    Python function Purge jobs

    If you run a Redis version too old for the command RQ uses, you can still succeed in deleting jobs with Python code:

    The code takes the name of the queue that holds the job ids.

    Using LPOP, we ask for the job ids one by one.

    Adding the prefix (by default "rq:job:") to a job id gives the key under which the job is stored.

    Using DEL on each key we purge our database job by job.

    >>> import redis
    >>> r = redis.StrictRedis()
    >>> qname = "rq:queue:failed"
    >>> def purgeq(r, qname):
    ...     while True:
    ...         jid = r.lpop(qname)
    ...         if jid is None:
    ...             break
    ...         r.delete("rq:job:" + jid)
    ...         print jid
    ...
    >>> purgeq(r, qname)
    a0be3624-86c1-4dc4-bb2e-2043d2734b7b
    3796c312-9b02-4a77-be89-249aa7325c25
    ca65f2b8-044c-41b5-b5ac-cefd56699758
    896f70a7-9a35-4f6b-b122-a08513022bc5
    
    qid & accept id: (24918515, 24918596) query: How to count how many data points fall in a bin soup:

    Well, first, each of your bins is just a tuple of the start and end values of that bin, so there's no way to add anything to it. You could change each bin into, say, a list of [start, stop, 0] instead of a tuple of (start, stop), or, maybe even better, an object. Or, alternatively, you could keep a separate bin_counts list, parallel to the bins list, and, e.g., zip them up when needed.

    \n

    Next, if each bin goes from i * bin_width to (i+1) * bin_width, then how do you get the i value from a data value? That's easy: the opposite of multiply is divide, so it's just data_point // bin_width.

    \n

    So:

    \n
    bin_counts = [0 for bin in bins]\nfor data_point in data_points:\n    bin_number = data_point // bin_width\n    bin_counts[bin_number] += 1\n
    \n
    \n

    Showing one of the other options, because I think you were asking about it in the comments:

    \n
    bins = [[i*bin_width, (i+1)*bin_width, 0] for i in range(num_bins)]\nfor data_point in data_points:\n    bin_number = data_point // bin_width\n    bins[bin_number][2] += 1\n
    \n

    Here, each bin is a list of [start, stop, count], instead of having a list of (start, stop) bins and a separate list of count values.

    \n soup wrap:

    Well, first, each of your bins is just a tuple of the start and end values of that bin, so there's no way to add anything to it. You could change each bin into, say, a list of [start, stop, 0] instead of a tuple of (start, stop), or, maybe even better, an object. Or, alternatively, you could keep a separate bin_counts list, parallel to the bins list, and, e.g., zip them up when needed.

    Next, if each bin goes from i * bin_width to (i+1) * bin_width, then how do you get the i value from a data value? That's easy: the opposite of multiply is divide, so it's just data_point // bin_width.

    So:

    bin_counts = [0 for bin in bins]
    for data_point in data_points:
        bin_number = data_point // bin_width
        bin_counts[bin_number] += 1
    

    Showing one of the other options, because I think you were asking about it in the comments:

    bins = [[i*bin_width, (i+1)*bin_width, 0] for i in range(num_bins)]
    for data_point in data_points:
        bin_number = data_point // bin_width
        bins[bin_number][2] += 1
    

    Here, each bin is a list of [start, stop, count], instead of having a list of (start, stop) bins and a separate list of count values.
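    Putting the second variant together with some hand-checkable numbers (bin_width, num_bins, and data_points below are made-up assumptions, not values from the question):

```python
bin_width = 10
num_bins = 5
data_points = [3, 14, 14, 27, 42]   # made-up sample data

bins = [[i * bin_width, (i + 1) * bin_width, 0] for i in range(num_bins)]
for data_point in data_points:
    bins[data_point // bin_width][2] += 1   # bump the count field of the right bin

print([b[2] for b in bins])  # [1, 2, 1, 0, 1]
```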

    qid & accept id: (24928086, 24928181) query: Better logging system for entire package soup:

    If you really only need a single logger, then you might as well use the root one. Rather than passing around your logger, just get it from the logging module in each of your modules:

    \n
    import logging\nlogger = logging.getLogger()\nlogger.debug('Heya')\n
    \n

    However, the common, recommended pattern is to instead use named loggers, replacing the second line with:

    \n
    logger = logging.getLogger(__name__)\n
    \n

    This keeps the door open to easily change your logging configuration later to direct certain logs elsewhere or change their logging level, etc.

    \n soup wrap:

    If you really only need a single logger, then you might as well use the root one. Rather than passing around your logger, just get it from the logging module in each of your modules:

    import logging
    logger = logging.getLogger()
    logger.debug('Heya')
    

    However, the common, recommended pattern is to instead use named loggers, replacing the second line with:

    logger = logging.getLogger(__name__)
    

    This keeps the door open to easily change your logging configuration later to direct certain logs elsewhere or change their logging level, etc.

    qid & accept id: (24941665, 24941675) query: How can you find where python imported a particular module from? soup:

    Each module object has a __file__ attribute:

    \n
    import module\n\nprint module.__file__\n
    \n

    Some modules are part of the Python executable; these will not have the attribute set.

    \n

    Demo:

    \n
    >>> import urllib2\n>>> urllib2.__file__\n'/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/urllib2.pyc'\n>>> import sys\n>>> sys.__file__\nTraceback (most recent call last):\n  File "<stdin>", line 1, in <module>\nAttributeError: 'module' object has no attribute '__file__'\n
    \n

    You could also run Python in verbose mode, with the -v command line switch or the PYTHONVERBOSE environment variable; Python then prints out every imported file as it takes place.

    \n soup wrap:

    Each module object has a __file__ attribute:

    import module
    
    print module.__file__
    

    Some modules are part of the Python executable; these will not have the attribute set.

    Demo:

    >>> import urllib2
    >>> urllib2.__file__
    '/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/urllib2.pyc'
    >>> import sys
    >>> sys.__file__
    Traceback (most recent call last):
      File "", line 1, in 
    AttributeError: 'module' object has no attribute '__file__'
    

    You could also run Python in verbose mode, with the -v command line switch or the PYTHONVERBOSE environment variable; Python then prints out every imported file as it takes place.
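    The same check in Python 3 terms, contrasting a stdlib module imported from disk with a module compiled into the interpreter (sys here, in CPython):

```python
import json
import sys

print(json.__file__)             # filesystem path json was imported from
print(hasattr(sys, "__file__"))  # False in CPython: sys is built into the executable
```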

    qid & accept id: (24994284, 24994337) query: python - replace string entry with a dictionary entry soup:

    You can use str.format() out of the box for this:

    \n
    'MATERIAL 4 {clad-den} {clad-temp} 2'.format(**data)\n
    \n

    This will look up clad-den and clad-temp as keys in the data dictionary:

    \n
    >>> data = {'clad-den': 6.55, 'clad-temp': 578.0}\n>>> 'MATERIAL 4 {clad-den} {clad-temp} 2'.format(**data)\n'MATERIAL 4 6.55 578.0 2'\n
    \n

    You can do it with re.sub() too, using a function for the replacement parameter:

    \n
    re.sub(r'{([^{}]+)}', lambda m: str(data[m.group(1)]), template_text)\n
    \n

    but this doesn't offer the same flexibility and power that string formatting can offer.

    \n soup wrap:

    You can use str.format() out of the box for this:

    'MATERIAL 4 {clad-den} {clad-temp} 2'.format(**data)
    

    This will look up clad-den and clad-temp as keys in the data dictionary:

    >>> data = {'clad-den': 6.55, 'clad-temp': 578.0}
    >>> 'MATERIAL 4 {clad-den} {clad-temp} 2'.format(**data)
    'MATERIAL 4 6.55 578.0 2'
    

    You can do it with re.sub() too, using a function for the replacement parameter:

    re.sub(r'{([^{}]+)}', lambda m: str(data[m.group(1)]), template_text)
    

    but this doesn't offer the same flexibility and power that string formatting can offer.
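    One example of that extra power: format specs work on the same hyphenated keys, which the regex one-liner can't do without extra work:

```python
data = {'clad-den': 6.55, 'clad-temp': 578.0}
# :.2f pins the temperature to two decimal places in the same template.
print('MATERIAL 4 {clad-den} {clad-temp:.2f} 2'.format(**data))
```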

    qid & accept id: (25000584, 25056619) query: using key presses instead of buttons in django forms soup:

    OK, here's what I did.

    \n

    In a new voting.html, I had this code to capture the arrow buttons the joystick was mapped to:

    \n
    \n
    \n{% for entry in voting_entry_list %}\n    {{ entry.text }} {{ entry.score }}\n{% endfor %}\n

    \n
    \n
    \n

    Then in views.py, I used GET instead of POST to capture the up or down votes:

    \n
    def voting(request):   \ncontext = {\n  'latest_entry_list': Entry.objects.order_by('-pub_date')[:10], # simple sorting by datetime, latest first, 10 items\n  'high_entry_list': Entry.objects.order_by('-score','-pub_date')[:10], # simple sorting by score high to low, 10 items\n  'high_entry': Entry.objects.order_by('-score','-pub_date')[:1], # simple sorting by score high to low, 10 items\n  'low_entry_list': Entry.objects.order_by('score','-pub_date')[:10], # simple sorting by score low to high, 10 items\n  'voting_entry_list': Entry.objects.unvoted_or_random(), # actually one item, command from extended object manager\n}\nreturn render(request, 'entries/voting.html', context); # returns when vote is accessed\n\ndef voteup(request):\nvoting_id = request.GET.get('voteid') # voting id number is brought in as var\nif request.method=='GET': #always polling, when get votes, save and redirect to /index to refresh\n    v = Entry.objects.get(pk=voting_id) # get by voting id var\n    v.score +=1 # add one to score for voteup button\n    v.voted=True # set voted boolean to true\n    v.save() # explicit save, as is not saved with change above\nelse:\n    pass\nreturn HttpResponse('done') # Only on console \n\ndef votedown(request):\nvoting_id = request.GET.get('voteid') # voting id number is brought in as var\nif request.method=='GET': #always polling, when get votes, save and redirect to /index to refresh\n    v = Entry.objects.get(pk=voting_id) # get by voting id var\n    v.score -=1 # add one to score for voteup button\n    v.voted=True # set voted boolean to true\n    v.save() # explicit save, as is not saved with change above\nelse:\n    pass\nreturn HttpResponse('done') # Only on console\n
    \n

    This seems to avoid any issues with forms and keypresses. Since it is on a separate voting page, the transparent dummy submit button makes that selection active on refresh, as opposed to the text entry box when they were on the same page. I can access the sorted entries from the voting_entry_list, and vote up or down with separate js scripts and views.py requests for each button.

    \n

    My goal was to do this with basic django and js, not being confident about installing a load of libraries or coding extra gamepad.api states and polling, so job done!

    \n

    This works for now as a kludge, but one that seems solid. In the future, I may try to streamline it with switch for the keypresses, and perhaps try to use POST instead of GET if that is a security issue.

    \n soup wrap:

    OK, here's what I did.

    In a new voting.html, I had this code to capture the arrow buttons the joystick was mapped to:

    {% for entry in voting_entry_list %}
        {{ entry.text }} {{ entry.score }}
    {% endfor %}

    Then in views.py, I used GET instead of POST to capture the up or down votes:

    def voting(request):
        context = {
            'latest_entry_list': Entry.objects.order_by('-pub_date')[:10], # simple sorting by datetime, latest first, 10 items
            'high_entry_list': Entry.objects.order_by('-score','-pub_date')[:10], # simple sorting by score high to low, 10 items
            'high_entry': Entry.objects.order_by('-score','-pub_date')[:1], # highest-scored single item
            'low_entry_list': Entry.objects.order_by('score','-pub_date')[:10], # simple sorting by score low to high, 10 items
            'voting_entry_list': Entry.objects.unvoted_or_random(), # actually one item, command from extended object manager
        }
        return render(request, 'entries/voting.html', context) # returns when vote is accessed
    
    def voteup(request):
        voting_id = request.GET.get('voteid') # voting id number is brought in as var
        if request.method == 'GET': # always polling; when we get votes, save and redirect to /index to refresh
            v = Entry.objects.get(pk=voting_id) # get by voting id var
            v.score += 1 # add one to score for voteup button
            v.voted = True # set voted boolean to true
            v.save() # explicit save, as the change above is not saved otherwise
        else:
            pass
        return HttpResponse('done') # Only on console
    
    def votedown(request):
        voting_id = request.GET.get('voteid') # voting id number is brought in as var
        if request.method == 'GET': # always polling; when we get votes, save and redirect to /index to refresh
            v = Entry.objects.get(pk=voting_id) # get by voting id var
            v.score -= 1 # subtract one from score for votedown button
            v.voted = True # set voted boolean to true
            v.save() # explicit save, as the change above is not saved otherwise
        else:
            pass
        return HttpResponse('done') # Only on console
    

    This seems to avoid any issues with forms and keypresses. Since it is on a separate voting page, the transparent dummy submit button makes that selection active on refresh, as opposed to the text entry box when they were on the same page. I can access the sorted entries from the voting_entry_list, and vote up or down with separate js scripts and views.py requests for each button.

    My goal was to do this with basic django and js, not being confident about installing a load of libraries or coding extra gamepad.api states and polling, so job done!

    This works for now as a kludge, but one that seems solid. In the future, I may try to streamline it with switch for the keypresses, and perhaps try to use POST instead of GET if that is a security issue.

    qid & accept id: (25002150, 25002301) query: dictionary of dictionaries(nested dicts) soup:

    First, how do you create each of the sub-dicts? At the point where you have the bet_pav and kk values you want, just do this:

    \n
    subdict = {bet_pav: [kk]}\n
    \n

    Now, how do you add them each to the main dict? Well, it looks like you wanted each pav to map to a tuple of sub-dicts, or maybe a list of them, but forgot the parentheses or square brackets. Either way, what you want to do is append to the existing tuple/list for each new sub-dict, starting with an empty tuple/list if this is the first sub-dict you've gotten. This is easier to do with a list than a tuple, so let's do that:

    \n
    pavdict.setdefault(pav, []).append(subdict)\n
    \n

    Or, if you prefer, you can just create pavdict as a defaultdict(list), and then this line is just:

    \n
    pavdict[pav].append(subdict)\n
    \n
    \n

    Or, even better, if you think of this as constructing a dictionary all at once, instead of building it up by mutating insertions, you can write it as a comprehension:

    \n
    def get_subdicts(i):\n    body = i.find_all("div", id=lambda x: x and x.startswith('game-wrapper-s-'))\n    for bet in body:\n        yield {get_bets_pav(bet): get_cof(bet)}\n\npavdict = {get_main_pav(i): tuple(get_subdicts(i)) for i in kofai}\n
    \n soup wrap:

    First, how do you create each of the sub-dicts? At the point where you have the bet_pav and kk values you want, just do this:

    subdict = {bet_pav: [kk]}
    

    Now, how do you add them each to the main dict? Well, it looks like you wanted each pav to map to a tuple of sub-dicts, or maybe a list of them, but forgot the parentheses or square brackets. Either way, what you want to do is append to the existing tuple/list for each new sub-dict, starting with an empty tuple/list if this is the first sub-dict you've gotten. This is easier to do with a list than a tuple, so let's do that:

    pavdict.setdefault(pav, []).append(subdict)
    

    Or, if you prefer, you can just create pavdict as a defaultdict(list), and then this line is just:

    pavdict[pav].append(subdict)
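    A runnable sketch of the append-per-key pattern with dummy values (the triples below are placeholders for the scraped pav / bet_pav / kk values, not real data):

```python
pavdict = {}
for pav, bet_pav, kk in [("A", "x", 1), ("A", "y", 2), ("B", "x", 3)]:
    subdict = {bet_pav: [kk]}
    # Start an empty list for an unseen pav, then append the sub-dict.
    pavdict.setdefault(pav, []).append(subdict)

print(pavdict)  # {'A': [{'x': [1]}, {'y': [2]}], 'B': [{'x': [3]}]}
```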
    

    Or, even better, if you think of this as constructing a dictionary all at once, instead of building it up by mutating insertions, you can write it as a comprehension:

    def get_subdicts(i):
        body = i.find_all("div", id=lambda x: x and x.startswith('game-wrapper-s-'))
        for bet in body:
            yield {get_bets_pav(bet): get_cof(bet)}
    
    pavdict = {get_main_pav(i): tuple(get_subdicts(i)) for i in kofai}
    
    qid & accept id: (25002459, 25002488) query: Writing variables with .write() Python 3 soup:

    You can format your data into a string; with string formatting, for example:

    \n
    firstline = "Log Created: {}/nLog deleted and recreated.".format(grab_date)\n
    \n

    You can also make use of print()'s ability to convert all arguments to a string and automatic newlines, and have it write to the file:

    \n
    print("Log Created:", grab_date, file=f)\nprint("Log deleted and recreated.", file=f)\n
    \n

    If you can, avoid reinventing the logging wheel and use the logging module instead. It can be configured to take a different date format:

    \n
    >>> import logging\n>>> logging.basicConfig(datefmt="%A %d, %B %Y %I:%M:%S %p %Z", format='Log Created: %(asctime)-15s %(message)s')\n>>> logging.warn('Foo bar baz!')\nLog Created: Monday 28, July 2014 08:13:44 PM BST Foo bar baz!\n
    \n soup wrap:

    You can format your data into a string; with string formatting, for example:

    firstline = "Log Created: {}/nLog deleted and recreated.".format(grab_date)
    

    You can also make use of print()'s ability to convert all arguments to a string and automatic newlines, and have it write to the file:

    print("Log Created:", grab_date, file=f)
    print("Log deleted and recreated.", file=f)
    

    If you can, avoid reinventing the logging wheel and use the logging module instead. It can be configured to take a different date format:

    >>> import logging
    >>> logging.basicConfig(datefmt="%A %d, %B %Y %I:%M:%S %p %Z", format='Log Created: %(asctime)-15s %(message)s')
    >>> logging.warn('Foo bar baz!')
    Log Created: Monday 28, July 2014 08:13:44 PM BST Foo bar baz!
    
    qid & accept id: (25024087, 25024190) query: Mimic curl in python soup:

    One option would be to use requests:

    \n
    import requests\n\nurl = "http://geocoding.geo.census.gov/geocoder/locations/addressbatch"\ndata = {'benchmark': 'Public_AR_Census2010'}\nfiles = {'addressFile': open('t.csv')}\n\nresponse = requests.post(url, data=data, files=files)\nprint response.content\n
    \n

    prints:

    \n
    "1"," 800 Wilshire Blvd,  Los Angeles,  CA,  90017","Match","Exact","800 Wilshire Blvd, LOS ANGELES, CA, 90017","-118.25818,34.049366","141617176","L"\n
    \n
    \n

    In case you need to handle the csv data in memory, initialize a StringIO buffer:

    \n
    from StringIO import StringIO\nimport requests\n\ncsv_data = "1, 800 Wilshire Blvd, Los Angeles, CA, 90017"\nbuffer = StringIO()\nbuffer.write(csv_data)\nbuffer.seek(0)\n\nurl = "http://geocoding.geo.census.gov/geocoder/locations/addressbatch"\ndata = {'benchmark': 'Public_AR_Census2010'}\nfiles = {'addressFile': buffer}\n\nresponse = requests.post(url, data=data, files=files)\nprint response.content\n
    \n

    This prints the same result as with using a real file.

    \n soup wrap:

    One option would be to use requests:

    import requests
    
    url = "http://geocoding.geo.census.gov/geocoder/locations/addressbatch"
    data = {'benchmark': 'Public_AR_Census2010'}
    files = {'addressFile': open('t.csv')}
    
    response = requests.post(url, data=data, files=files)
    print response.content
    

    prints:

    "1"," 800 Wilshire Blvd,  Los Angeles,  CA,  90017","Match","Exact","800 Wilshire Blvd, LOS ANGELES, CA, 90017","-118.25818,34.049366","141617176","L"
    

    In case you need to handle the csv data in memory, initialize a StringIO buffer:

    from StringIO import StringIO
    import requests
    
    csv_data = "1, 800 Wilshire Blvd, Los Angeles, CA, 90017"
    buffer = StringIO()
    buffer.write(csv_data)
    buffer.seek(0)
    
    url = "http://geocoding.geo.census.gov/geocoder/locations/addressbatch"
    data = {'benchmark': 'Public_AR_Census2010'}
    files = {'addressFile': buffer}
    
    response = requests.post(url, data=data, files=files)
    print response.content
    

    This prints the same result as with using a real file.
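    On Python 3 the class moved to io.StringIO, which also accepts the initial value directly, so the separate write()/seek(0) pair isn't needed:

```python
from io import StringIO

csv_data = "1, 800 Wilshire Blvd, Los Angeles, CA, 90017"
buffer = StringIO(csv_data)   # already positioned at the start
print(buffer.read())
```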

    qid & accept id: (25025291, 25026016) query: Binning data based on one column in 2D array and estimate mean in each bin using cython soup:

    I don't think you need cython for this; I think you're looking for numpy.bincount. Here is an example:

    \n
    import numpy as np\nd = np.random.random(10**5)\nnumbins = 10\n\nbins = np.linspace(d.min(), d.max(), numbins+1)\n# This line is not necessary, but without it the smallest bin only has 1 value.\nbins = bins[1:]\ndigitized = bins.searchsorted(d)\n\nbin_means = (np.bincount(digitized, weights=d, minlength=numbins) /\n             np.bincount(digitized, minlength=numbins))\n
    \n

    Update

    \n

    Let's take a second to discuss why the above code is faster than the code in your question, and why cython will (probably) not help much in this case. In your code, when you compute digitized == i for each i in range(numbins), you're doing numbins passes over the digitized array. If you're familiar with big O notation, that's O(n * m). On the other hand, bincount does something a little different. Bincount is equivalent, more or less, to:

    \n
    def bincount(digitized, weights):\n   out = zeros(digitized.max() + 1)\n   for i, w in zip(digitized, weights):\n       out[i] += w\n   return out\n
    \n

    It has 1 pass (well 2 passes if you count the max) over digitized so it has complexity O(n). Also bincount is already written in C and compiled so it already has very little overhead and is very fast. Cython is most helpful when you have code which has a lot of interpreter and type-check overhead so that declaring types and compiling the code removes that overhead. Hope that's helpful.

    \n soup wrap:

    I don't think you need cython for this, I think you're looking for numpy.bincount. Here is an example:

    import numpy as np
    d = np.random.random(10**5)
    numbins = 10
    
    bins = np.linspace(d.min(), d.max(), numbins+1)
    # This line is not necessary, but without it the smallest bin only has 1 value.
    bins = bins[1:]
    digitized = bins.searchsorted(d)
    
    bin_means = (np.bincount(digitized, weights=d, minlength=numbins) /
                 np.bincount(digitized, minlength=numbins))
    

    Update

    Let's take a second to discuss why the above code is faster than the code in your question, and why cython will (probably) not help much in this case. In your code, when you compute digitized == i for each i in range(numbins), you're doing numbins passes over the digitized array. If you're familiar with big O notation, that's O(n * m). On the other hand, bincount does something a little different. Bincount is equivalent, more or less, to:

    def bincount(digitized, weights):
       out = zeros(digitized.max() + 1)
       for i, w in zip(digitized, weights):
           out[i] += w
       return out
    

    It has 1 pass (well 2 passes if you count the max) over digitized so it has complexity O(n). Also bincount is already written in C and compiled so it already has very little overhead and is very fast. Cython is most helpful when you have code which has a lot of interpreter and type-check overhead so that declaring types and compiling the code removes that overhead. Hope that's helpful.
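The equivalence is easy to check on toy data (the arrays below are made up for illustration):

```python
# toy check (made-up data) that np.bincount matches the explicit O(n) loop
import numpy as np

d = np.array([0.1, 0.4, 0.2, 0.9, 0.7])
digitized = np.array([0, 1, 0, 2, 2])   # bin index for each value

sums = np.bincount(digitized, weights=d, minlength=3)   # per-bin weighted sums
counts = np.bincount(digitized, minlength=3)            # per-bin counts
bin_means = sums / counts

# the explicit single-pass loop that bincount is equivalent to
out = np.zeros(3)
for i, w in zip(digitized, d):
    out[i] += w
```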

    qid & accept id: (25030548, 25030756) query: Passing more than two arguments in reduce function soup:

    I think what you're looking for is partial function application, which you can do using functools.

    \n
    def apply_something(something, config, some_var):\n    pass  # ...\n\nimport functools\n\nreduce(functools.partial(apply_something, some_var=True), \n       [1, 2, 3], something_initializer)\n
    \n

    Example:

    \n
    >>> def foo(a, b, c):\n...     return a + b if c else a * b\n\n>>> reduce(functools.partial(foo, c=True), [1,2,3,4,5], 0)\n15\n\n>>> reduce(functools.partial(foo, c=False), [1,2,3,4,5], 1)\n120\n
    \n soup wrap:

    I think what you're looking for is partial function application, which you can do using functools.

    def apply_something(something, config, some_var):
        pass  # ...
    
    import functools
    
    reduce(functools.partial(apply_something, some_var=True), 
           [1, 2, 3], something_initializer)
    

    Example:

    >>> def foo(a, b, c):
    ...     return a + b if c else a * b
    
    >>> reduce(functools.partial(foo, c=True), [1,2,3,4,5], 0)
    15
    
    >>> reduce(functools.partial(foo, c=False), [1,2,3,4,5], 1)
    120
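A plain lambda achieves the same partial application, which some find clearer; a sketch reusing the foo example (on Python 3, where reduce lives in functools):

```python
from functools import reduce  # reduce is in functools on Python 3

def foo(a, b, c):
    return a + b if c else a * b

# fixing the last argument with a lambda instead of functools.partial
sum_result = reduce(lambda a, b: foo(a, b, True), [1, 2, 3, 4, 5], 0)    # 15
prod_result = reduce(lambda a, b: foo(a, b, False), [1, 2, 3, 4, 5], 1)  # 120
```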
    
    qid & accept id: (25047254, 25047530) query: parse blocks of text from text file using Python soup:

    Could be done in a one-liner...

    \n
    open(files[0]).read().split('1:', 1)[1].split('\n')[:19]\n
    \n

    or more readable

    \n
    txt = open(files[0]).read()           # read the file into a big string\nbefore, after = txt.split('1:', 1)    # split the file on the first "1:"\nafter_lines = after.split('\n')       # create lines from the after text\nlines_to_save = after_lines[:19]      # grab the first 19 lines after "1:"\n
    \n

    then join the lines with a newline (and add a newline to the end) before writing it to a new file:

    \n
    out_text = "1:"                       # add back "1:"\nout_text += "\n".join(lines_to_save)  # add all 19 lines with newlines between them\nout_text += "\n"                      # add a newline at the end\n\nopen("outputfile.txt", "w").write(out_text)\n
    \n

    To comply with best practice for reading and writing files, you should also use the with statement to ensure that the file handles are closed as soon as possible. You can create convenience functions for it:

    \n
    def read_file(fname):\n    "Returns contents of file with name `fname`."\n    with open(fname) as fp:\n         return fp.read()\n\ndef write_file(fname, txt):\n    "Writes `txt` to a file named `fname`."\n    with open(fname, 'w') as fp:\n         fp.write(txt)\n
    \n

    then you can replace the first line above with:

    \n
    txt = read_file(files[0])\n
    \n

    and the last line with:

    \n
    write_file("outputfile.txt", out_text)\n
    \n soup wrap:

    Could be done in a one-liner...

    open(files[0]).read().split('1:', 1)[1].split('\n')[:19]
    

    or more readable

    txt = open(files[0]).read()           # read the file into a big string
    before, after = txt.split('1:', 1)    # split the file on the first "1:"
    after_lines = after.split('\n')       # create lines from the after text
    lines_to_save = after_lines[:19]      # grab the first 19 lines after "1:"
    

    then join the lines with a newline (and add a newline to the end) before writing it to a new file:

    out_text = "1:"                       # add back "1:"
    out_text += "\n".join(lines_to_save)  # add all 19 lines with newlines between them
    out_text += "\n"                      # add a newline at the end
    
    open("outputfile.txt", "w").write(out_text)
    

    To comply with best practice for reading and writing files, you should also use the with statement to ensure that the file handles are closed as soon as possible. You can create convenience functions for it:

    def read_file(fname):
        "Returns contents of file with name `fname`."
        with open(fname) as fp:
             return fp.read()
    
    def write_file(fname, txt):
        "Writes `txt` to a file named `fname`."
        with open(fname, 'w') as fp:
             fp.write(txt)
    

    then you can replace the first line above with:

    txt = read_file(files[0])
    

    and the last line with:

    write_file("outputfile.txt", out_text)
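Putting the pieces together on an in-memory sample (the sample text here is made up purely for illustration):

```python
# end-to-end sketch on made-up in-memory text instead of a real file
sample = "header\n1:" + "\n".join("line %d" % i for i in range(25))

before, after = sample.split('1:', 1)      # split on the first "1:"
lines_to_save = after.split('\n')[:19]     # first 19 lines after "1:"
out_text = "1:" + "\n".join(lines_to_save) + "\n"
```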
    
    qid & accept id: (25057912, 25080492) query: Automatically numbering and referencing Sphinx tables soup:

    OK, I've found an answer to the first part of my question. Actually it's a no-brainer:

    \n
    .. _table:\n\n.. table:: Supertable\n\n    +--------+----+\n    |Foo     |Bar |\n    +--------+----+\n
    \n

    And then:

    \n
    :ref:`table`\n
    \n

    As for enumerated tables I have actually seen enumerated figures, not tables and it gets done in LaTeX output. I looked around and haven't found any trace of automatically enumerated tables in Sphinx. It would probably make a good feature request, but for now there seems to be no such feature.

    \n

    PS: I've checked and actually tables are also enumerated in LaTeX. There is also a related problem discussed in this question.

    \n soup wrap:

    OK, I've found an answer to the first part of my question. Actually it's a no-brainer:

    .. _table:
    
    .. table:: Supertable
    
        +--------+----+
        |Foo     |Bar |
        +--------+----+
    

    And then:

    :ref:`table`
    

    As for enumerated tables I have actually seen enumerated figures, not tables and it gets done in LaTeX output. I looked around and haven't found any trace of automatically enumerated tables in Sphinx. It would probably make a good feature request, but for now there seems to be no such feature.

    PS: I've checked and actually tables are also enumerated in LaTeX. There is also a related problem discussed in this question.
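For completeness: later Sphinx releases (1.3 and up) did add automatic numbering for figures and tables via the numfig option, so if a newer Sphinx is available the feature request is moot. A sketch (assuming Sphinx >= 1.3, with `numfig = True` set in conf.py; `my-table` is a hypothetical label):

```rst
.. _my-table:

.. table:: Supertable

   +--------+----+
   |Foo     |Bar |
   +--------+----+

See :numref:`my-table` for the details.
```

With numfig enabled, :numref: renders as e.g. "Table 1" in HTML as well as LaTeX output.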

    qid & accept id: (25068002, 25068220) query: Import object from module of same name using __import__ soup:

    You need to call something equivalent to :

    \n
    obj = __import__("mymod", ...).mymod\n
    \n

    to reproduce

    \n
    from mymod import mymod\n
    \n

    Getting the attribute of an object by its name can be done using

    \n
    getattr(obj, 'mymod')\n# or\nobj.__dict__['mymod']\n# or\nvars(obj)['mymod']\n
    \n

    Pick one. (I would go for getattr if I really had to).

    \n
    obj = getattr(__import__(mod_name, globals=globals(), locals=locals(), fromlist=[obj_name], level=1), mod_name)\n
    \n soup wrap:

    You need to call something equivalent to :

    obj = __import__("mymod", ...).mymod
    

    to reproduce

    from mymod import mymod
    

    Getting the attribute of an object by its name can be done using

    getattr(obj, 'mymod')
    # or
    obj.__dict__['mymod']
    # or
    vars(obj)['mymod']
    

    Pick one. (I would go for getattr if I really had to).

    obj = getattr(__import__(mod_name, globals=globals(), locals=locals(), fromlist=[obj_name], level=1), mod_name)
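On modern Pythons, importlib is generally a cleaner route than calling __import__ directly; a sketch (the stdlib collections module is used here purely as a stand-in for mymod):

```python
# modern alternative sketch: importlib.import_module plus getattr
import importlib

mod = importlib.import_module("collections")
obj = getattr(mod, "OrderedDict")  # like `from collections import OrderedDict`
```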
    
    qid & accept id: (25068218, 25068276) query: python BeautifulSoup how get values between tags? soup:

    You can iterate over the result of .find_next_siblings() that are ul elements:

      \n
      from itertools import takewhile\n\ndiv = soup.find('div', class_='layout4-background')\nfor header in div.find_all('h6'):\n    print header.get_text()\n    listings = takewhile(lambda t: t.name == 'ul',\n                         header.find_next_siblings(text=False))\n    for listing in listings:\n        pass  # do something with listing\n
      \n

      The find_next_siblings() search finds all nodes that are not just text nodes (skipping whitespace in between). The itertools.takewhile() iterable lets you pick just the next elements that are all ul tags.

        \n

        Demo:

        \n
        >>> from bs4 import BeautifulSoup\n>>> from itertools import takewhile\n>>> soup = BeautifulSoup('''\\n... <div class="layout4-background">\n... <h6>Game1. How to get all listings below and assign to class"game"?</h6>\n... <ul><li>...</li></ul>\n... <ul><li>...</li></ul>\n... <ul><li>...</li></ul>\n... <h6>Game2. How to get all listings below and assign to class"game?</h6>\n... <ul><li>...</li></ul>\n... <h6>Game3. How to get all listings below and assign to class"game?</h6>\n... <ul><li>...</li></ul>\n... </div>''')\n>>> div = soup.find('div', class_='layout4-background')\n>>> for header in div.find_all('h6'):\n...     print header.get_text()\n...     listings = takewhile(lambda t: t.name == 'ul',\n...                          header.find_next_siblings(text=False))\n...     print 'Listings found:', len(list(listings))\n... \nGame1. How to get all listings below and assign to class"game"?\nListings found: 3\nGame2. How to get all listings below and assign to class"game?\nListings found: 1\nGame3. How to get all listings below and assign to class"game?\nListings found: 1\n
        \n soup wrap:

        You can iterate over the result of .find_next_siblings() that are ul elements:

          from itertools import takewhile
          
          div = soup.find('div', class_='layout4-background')
          for header in div.find_all('h6'):
              print header.get_text()
              listings = takewhile(lambda t: t.name == 'ul',
                                   header.find_next_siblings(text=False))
              for listing in listings:
                  pass  # do something with listing
          

          The find_next_siblings() search finds all nodes that are not just text nodes (skipping whitespace in between). The itertools.takewhile() iterable lets you pick just the next elements that are all ul tags.

            Demo:

            >>> from bs4 import BeautifulSoup
            >>> from itertools import takewhile
            >>> soup = BeautifulSoup('''\
            ... <div class="layout4-background">
            ... <h6>Game1. How to get all listings below and assign to class"game"?</h6>
            ... <ul><li>...</li></ul>
            ... <ul><li>...</li></ul>
            ... <ul><li>...</li></ul>
            ... <h6>Game2. How to get all listings below and assign to class"game?</h6>
            ... <ul><li>...</li></ul>
            ... <h6>Game3. How to get all listings below and assign to class"game?</h6>
            ... <ul><li>...</li></ul>
            ... </div>''')
            >>> div = soup.find('div', class_='layout4-background')
            >>> for header in div.find_all('h6'):
            ...     print header.get_text()
            ...     listings = takewhile(lambda t: t.name == 'ul',
            ...                          header.find_next_siblings(text=False))
            ...     print 'Listings found:', len(list(listings))
            ... 
            Game1. How to get all listings below and assign to class"game"?
            Listings found: 3
            Game2. How to get all listings below and assign to class"game?
            Listings found: 1
            Game3. How to get all listings below and assign to class"game?
            Listings found: 1
            qid & accept id: (25088066, 25095113) query: How to scrape table with different xpath on the same level with Scrapy? soup:

            What you can do is select all of the nodes and loop through them while checking whether the current node is a div or a table.

            \n

            Using this as my test case,

            \n
            [test case HTML: a div class="asdf" whose children are button-left divs holding the dates 04.09.2013 and 05.10.2013, each followed by tables holding 1, 2 and 3, 4, 5, 6 respectively]

            I use the following to loop through the nodes and update which div the current node is currently under.

            \n
            currdiv = None\nmydict = {}\nfor e in sel.xpath('//div[@class="asdf"]/*'):\n    if bool(int(e.xpath('@class="button-left"').extract()[0])):\n        currdiv = e.xpath('text()').extract()[0]\n        mydict[currdiv] = []\n    elif currdiv is not None:\n        mydict[currdiv] += e.xpath('text()').extract()\n
            \n

            This results into:

            \n
            {u'04.09.2013': [u'1', u'2'], u'05.10.2013': [u'3', u'4', u'5', u'6']}\n
            \n soup wrap:

            What you can do is select all of the nodes and loop through them while checking whether the current node is a div or a table.

            Using this as my test case,

            [test case HTML: a div class="asdf" whose children are button-left divs holding the dates 04.09.2013 and 05.10.2013, each followed by tables holding 1, 2 and 3, 4, 5, 6 respectively]

            I use the following to loop through the nodes and update which div the current node is currently under.

            currdiv = None
            mydict = {}
            for e in sel.xpath('//div[@class="asdf"]/*'):
                if bool(int(e.xpath('@class="button-left"').extract()[0])):
                    currdiv = e.xpath('text()').extract()[0]
                    mydict[currdiv] = []
                elif currdiv is not None:
                    mydict[currdiv] += e.xpath('text()').extract()
            

            This results into:

            {u'04.09.2013': [u'1', u'2'], u'05.10.2013': [u'3', u'4', u'5', u'6']}
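The grouping logic itself is independent of Scrapy; a stand-alone sketch, with made-up (is_header, text) pairs standing in for the selector results:

```python
# stand-alone sketch of the same grouping logic; the (is_header, text) pairs
# are fake stand-ins for the XPath selector results
nodes = [(True, '04.09.2013'), (False, '1'), (False, '2'),
         (True, '05.10.2013'), (False, '3'), (False, '4')]

currdiv = None
mydict = {}
for is_header, text in nodes:
    if is_header:              # a "button-left" div starts a new group
        currdiv = text
        mydict[currdiv] = []
    elif currdiv is not None:  # any other node belongs to the current group
        mydict[currdiv].append(text)
```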
            
            qid & accept id: (25100105, 25100142) query: How to create a object of variables, and return it, in Python? soup:

            You could create a class:

            \n
            class Disk(object):\n    def __init__(self, test1=None, test2=None, test3=None):\n        self.test1 = test1\n        self.test2 = test2\n        self.test3 = test3\n
            \n

            And then create an instance of it:

            \n
            mydist = Disk()\nmydist.test1 = "value"\n# And so on\n
            \n

            You could also use a dictionary (for your example):\n disk = {}\n disk["test1"] = "value"\n disk["test2"] = "value1"\n disk["test3"] = "value2"

            \n

            Unlike PHP and Javascript, Python isn't a Prototype-Based Language. This means you have to define classes to create objects, and you can't create generic "object" instances and add properties which weren't in the class definition.

            \n

            You can, however, add properties to instances of your own classes. For example:

            \n
            class Example(object):\n    pass\n\nmyobject = Example()\nmyobject.a = "value"\n
            \n soup wrap:

            You could create a class:

            class Disk(object):
                def __init__(self, test1=None, test2=None, test3=None):
                    self.test1 = test1
                    self.test2 = test2
                    self.test3 = test3
            

            And then create an instance of it:

            mydist = Disk()
            mydist.test1 = "value"
            # And so on
            

            You could also use a dictionary (for your example):

            disk = {}
            disk["test1"] = "value"
            disk["test2"] = "value1"
            disk["test3"] = "value2"

            Unlike PHP and JavaScript, Python isn't a prototype-based language. This means you have to define classes to create objects, and you can't create generic "object" instances and add properties which weren't in the class definition.

            You can, however, add properties to instances of your own classes. For example:

            class Example(object):
                pass
            
            myobject = Example()
            myobject.a = "value"
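If you only need a throwaway attribute bag, types.SimpleNamespace (available since Python 3.3) is another option worth knowing; a small sketch:

```python
# another option (Python 3.3+): types.SimpleNamespace gives a generic
# attribute bag without defining a class yourself
from types import SimpleNamespace

disk = SimpleNamespace(test1="value", test2="value1")
disk.test3 = "value2"   # attributes can be added freely after creation
```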
            
            qid & accept id: (25104929, 25130874) query: Initiating TCP Client after running reactor.run() soup:

            The crucial problem is that both Tkinter and Twisted solve similar problems in similar ways, namely, reacting asynchronously to external events. The fact that Tkinter is focused on gui events and Twisted on network events is of only passing importance.

            \n

            The specific thing they do is that they have a "main loop" structure, a sort of point of no return from which you lose control. In the case of twisted, that's usually reactor.run(), and in tkinter, that'll be Tkinter.mainloop(). Both will not return until the program exits.

            \n

            Fortunately, you can get Twisted to manage tk's event loop for you! At the beginning of your program, you should add:

            \n
            from Tkinter import Tk\nfrom twisted.internet import tksupport\nroot_window = Tk()\ntksupport.install(root_window)\n
            \n

            Then, once you've created your gui as normal, you should not call Tkinter.mainloop(); instead use:

            \n
            from twisted.internet import reactor\nroot_window.protocol("WM_DELETE_WINDOW", reactor.stop)\nreactor.run()\n
            \n

            The odd bit with Tk.protocol() is optional, but will get rid of some gruesome exceptions by shutting the reactor normally when the gui tries to exit.

            \n
            \n

            In case that's not quite enough, here's some real, working code! First a really simple server

            \n
            from twisted.internet.protocol import Protocol, Factory\nfrom twisted.internet import reactor\n\nclass Echo(Protocol):\n    def dataReceived(self, data):\n        print 'received:', data\n    def connectionLost(self, reason):\n        print 'connection closed', reason\n\nf = Factory()\nf.protocol = Echo\nreactor.listenTCP(8080, f)\nreactor.run()\n
            \n

            and a client, with a gui and network activity:

            \n
            from Tkinter import *\nfrom twisted.internet import tksupport, reactor\nmaster = Tk()\ntksupport.install(master)\n\ndef send_message():\n    message = e1.get()\n    reactor.connectTCP("localhost", 8080, MessageCFactory(message))\n    print("message: %s" % (message))\n\nLabel(master, text="Message").grid(row=0)\ne1 = Entry(master)\ne1.grid(row=0, column=1)\nButton(master, text='Send', command=send_message).grid(row=3, column=1, sticky=W, pady=4)\n\nfrom twisted.internet.protocol import ClientFactory, Protocol\nfrom twisted.internet import reactor\n\nclass MessageCProto(Protocol):\n    def connectionMade(self):\n        self.transport.write(self.factory.message)\n        self.transport.loseConnection()\n\nclass MessageCFactory(ClientFactory):\n    protocol = MessageCProto\n\n    def __init__(self, message):\n        self.message = message\n\nmaster.protocol("WM_DELETE_WINDOW", reactor.stop)\nreactor.run()\n
            \n soup wrap:

            The crucial problem is that both Tkinter and Twisted solve similar problems in similar ways, namely, reacting asynchronously to external events. The fact that Tkinter is focused on gui events and Twisted on network events is of only passing importance.

            The specific thing they do is that they have a "main loop" structure, a sort of point of no return from which you lose control. In the case of twisted, that's usually reactor.run(), and in tkinter, that'll be Tkinter.mainloop(). Both will not return until the program exits.

            Fortunately, you can get Twisted to manage tk's event loop for you! At the beginning of your program, you should add:

            from Tkinter import Tk
            from twisted.internet import tksupport
            root_window = Tk()
            tksupport.install(root_window)
            

            Then, once you've created your gui as normal, you should not call Tkinter.mainloop(); instead use:

            from twisted.internet import reactor
            root_window.protocol("WM_DELETE_WINDOW", reactor.stop)
            reactor.run()
            

            The odd bit with Tk.protocol() is optional, but will get rid of some gruesome exceptions by shutting the reactor normally when the gui tries to exit.


            In case that's not quite enough, here's some real, working code! First a really simple server

            from twisted.internet.protocol import Protocol, Factory
            from twisted.internet import reactor
            
            class Echo(Protocol):
                def dataReceived(self, data):
                    print 'received:', data
                def connectionLost(self, reason):
                    print 'connection closed', reason
            
            f = Factory()
            f.protocol = Echo
            reactor.listenTCP(8080, f)
            reactor.run()
            

            and a client, with a gui and network activity:

            from Tkinter import *
            from twisted.internet import tksupport, reactor
            master = Tk()
            tksupport.install(master)
            
            def send_message():
                message = e1.get()
                reactor.connectTCP("localhost", 8080, MessageCFactory(message))
                print("message: %s" % (message))
            
            Label(master, text="Message").grid(row=0)
            e1 = Entry(master)
            e1.grid(row=0, column=1)
            Button(master, text='Send', command=send_message).grid(row=3, column=1, sticky=W, pady=4)
            
            from twisted.internet.protocol import ClientFactory, Protocol
            from twisted.internet import reactor
            
            class MessageCProto(Protocol):
                def connectionMade(self):
                    self.transport.write(self.factory.message)
                    self.transport.loseConnection()
            
            class MessageCFactory(ClientFactory):
                protocol = MessageCProto
            
                def __init__(self, message):
                    self.message = message
            
            master.protocol("WM_DELETE_WINDOW", reactor.stop)
            reactor.run()
            
            qid & accept id: (25137972, 25160810) query: Weighted random choice from a variable length text file soup:

            The accepted answer does not appear to be aligned with the OP's requirements as written (although it might actually be so) so here is another answer that approaches the general problem of randomly selecting a line from a file with weighted probabilities. This comes from the random module examples in the Python 3 documentation.

            \n

            In this case, line 1 of a file is to be selected with greater probability than the last line, and with reducing probability for intervening lines, so our weights would be range(n, 0, -1) where n is the number of lines in the file, e.g. if there were 5 lines in the file, then the weights would be [5, 4, 3, 2, 1] and this would correspond to probabilities of:

            \n
            weights = range(5, 0, -1)\ntotal_weights = float(sum(weights))\nprobabilities = [w/total_weights for w in weights]\n>>> [round(p, 5) for p in probabilities]    # rounded for readability\n[0.33333, 0.26667, 0.2, 0.13333, 0.06667]\n
            \n

            So the first line has probability 5 times greater than the last line, with reducing probability for each line.

            \n

            Next we need to construct a cumulative distribution based on the weights, select a random value within that distribution, locate the random value within the distribution, and use that to retrieve a line from the file. Here is some code that does that.

            \n
            import bisect\nimport random\ntry:\n    from itertools import accumulate     # Python >= 3.2\nexcept ImportError:\n    def accumulate(weights):\n        accumulator = 0\n        for w in weights:\n            accumulator += w\n            yield accumulator\n\ndef count(iterable):\n    return sum(1 for elem in iterable)\n\ndef get_nth(iterable, n):\n    assert isinstance(n, int), "n must be an integer, got %r" % type(n)\n    assert n > 0, "n must be greater than 0, got %r" % n\n    for i, elem in enumerate(iterable, 1):\n        if i == n:\n            return elem\n\ndef weighted_select(filename):\n    with open(filename) as f:\n        n = count(f)\n        if n == 0:\n            return None\n\n        # set up cumulative distribution\n        weights = range(n, 0, -1)\n        cumulative_dist = list(accumulate(weights))\n\n        # select line number\n        x = random.random() * cumulative_dist[-1]\n        selected_line = bisect.bisect(cumulative_dist, x)\n\n        # retrieve line from file\n        f.seek(0)\n        return get_nth(f, selected_line + 1)    # N.B. +1 for nth line\n
            \n

            This uses weights according to my interpretation of the question. It's easy enough to adapt this to other weights, e.g. if you wanted a weighted select with city population as the weights, you'd just change weights = range(n, 0, -1) to a list of populations corresponding to each line in the file.

            \n soup wrap:

            The accepted answer does not appear to be aligned with the OP's requirements as written (although it might actually be so) so here is another answer that approaches the general problem of randomly selecting a line from a file with weighted probabilities. This comes from the random module examples in the Python 3 documentation.

            In this case, line 1 of a file is to be selected with greater probability than the last line, and with reducing probability for intervening lines, so our weights would be range(n, 0, -1) where n is the number of lines in the file, e.g. if there were 5 lines in the file, then the weights would be [5, 4, 3, 2, 1] and this would correspond to probabilities of:

            weights = range(5, 0, -1)
            total_weights = float(sum(weights))
            probabilities = [w/total_weights for w in weights]
            >>> [round(p, 5) for p in probabilities]    # rounded for readability
            [0.33333, 0.26667, 0.2, 0.13333, 0.06667]
            

            So the first line has probability 5 times greater than the last line, with reducing probability for each line.

            Next we need to construct a cumulative distribution based on the weights, select a random value within that distribution, locate the random value within the distribution, and use that to retrieve a line from the file. Here is some code that does that.

            import bisect
            import random
            try:
                from itertools import accumulate     # Python >= 3.2
            except ImportError:
                def accumulate(weights):
                    accumulator = 0
                    for w in weights:
                        accumulator += w
                        yield accumulator
            
            def count(iterable):
                return sum(1 for elem in iterable)
            
            def get_nth(iterable, n):
                assert isinstance(n, int), "n must be an integer, got %r" % type(n)
                assert n > 0, "n must be greater than 0, got %r" % n
                for i, elem in enumerate(iterable, 1):
                    if i == n:
                        return elem
            
            def weighted_select(filename):
                with open(filename) as f:
                    n = count(f)
                    if n == 0:
                        return None
            
                    # set up cumulative distribution
                    weights = range(n, 0, -1)
                    cumulative_dist = list(accumulate(weights))
            
                    # select line number
                    x = random.random() * cumulative_dist[-1]
                    selected_line = bisect.bisect(cumulative_dist, x)
            
                    # retrieve line from file
                    f.seek(0)
                    return get_nth(f, selected_line + 1)    # N.B. +1 for nth line
            

            This uses weights according to my interpretation of the question. It's easy enough to adapt this to other weights, e.g. if you wanted a weighted select with city population as the weights, you'd just change weights = range(n, 0, -1) to a list of populations corresponding to each line in the file.
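On Python 3.6 and later, the cumulative-distribution bookkeeping can be delegated entirely to the standard library; a sketch (assuming the lines have already been read into a list):

```python
# Python >= 3.6 sketch (assumption): random.choices performs the weighted
# selection directly, given the lines and the descending weights
import random

lines = ['line1', 'line2', 'line3', 'line4', 'line5']   # made-up file contents
weights = list(range(len(lines), 0, -1))   # [5, 4, 3, 2, 1]: first line most likely
picked = random.choices(lines, weights=weights, k=1)[0]
```

random.choices builds the same cumulative distribution and bisects it internally, so this is the same algorithm with less code.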

            qid & accept id: (25141467, 25144062) query: Numpy: Efficient Way To Extract Subarray soup:

            Benchmark it!

            \n
            import numpy as np\n\n# some data\nA = np.random.random((250000, 30))\n\n# some random indices\nx = np.random.randint(0, 250000, 150000)\ny = np.random.randint(0, 30, 10)\n\ndef method1(A, x, y):\n    return A[x[:, np.newaxis], y]\n\ndef method2(A, x, y):\n    return A[np.ix_(x,y)]\n\ndef method3(A, x, y):\n    return A[x][:,y]\n\ndef method4(A, x, y):\n    return A[:,y][x]\n
            \n

            These four methods give the following benchmarks:

            \n
            method1: 87.7 ms\nmethod2: 89.2 ms\nmethod3: 115 ms\nmethod4: 141 ms\n
            \n

            So, the answer is that there is no real difference between the two methods in the question.

            \n soup wrap:

            Benchmark it!

            import numpy as np
            
            # some data
            A = np.random.random((250000, 30))
            
            # some random indices
            x = np.random.randint(0, 250000, 150000)
            y = np.random.randint(0, 30, 10)
            
            def method1(A, x, y):
                return A[x[:, np.newaxis], y]
            
            def method2(A, x, y):
                return A[np.ix_(x,y)]
            
            def method3(A, x, y):
                return A[x][:,y]
            
            def method4(A, x, y):
                return A[:,y][x]
            

            These four methods give the following benchmarks:

            method1: 87.7 ms
            method2: 89.2 ms
            method3: 115 ms
            method4: 141 ms
            

            So, the answer is that there is no real difference between the two methods in the question.
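
            To reproduce numbers like these yourself, timeit is the usual tool; this sketch scales the arrays down from the answer's sizes so it runs quickly:

```python
import timeit
import numpy as np

A = np.random.random((2500, 30))
x = np.random.randint(0, 2500, 1500)
y = np.random.randint(0, 30, 10)

for expr in ("A[x[:, np.newaxis], y]", "A[np.ix_(x, y)]",
             "A[x][:, y]", "A[:, y][x]"):
    # 100 runs; total seconds / 100 per call * 1000 = t * 10 milliseconds
    t = timeit.timeit(expr, number=100, globals={"A": A, "x": x, "y": y, "np": np})
    print(f"{expr}: {t * 10:.3f} ms per call")
```

All four expressions produce the same subarray, so only the timings differ.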

            qid & accept id: (25217124, 25218391) query: Getting stats about each row and putting them into a new column. Pandas soup:


            Here are solutions using apply.

            df['count of not x'] = df.apply(lambda x: (x[['y','z']] != x['x']).sum(), axis=1)
            df['unique'] = df.apply(lambda x: x[['x','y','z']].nunique(), axis=1)
            

            A non-apply solution for getting count of not x:

            df['count of not x'] = (~df[['y','z']].isin(df['x'])).sum(1)
            

            I can't think of anything great for unique. The following also uses apply, but may be faster, depending on the shape of the data.

            df['unique'] = df[['x','y','z']].T.apply(lambda x: x.nunique())
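
            On a toy frame (column names x, y, z taken from the question; the values are made up for illustration), the two apply lines give:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 1], 'y': [1, 2], 'z': [3, 2]})

# count of y/z entries differing from x in each row
df['count of not x'] = df.apply(lambda r: (r[['y', 'z']] != r['x']).sum(), axis=1)
# number of distinct values among x, y, z in each row
df['unique'] = df.apply(lambda r: r[['x', 'y', 'z']].nunique(), axis=1)
print(df)
```

Row 0 has one value differing from x (z=3) and two distinct values; row 1 has two values differing from x and two distinct values.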
            
            qid & accept id: (25233885, 25233901) query: How to chain django query filters to conditionally filter by certain criteria soup:


            This line has no effect:

            user_profiles.filter(gender=gender)
            

            you must reassign the result:

            user_profiles = user_profiles.filter(gender=gender)
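
            The underlying point is that filter returns a new queryset rather than mutating the one it is called on, so a discarded return value is simply lost. The same pitfall can be illustrated without Django (the profile dicts and helper below are made up for the analogy):

```python
def filter_gender(profiles, gender):
    # returns a NEW list, the way QuerySet.filter returns a new queryset
    return [p for p in profiles if p['gender'] == gender]

user_profiles = [{'name': 'a', 'gender': 'F'}, {'name': 'b', 'gender': 'M'}]

filter_gender(user_profiles, 'F')                  # no effect: result discarded
user_profiles = filter_gender(user_profiles, 'F')  # reassignment keeps the result
print(user_profiles)                               # [{'name': 'a', 'gender': 'F'}]
```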
            
            qid & accept id: (25244751, 25246244) query: Python - compare columns in a text file, loop and pop lists soup:


            Assuming they are always adjacent, and using your example data:

            import csv
            
            with open(fn, 'r') as fin:
                reader=csv.reader(fin, skipinitialspace=True)
                header=next(reader)
                data={k:[] for k in header}
                for row in reader:
                    row_di={k:v for k,v in zip(header, row)}
                    if (all(len(data[e]) for e in header) 
                           and row_di['Third col']==data['Third col'][-1] 
                           and row_di['Fourth col']==data['Fourth col'][-1]):
                        for e in header:
                            data[e].pop()
                    else:
                        for e in header:
                            data[e].append(row_di[e])
            
            >>> data
            {'Second col': ['Bryant', 'Bryant', 'Williams', 'Williams', 'Williams'], 'First col': ['Pat', 'Pat', 'Jim', 'Jim', 'Jim'], 'Fourth col': ['29th April', '9th May', '10th March', '17th March', '21st March'], 'Third col': ['ID2', 'ID2', 'ID3', 'ID3', 'ID3'], '...': ['...   ', '... ', '...  ', '...   ', '...']}
            

            Printing that in your format:

            unique_ids=set(data['Third col'])    
            
            while True:                        
                try:    
                    print ', '.join([data[e].pop(0) for e in header])
                except IndexError:
                    break     
            print 'Unique IDs:', len(unique_ids)         
            

            Prints:

            Pat, Bryant, ID2, 29th April, ...   
            Pat, Bryant, ID2, 9th May, ... 
            Jim, Williams, ID3, 10th March, ...  
            Jim, Williams, ID3, 17th March, ...   
            Jim, Williams, ID3, 21st March, ...
            Unique IDs: 2
            

            Notes:

            1. It is usually better to use the csv module for csv data;
            2. Use a set(iterable) to get the number of unique entries in the iterable;
            3. You may consider using a dict of deques rather than a dict of lists if you have a lot of data. Deques are much faster for the left pops this implementation relies on: deque.popleft() is O(1), while list.pop(0) is O(n).
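
            A minimal sketch of the deque variant from note 3, using two columns of the example data:

```python
from collections import deque

header = ['First col', 'Second col']
data = {'First col': deque(['Pat', 'Jim']),
        'Second col': deque(['Bryant', 'Williams'])}

# popleft() is the O(1) equivalent of list.pop(0)
rows = []
while data[header[0]]:
    rows.append(', '.join(data[e].popleft() for e in header))
print('\n'.join(rows))
```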
            qid & accept id: (25249663, 25250715) query: Python: Determine whether each step in path across n arrays falls below threshold value soup:


            I found a much simpler solution (maybe related to the one from JohnB; I'm not sure):

            import numpy as np
            
            def isPath(A, threshold):
                for i in range(len(A) - 1):
                    print "Finding edges from layer", i, "to", i + 1, "..."
                    diffs = np.array(A[i]).reshape((-1, 1)) - np.array(A[i + 1]).reshape((1, -1))
                    reached = np.any(np.abs(diffs) <= threshold, axis = 0)
                    A[i + 1] = [A[i + 1][j] for j in range(len(reached)) if reached[j]]
                    print "Reachable nodes of next layer:", A[i + 1]
                return any(reached)
            
            print isPath([[1, 3, 7], [10, 13], [13, 24]], 3)
            print isPath([[1, 3, 7], [10, 13], [13, 24]], 10)
            

            Output:

            Finding edges from layer 0 to 1 ...
            Reachable nodes of next layer: [10]
            Finding edges from layer 1 to 2 ...
            Reachable nodes of next layer: [13]
            True
            
            Finding edges from layer 0 to 1 ...
            Reachable nodes of next layer: [10, 13]
            Finding edges from layer 1 to 2 ...
            Reachable nodes of next layer: [13]
            True
            

            It steps from one layer to the next and checks which nodes can still be reached given the predefined threshold. Unreachable nodes are removed from the array, so later iterations no longer consider them.

            I guess it's pretty efficient and easy to implement.
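
            The core of the check is the broadcasted pairwise difference; on the first two layers of the example, with threshold 3:

```python
import numpy as np

layer0 = np.array([1, 3, 7]).reshape((-1, 1))  # column vector
layer1 = np.array([10, 13]).reshape((1, -1))   # row vector
diffs = layer0 - layer1                        # 3x2 matrix of all pairwise differences
reached = np.any(np.abs(diffs) <= 3, axis=0)   # which layer-1 nodes are reachable
print(reached)                                 # 10 is reachable from 7; 13 is not
```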

            qid & accept id: (25255535, 25257039) query: How to get QTreeWidgetItem if its ItemWidget is known soup:


            LAST EDITED: 12/8/2014 9:12

            My solution: create your own method that finds the QTreeWidgetItem for a given QWidget, like this (using a recursive function):

            class customQTreeWidget(QtGui.QTreeWidget):
                # ... rest of the class ...
                def findItemWidget(self, findQWidget, currentQTreeWidgetItem=None):
                    if currentQTreeWidgetItem is None:
                        currentQTreeWidgetItem = self.invisibleRootItem()
                    # itemWidget() takes a column index, so check every column of this item
                    for column in range(self.columnCount()):
                        if findQWidget is self.itemWidget(currentQTreeWidgetItem, column):
                            return currentQTreeWidgetItem
                    # recurse into the children
                    for index in range(currentQTreeWidgetItem.childCount()):
                        found = self.findItemWidget(findQWidget, currentQTreeWidgetItem.child(index))
                        if found is not None:
                            return found
            

            Then, when you want to call it, you can use this:

            foundQTreeWidgetItem = self.findItemWidget(findQWidget)  # don't pass currentQTreeWidgetItem; the recursion supplies it
            

            itemWidget method Reference : http://pyqt.sourceforge.net/Docs/PyQt4/qtreewidget.html#itemWidget


            Regards,

            qid & accept id: (25269476, 25269629) query: Python transition matrix soup:


            I don't know if there's a module, but I'd go with this code, which is easily generalizable:

            import numpy as np
            from collections import Counter
            a = [2, 1, 3, 1, 2, 3, 1, 2, 2, 2]
            b = np.zeros((3,3))
            for (x,y), c in Counter(zip(a, a[1:])).iteritems():
                b[x-1,y-1] = c
            print b
            array([[ 0.,  2.,  1.],
                   [ 1.,  2.,  1.],
                   [ 2.,  0.,  0.]])
            

            With no numpy installed:

            b = [[0 for _ in xrange(3)] for _ in xrange(3)]
            for (x,y), c in Counter(zip(a, a[1:])).iteritems():
                b[x-1][y-1] = c
            
            print b
            [[0, 2, 1], [1, 2, 1], [2, 0, 0]]
            

            A few details of what's going on, if needed:

            1. zip(a, a[1:]) gets all the pairs of consecutive numbers.
            2. Counter counts how many times each pair appears
            3. The for loop simply converts the dictionary Counter produces into the matrix / list of lists you requested
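
            The same matrix can also be built in one vectorized step with np.add.at, a Python 3 friendly alternative to the Counter loop above (iteritems is Python 2 only):

```python
import numpy as np

a = [2, 1, 3, 1, 2, 3, 1, 2, 2, 2]
b = np.zeros((3, 3), dtype=int)
rows = np.array(a[:-1]) - 1    # "from" states, shifted to 0-based
cols = np.array(a[1:]) - 1     # "to" states
np.add.at(b, (rows, cols), 1)  # unbuffered +=1 per transition, duplicates counted
print(b)
```

This reproduces the [[0, 2, 1], [1, 2, 1], [2, 0, 0]] matrix from the answer.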
            qid & accept id: (25277092, 25277221) query: Extract elements of a 2d array with indices from another 2d array soup:


            This can be easily done if we index into the raveled data array:

            out = data.ravel()[ind.ravel() + np.repeat(range(0, 8*ind.shape[0], 8), ind.shape[1])].reshape(ind.shape)
            

            Explanation

            It might be easier to understand if it is broken down into three steps:

            indices = ind.ravel() + np.repeat(range(0, 8*ind.shape[0], 8), ind.shape[1])
            out = data.ravel()[indices]
            out = out.reshape(ind.shape)
            

            ind has the information on the elements from data that we want. Unfortunately, it is expressed in 2-D indices. The first line above converts these into indices into the 1-D raveled data. The second line selects those elements out of the raveled array data. The third line restores the 2-D shape to out.
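
            A tiny concrete case makes the arithmetic visible (2x3 data instead of the question's 8 columns, so the hard-coded 8 becomes data.shape[1]); modern NumPy (>= 1.15) also ships np.take_along_axis, which does the same thing in one call:

```python
import numpy as np

data = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
ind = np.array([[2, 0], [1, 2]])   # per-row column indices

ncols = data.shape[1]
# offset each row's indices by that row's start position in the raveled array
offsets = np.repeat(range(0, ncols * ind.shape[0], ncols), ind.shape[1])
out = data.ravel()[ind.ravel() + offsets].reshape(ind.shape)
print(out)                         # [[2, 0], [4, 5]]

# equivalent one-liner on NumPy >= 1.15
assert np.array_equal(out, np.take_along_axis(data, ind, axis=1))
```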

            qid & accept id: (25286811, 25354417) query: How to plot a 3D density map in python with matplotlib soup:


            Thanks to mwaskon for suggesting the mayavi library.

            I recreated the density scatter plot in mayavi as follows:

            import numpy as np
            from scipy import stats
            from mayavi import mlab
            
            mu, sigma = 0, 0.1 
            x = 10*np.random.normal(mu, sigma, 5000)
            y = 10*np.random.normal(mu, sigma, 5000)
            z = 10*np.random.normal(mu, sigma, 5000)
            
            xyz = np.vstack([x,y,z])
            kde = stats.gaussian_kde(xyz)
            density = kde(xyz)
            
            # Plot scatter with mayavi
            figure = mlab.figure('DensityPlot')
            pts = mlab.points3d(x, y, z, density, scale_mode='none', scale_factor=0.07)
            mlab.axes()
            mlab.show()
            

            (figure: 3-D density scatter plot rendered with mayavi)

            Setting the scale_mode to 'none' prevents glyphs from being scaled in proportion to the density vector. In addition, for large datasets, I disabled scene rendering and used a mask to reduce the number of points.

            # Plot scatter with mayavi
            figure = mlab.figure('DensityPlot')
            figure.scene.disable_render = True
            
            pts = mlab.points3d(x, y, z, density, scale_mode='none', scale_factor=0.07) 
            mask = pts.glyph.mask_points
            mask.maximum_number_of_points = x.size
            mask.on_ratio = 1
            pts.glyph.mask_input_points = True
            
            figure.scene.disable_render = False 
            mlab.axes()
            mlab.show()
            

            Next, to evaluate the gaussian kde on a grid:

            import numpy as np
            from scipy import stats
            from mayavi import mlab
            
            mu, sigma = 0, 0.1 
            x = 10*np.random.normal(mu, sigma, 5000)
            y = 10*np.random.normal(mu, sigma, 5000)    
            z = 10*np.random.normal(mu, sigma, 5000)
            
            xyz = np.vstack([x,y,z])
            kde = stats.gaussian_kde(xyz)
            
            # Evaluate kde on a grid
            xmin, ymin, zmin = x.min(), y.min(), z.min()
            xmax, ymax, zmax = x.max(), y.max(), z.max()
            xi, yi, zi = np.mgrid[xmin:xmax:30j, ymin:ymax:30j, zmin:zmax:30j]
            coords = np.vstack([item.ravel() for item in [xi, yi, zi]]) 
            density = kde(coords).reshape(xi.shape)
            
            # Plot scatter with mayavi
            figure = mlab.figure('DensityPlot')
            
            grid = mlab.pipeline.scalar_field(xi, yi, zi, density)
            min = density.min()
            max=density.max()
            mlab.pipeline.volume(grid, vmin=min, vmax=min + .5*(max-min))
            
            mlab.axes()
            mlab.show()
            

            As a final improvement, I sped up the evaluation of the kernel density function by calling the kde function in parallel.

            import numpy as np
            from scipy import stats
            from mayavi import mlab
            import multiprocessing
            
            def calc_kde(data):
                return kde(data.T)
            
            mu, sigma = 0, 0.1 
            x = 10*np.random.normal(mu, sigma, 5000)
            y = 10*np.random.normal(mu, sigma, 5000)
            z = 10*np.random.normal(mu, sigma, 5000)
            
            xyz = np.vstack([x,y,z])
            kde = stats.gaussian_kde(xyz)
            
            # Evaluate kde on a grid
            xmin, ymin, zmin = x.min(), y.min(), z.min()
            xmax, ymax, zmax = x.max(), y.max(), z.max()
            xi, yi, zi = np.mgrid[xmin:xmax:30j, ymin:ymax:30j, zmin:zmax:30j]
            coords = np.vstack([item.ravel() for item in [xi, yi, zi]]) 
            
            # Multiprocessing
            cores = multiprocessing.cpu_count()
            pool = multiprocessing.Pool(processes=cores)
            results = pool.map(calc_kde, np.array_split(coords.T, 2))
            density = np.concatenate(results).reshape(xi.shape)
            
            # Plot scatter with mayavi
            figure = mlab.figure('DensityPlot')
            
            grid = mlab.pipeline.scalar_field(xi, yi, zi, density)
            min = density.min()
            max=density.max()
            mlab.pipeline.volume(grid, vmin=min, vmax=min + .5*(max-min))
            
            mlab.axes()
            mlab.show()
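
            A quick sanity check, needing only scipy, that the multiprocessing split does not change the result: evaluating the kde on chunks and concatenating equals evaluating it in one call (sizes here are scaled down from the code above):

```python
import numpy as np
from scipy import stats

np.random.seed(0)
xyz = np.random.normal(0, 0.1, (3, 200))
kde = stats.gaussian_kde(xyz)

coords = np.random.normal(0, 0.1, (3, 50))
direct = kde(coords)
# same chunking as pool.map(calc_kde, np.array_split(coords.T, 2))
chunked = np.concatenate([kde(c.T) for c in np.array_split(coords.T, 2)])
assert np.allclose(direct, chunked)
```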
            
            qid & accept id: (25296219, 25296299) query: Import .py files with punctuation before extension soup:


            It is possible; however, it is unorthodox, and I would strongly recommend just renaming your modules instead.

            If you don't have any more dots in the filename then you can do something with importlib (example with a filename 4-1.py)

            import importlib
            my_module = importlib.import_module('4-1')
            

            But note you have to assign the module object to a name which is a valid python identifier.

            importlib.import_module is quite normal to use for dynamic importing, i.e. when you have the module name stored in a string variable at runtime. However, it would not really be a good reason to use it just to work around weird filenames.

            Now, if you have dots in the filename, the situation is trickier, because dots mean subpackage structure to Python. Nevertheless, it is still possible with imp, so here is that gnarly trick:

            import imp
            my_module = imp.load_source('my_module', 'strange.name-1.py')
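
            On Python 3 the imp module is deprecated; the modern equivalent (importlib.util, available since 3.5) loads a module from an arbitrary file path. This sketch creates a throwaway module file with dots and a dash in its name just to demonstrate:

```python
import importlib.util
import os
import tempfile

# create a module file with an awkward name (illustration only)
moddir = tempfile.mkdtemp()
path = os.path.join(moddir, 'strange.name-1.py')
with open(path, 'w') as f:
    f.write('ANSWER = 42\n')

# load it under a valid identifier, regardless of the filename
spec = importlib.util.spec_from_file_location('my_module', path)
my_module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(my_module)
print(my_module.ANSWER)  # -> 42
```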
            
            qid & accept id: (25344576, 25345604) query: scale two matrices with scipy or sklearn soup:


            The StandardScaler from scikit-learn handles this, and various corner cases, pretty well.

            from sklearn.preprocessing import StandardScaler
            scaler = StandardScaler()
            scaler.fit(X1)
            output = scaler.transform(X2)
            

            If necessary, you can access the means and standard deviations of the feature columns using

            scaler.std_
            scaler.mean_
            

            You can also use the StandardScaler in a pipeline as preprocessing preceding an estimator.
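
            Under the hood, fit/transform is just column-wise centering and scaling by X1's statistics; a plain-NumPy sketch of the same computation (StandardScaler uses the population standard deviation, i.e. ddof=0; the toy matrices are made up):

```python
import numpy as np

X1 = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
X2 = np.array([[2.0, 20.0], [4.0, 40.0]])

mean = X1.mean(axis=0)   # what scaler.mean_ holds
std = X1.std(axis=0)     # population std, ddof=0, like StandardScaler
output = (X2 - mean) / std
print(output)
```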

            qid & accept id: (25345513, 25345521) query: Is it possible to post audio files with the python requests library soup:


            Yes, it is possible to send any sequence of bytes with the library:

            with open(audiofile, 'rb') as fobj:
                requests.post(url, files={'fieldname': fobj})  # a dict mapping field name to file object
            

            In fact, the first multipart-encoded file example in the requests documentation posts a binary file:

            >>> url = 'http://httpbin.org/post'
            >>> files = {'file': open('report.xls', 'rb')}
            
            >>> r = requests.post(url, files=files)
            >>> r.text
            {
              ...
              "files": {
                "file": ""
              },
              ...
            }
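
            If the server cares about the filename or MIME type, the documented files interface also accepts a (filename, content, content_type) tuple per field. This sketch builds the request without sending it, just to show the resulting multipart body (the field name and fake bytes are made up):

```python
import requests

url = 'http://httpbin.org/post'
# content may be an open file object or raw bytes, as here
files = {'file': ('clip.wav', b'RIFF....fake audio bytes', 'audio/wav')}

# prepare() encodes the multipart/form-data body without hitting the network
prepared = requests.Request('POST', url, files=files).prepare()
print(prepared.headers['Content-Type'])  # multipart/form-data; boundary=...
```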
            
            qid & accept id: (25345843, 25346972) query: inequality comparison of numpy array with nan to a scalar soup:


            Any comparison (other than !=) of a NaN to a non-NaN value will always return False:

            >>> x < -1000
            array([False, False, False,  True, False, False], dtype=bool)
            

            So you can simply ignore the fact that there are NaNs already in your array and do:

            >>> x[x < -1000] = np.nan
            >>> x
            array([ nan,   1.,   2.,  nan,  nan,   5.])
            

            EDIT I don't see any warning when I ran the above, but if you really need to stay away from the NaNs, you can do something like:

            mask = ~np.isnan(x)
            mask[mask] &= x[mask] < -1000
            x[mask] = np.nan
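
            A quick check that the mask version selects exactly the same elements as the direct comparison, using an array shaped like the one above (the concrete values are assumed from the printed output):

```python
import numpy as np

x = np.array([np.nan, 1., 2., -2000., np.nan, 5.])

# direct version: any comparison involving NaN is simply False
direct = x.copy()
direct[direct < -1000] = np.nan

# warning-free version: only compare where x is not NaN
masked = x.copy()
mask = ~np.isnan(masked)
mask[mask] &= masked[mask] < -1000
masked[mask] = np.nan

assert np.array_equal(direct, masked, equal_nan=True)
```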
            
            qid & accept id: (25411441, 25411566) query: Create list using regex inputs soup:


            Instead of adding to dirExclude, why not just check whether there's a match for r'decadal[0-9]{4}' in a dirname d?

            I'm thinking of something like this:

            import os
            import re
            dirExclude = set(['amip4K','amip4xCO2','aqua4K','aqua4xCO2'])
            exre = re.compile(r'decadal[0-9]{4}')
            for (path, dirs, files) in os.walk(inpath, topdown=True):
                # Assign to the slice so os.walk sees the pruned list and skips those subtrees
                dirs[:] = [d for d in dirs if d not in dirExclude and not exre.search(d)]
                # Do something
            

            Explanation:

            exre.search(d) will return None if there is no match for your regex inside d. not None will then evaluate to True. Otherwise, exre.search(d) will return a MatchObject and not exre.search(d) will evaluate to False.

            Compiling the regular expression is optional. Without compiling, you would issue

            exre = r'decadal[0-9]{4}'
            

            and

            dirs[:] = [d for d in dirs if d not in dirExclude and not re.search(exre, d)]
            

            Compiling can be useful when you need to apply a regex many times, so the compilation work is done only once. Most of the time you won't notice a difference, though: even if you don't compile the regex manually, Python caches recently used patterns internally (re._MAXCACHE is 100 in Python 2.7 and 512 in current Python 3; the Regular Expression Cookbook by Jan Goyvaerts and Steven Levithan also discusses this cache).
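            To see the combined exclusion logic in isolation, here is a minimal, self-contained sketch (the directory names are made up for illustration):

```python
import re

dirExclude = set(['amip4K', 'amip4xCO2', 'aqua4K', 'aqua4xCO2'])
exre = re.compile(r'decadal[0-9]{4}')

# Hypothetical directory listing
dirs = ['decadal1980', 'amip4K', 'historical', 'decadalXX', 'aqua4xCO2']

# Keep a name only if it is not explicitly excluded and does not match the regex
kept = [d for d in dirs if d not in dirExclude and not exre.search(d)]
print(kept)  # → ['historical', 'decadalXX']
```

            Note that 'decadalXX' survives: the pattern requires four digits after "decadal".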

            qid & accept id: (25435908, 25436195) query: Python: Getting all the items out of a `threading.local` soup:

            soup wrap:

            If you're using the pure-python version of threading.local (from _threading_local import local), this is possible:

            for t in threading.enumerate():
                for item in t.__dict__:
                   if isinstance(item, tuple):  # Each thread's `local` state lives in its __dict__ under a tuple key
                       print("Thread's local is %s" % t.__dict__[item])
            

            Here's an example of it in action:

            from _threading_local import local
            import threading
            import time
            
            l = local()
            
            def f():
               global l
               l.ok = "HMM"
               time.sleep(50)
            
            if __name__ == "__main__":
                l.ok = 'hi'
                t = threading.Thread(target=f)
                t.start()
                for t in threading.enumerate():
                    for item in t.__dict__:
                       if isinstance(item, tuple):
                           print("Thread's local is %s" % t.__dict__[item])
            

            Output:

            Thread's local is {'ok': 'hi'}
            Thread's local is {'ok': 'HMM'}
            

            This is exploiting the fact that the pure-python implementation of local stores each thread's local state in the Thread object's __dict__, using a tuple object as the key:

            >>> threading.current_thread().__dict__
            { ..., ('_local__key', 'thread.local.140466266257288'): {'ok': 'hi'}, ...}
            

            If you're using the implementation of local written in C (which is usually the case if you just use from threading import local), I'm not sure how/if you can do it.

            qid & accept id: (25457718, 25458914) query: How to map word combinations in python soup:

            soup wrap:

            What you've described is already built into Python (unless you're somehow on a version before 2.6):

            >>> '{fw} {lw}'.format(fw='hello', lw='world')
            'hello world'
            

            or equivalently

            >>> inputs = {'fw': 'hello', 'lw': 'world'}
            >>> '{fw} {lw}'.format(**inputs)
            'hello world'
            

            (The ** here takes a dict and uses it to set a function's keyword arguments.) See the standard library documentation for more.
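            The same ** unpacking works for any function that accepts keyword arguments, not just str.format; a quick illustrative sketch (greet is made up for this example):

```python
def greet(fw, lw):
    return fw + ' ' + lw

inputs = {'fw': 'hello', 'lw': 'world'}
# ** expands the dict into greet(fw='hello', lw='world')
print(greet(**inputs))  # → hello world
```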

            To iterate over a number of formats, you can use a standard for loop or, to be slick, a list comprehension:

            >>> format_strings = ['{fw}{lw}', '{fw} {lw}']
            >>> [format_string.format(**inputs) for format_string in format_strings]
            ['helloworld', 'hello world']
            

            Update: upon rereading your question it sounds like you might prefer the positional version of the above, which looks like this:

            >>> '{0} {1}'.format('hello', 'world')
            'hello world'
            >>> inputs = ['hello', 'world']  # or 'hello world'.split()
            >>> '{0} {1}'.format(*inputs)
            'hello world'
            

            The * is a lot like the **: instead of using a dict to set keyword arguments, it is using a list (or tuple) to set positional arguments.

            qid & accept id: (25467360, 25467515) query: Pandas Dataframe - How To Convert Date to Boolean Columns? soup:

            soup wrap:

            You can use get_dummies to do the hard work. Something like

            target = pd.DataFrame(0, index=df.index, columns=range(1,13))
            dm = pd.get_dummies(df.index.month).set_index(df.index)
            target = (target + dm).fillna(0)
            target.columns = ['is'+x.capitalize() for x in pd.datetools.MONTHS]
            pd.concat([df, target], axis=1)
            

            produces

                            temp  isJan  isFeb  isMar  isApr  isMay  isJun  isJul  isAug  \
            2011-01-01  0.419860      1      0      0      0      0      0      0      0   
            2011-03-22  0.479502      0      0      1      0      0      0      0      0   
            2011-06-10  0.687352      0      0      0      0      0      1      0      0   
            2011-08-29  0.377993      0      0      0      0      0      0      0      1   
            2011-11-17  0.877410      0      0      0      0      0      0      0      0   
            
                        isSep  isOct  isNov  isDec  
            2011-01-01      0      0      0      0  
            2011-03-22      0      0      0      0  
            2011-06-10      0      0      0      0  
            2011-08-29      0      0      0      0  
            2011-11-17      0      0      1      0  
            

            Some explanation follows.

            First, let's make a test frame:

            >>> index = pd.date_range("2011-01-01", periods=5, freq="80d")
            >>> df = pd.DataFrame({"temp": np.random.random(5)}, index=index)
            >>> df
                            temp
            2011-01-01  0.566277
            2011-03-22  0.965421
            2011-06-10  0.854030
            2011-08-29  0.780752
            2011-11-17  0.148783
            

            Now let's make something with the same shape as what we want (we shouldn't assume that we'll necessarily see every month, after all; our test example only has 5 months with nonzero values):

            >>> target = pd.DataFrame(0, index=df.index, columns=range(1,13))
            >>> target
                        1   2   3   4   5   6   7   8   9   10  11  12
            2011-01-01   0   0   0   0   0   0   0   0   0   0   0   0
            2011-03-22   0   0   0   0   0   0   0   0   0   0   0   0
            2011-06-10   0   0   0   0   0   0   0   0   0   0   0   0
            2011-08-29   0   0   0   0   0   0   0   0   0   0   0   0
            2011-11-17   0   0   0   0   0   0   0   0   0   0   0   0
            

            get_dummies will generate an indicator matrix:

            >>> dm = pd.get_dummies(df.index.month).set_index(df.index)
            >>> dm
                        1   3   6   8   11
            2011-01-01   1   0   0   0   0
            2011-03-22   0   1   0   0   0
            2011-06-10   0   0   1   0   0
            2011-08-29   0   0   0   1   0
            2011-11-17   0   0   0   0   1
            

            (And now you can see why we wanted to have the missing columns somewhere.) We can add these two together:

            >>> target = (target + dm).fillna(0)
            >>> target
                        1   2   3   4   5   6   7   8   9   10  11  12
            2011-01-01   1   0   0   0   0   0   0   0   0   0   0   0
            2011-03-22   0   0   1   0   0   0   0   0   0   0   0   0
            2011-06-10   0   0   0   0   0   1   0   0   0   0   0   0
            2011-08-29   0   0   0   0   0   0   0   1   0   0   0   0
            2011-11-17   0   0   0   0   0   0   0   0   0   0   1   0
            

            And we're all done except for making it look pretty. There are lots of ways to get month names; let's choose one at random:

            >>> pd.datetools.MONTHS
            ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']
            >>> target.columns = ['is'+x.capitalize() for x in pd.datetools.MONTHS]
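            Note that pd.datetools was removed in later pandas releases; if it's unavailable, the standard library's calendar module gives equivalent month names (a sketch):

```python
import calendar

# calendar.month_abbr[1:] is ('Jan', 'Feb', ..., 'Dec'); index 0 is the empty string
cols = ['is' + m for m in calendar.month_abbr[1:]]
print(cols)
```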
            

            And now the columns are named as you wanted. All that remains is to combine everything:

            >>> pd.concat([df, target], axis=1)
                            temp  isJan  isFeb  isMar  isApr  isMay  isJun  isJul  isAug  \
            2011-01-01  0.566277      1      0      0      0      0      0      0      0   
            2011-03-22  0.965421      0      0      1      0      0      0      0      0   
            2011-06-10  0.854030      0      0      0      0      0      1      0      0   
            2011-08-29  0.780752      0      0      0      0      0      0      0      1   
            2011-11-17  0.148783      0      0      0      0      0      0      0      0   
            
                        isSep  isOct  isNov  isDec  
            2011-01-01      0      0      0      0  
            2011-03-22      0      0      0      0  
            2011-06-10      0      0      0      0  
            2011-08-29      0      0      0      0  
            2011-11-17      0      0      1      0  
            
            qid & accept id: (25469326, 25469680) query: Delete "usr/lib/python2.7" byMistake, how to fix it? soup:

            soup wrap:

            First, be careful - even restrictive - about what you ever run as root. A normal user could not modify things under /usr/lib, and for good reason - it breaks the system.

            Second, you can find out what packages contain things in that directory using:

            $ dpkg -S /usr/lib/python2.7
            python-qgis, python-gdal, python-psycopg2, python-pyspatialite, youtube-dl, virtualbox, duplicity, bzr-git, bzr-builddeb, debconf, ipython, libpython2.7-minimal:i386, libpython2.7-dev:i386, tahoe-lafs, seascope, samba, qbzr, python2.7, python-zope.interface, python-zfec, python-yaml, python-xdg, python-xapian, python-wxversion, python-wxgtk2.8, python-ws4py, python-webob, python-wadllib, python-vipscc, python-utidylib, python-usb, python-urllib3, python-tz, python-twisted, python-twisted-words, python-twisted-web, python-twisted-runner, python-twisted-news, python-twisted-names, python-twisted-mail, python-twisted-lore, python-twisted-core, python-twisted-conch, python-twisted-bin, python-tk, python-tdb, python-talloc, python-support, python-subversion, python-sphinx, python-software-properties, python-six, python-sip, python-simplejson, python-simplegeneric, python-setuptools, python-setools, python-serial, python-sepolicy, python-sepolgen, python-semanage, python-selinux, python-secretstorage, python-scipy, python-samba, python-routes, python-roman, python-requests, python-repoze.lru, python-reportlab, python-reportlab-accel, python-renderpm, python-radare2, python-qt4, python-qt4-gl, python-qscintilla2, python-pyvorbis, python-pytools, python-pysqlite2, python-pyside.qtxml, python-pyside.qtwebkit, python-pyside.qtuitools, python-pyside.qttest, python-pyside.qtsvg, python-pyside.qtsql, python-pyside.qtscript, python-pyside.qtopengl, python-pyside.qtnetwork, python-pyside.qthelp, python-pyside.qtgui, python-pyside.qtdeclarative, python-pyside.qtcore, python-pyside.phonon, python-pyparsing, python-pyopencl, python-pygments, python-pygame, python-pycurl, python-pycryptopp, python-pyaudio, python-pyasn1, python-poppler-qt4, python-ply, python-pkg-resources, python-pivy, python-pip, python-pil, python-pexpect, python-paramiko, python-pam, python-openssl, python-opengl, python-opencv, python-ogg, python-oauthlib, python-oauth, python-numpy, python-ntdb, 
python-newt, python-nevow, python-networkx, python-netifaces, python-mysqldb, python-musicbrainz, python-mock, python-mechanize, python-markupsafe, python-markdown, python-mako, python-magic, python-lxml, python-libxml2, python-ldb, python-lazr.uri, python-lazr.restfulclient, python-launchpadlib, python-keyring, python-jinja2, python-ipy, python-imaging, python-httplib2, python-html5lib, python-gtk2, python-gst0.10, python-gst0.10-rtsp, python-gpgme, python-gobject-2, python-glade2, python-gi, python-freenect, python-foolscap, python-feedparser, python-fastimport, python-eyed3, python-enchant, python-egenix-mxtools, python-egenix-mxdatetime, python-ecdsa, python-dulwich, python-docutils, python-docopt, python-dnspython, python-distro-info, python-distlib, python-decorator, python-debian, python-dbus, python-dateutil, python-cssutils, python-cssselect, python-crypto, python-configobj, python-colorama, python-collada, python-cherrypy3, python-chardet, python-bzrlib, python-bluez, python-beautifulsoup, python-audit, python-apt, python-apsw, policycoreutils, mercurial, mercurial-common, lsb-release, iotop, hugin-tools, hplip, frescobaldi, libpython2.7:i386, libpython2.7-stdlib:i386, dblatex, cython, cfv, bzr-upload, bzr-search, bzr-pipeline, bzr-loom, bzr-explorer: /usr/lib/python2.7
            

            (Yes, the list is very long.) Knowing that list, we can request those packages to be reinstalled:

            $ sudo apt-get install --reinstall `dpkg -S /usr/lib/python2.7 | sed -e s/,//g -e 's/: .*$//'`
            

            I apologise for the very long command line; the sed command here cleans up the output of dpkg to produce only the list of packages we want to reinstall. This method is likely to help with the specific issue you mention, but even having it happen once suggests you're not clear on the consequences of such changes. You may want to slow down and learn more about your system's structure.
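            To see what the sed part does, here it is applied to a single, made-up line of dpkg -S output:

```shell
# s/,//g strips the commas; 's/: .*$//' drops the ': /usr/lib/python2.7' tail
echo 'python-qgis, python-gdal, python2.7: /usr/lib/python2.7' \
  | sed -e 's/,//g' -e 's/: .*$//'
# prints: python-qgis python-gdal python2.7
```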

            Things like PYTHON* variables won't help you much unless you have a precisely matching version of Python elsewhere, something we tend to avoid on Linux distributions because we usually have working (albeit limited) package management.

            Lastly, this ends up being more of a Super User question than a Stack Overflow one.

            qid & accept id: (25480433, 25480666) query: How to consistently ignore one byte from a string soup:

            soup wrap:

            Usually you'd use a filtered version of the object, for example:

            In [63]: test
            Out[63]: 'hello\x00world'
            In [68]: for my_bytes in filter(lambda x: x != b'\x00', test):
               ....:     print(my_bytes)
               ....:
            h
            e
            l
            l
            o
            w
            o
            r
            l
            d
            

            Note I used my_bytes instead of bytes, which is a built-in name you'd rather not overwrite.

            Similarly, you can also construct a filtered bytes object for further processing:

            In [62]: test = b'hello\x00world'
            In [63]: test
            Out[63]: 'hello\x00world'
            In [64]: test_without_nulls = bytes(filter(lambda x: x != b'\x00', test))
            In [65]: test_without_nulls
            Out[65]: 'helloworld'
            

            I usually use bytes objects, since in Python 3 they don't share an interface with strings. I'd certainly avoid byte arrays here.
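            One caveat: the transcripts above are from Python 2. In Python 3, iterating a bytes object yields ints, so the predicate must compare against the integer 0 rather than b'\x00'; a sketch of the Python 3 equivalent:

```python
test = b'hello\x00world'

# Iterating bytes in Python 3 yields ints, so filter on the byte's integer value
cleaned = bytes(filter(lambda x: x != 0, test))
print(cleaned)  # → b'helloworld'

# Simpler still, when removing a single byte value:
also_cleaned = test.replace(b'\x00', b'')
```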

            qid & accept id: (25493559, 25493762) query: How do I sort a complex dictionary by a key, which resides deep the dictionary? soup:

            soup wrap:

            Honestly, your starting data structure is poorly designed. It may work on a small scale, but as your 'dictionary' of items grows, you can already see it becomes overwhelming quickly. Try instead to write a class and create separate instances of your structure. For example:

            class Employees():
                def __init__(self):
                    self.surname = ''
                    self.salary = 0
                    self.car_man = []  # use a list if you want to store more than one car; otherwise a plain string is fine
            

            From here you can create instances, and you would be able to keep track of them much much easier. You can even add these individual instances to a dictionary itself and you can sort them.

            EX:

             Mark = Employees()
             Mark.surname = 'Johnson'
             Mark.salary = 5
             Mark.car_man = 'Volvo'
            
             John = Employees()
             John.surname = "Doe"
             John.salary = 10
             John.car_man = 'Daewoo'
            

            Do this for as many employees as you want; you can then add the instances to a dictionary and sort them much more easily.

            Adding them to a dictionary is as simple as:

            my_dict = {}
            my_dict['Mark'] = Mark  # key of your choice -> the instance
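            A minimal Python 3 sketch of that last step (names are illustrative): once the instances are in a dictionary, sorted() with a key function orders them by any attribute:

            ```python
            class Employee:
                def __init__(self, surname, salary, car_man):
                    self.surname = surname
                    self.salary = salary
                    self.car_man = car_man

            # a dictionary of instances, keyed by first name
            employees = {
                'Mark': Employee('Johnson', 5, 'Volvo'),
                'John': Employee('Doe', 10, 'Daewoo'),
            }

            # sort the (name, instance) pairs by salary
            by_salary = sorted(employees.items(), key=lambda kv: kv[1].salary)
            print([name for name, emp in by_salary])  # → ['Mark', 'John']
            ```
            
            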
            
            qid & accept id: (25513929, 25514003) query: Returning elements from a loop, one at a time soup:

            An efficient way of doing this is to use yield:

            \n
            def grabber3(datafile):\n    with open(datafile, 'rb') as f:\n        r =csv.DictReader(f)\n        for line in r:\n            del line['thisthing']\n            yield line\n
            \n

            And then in the code that calls this function, you can do:

            \n
            dict_generator = grabber3(a_file)\n
            \n

            And then iterate through this dict_generator as:

            \n
            for a_dict in dict_generator:\n    print a_dict\n
            \n

            More on yield and generators here:

            \n
              \n
            1. https://wiki.python.org/moin/Generators
            2. \n
            3. What does the "yield" keyword do in Python?
            4. \n
            \n soup wrap:

            An efficient way of doing this is to use yield:

            import csv

            def grabber3(datafile):
                with open(datafile, 'rb') as f:
                    r = csv.DictReader(f)
                    for line in r:
                        del line['thisthing']
                        yield line
            

            And then in the code that calls this function, you can do:

            dict_generator = grabber3(a_file)
            

            And then iterate through this dict_generator as:

            for a_dict in dict_generator:
                print a_dict
            

            More on yield and generators here:

            1. https://wiki.python.org/moin/Generators
            2. What does the "yield" keyword do in Python?
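            A self-contained sketch of the same yield pattern without the CSV file (the function and key names here are illustrative):

            ```python
            def one_at_a_time(rows):
                # lazily yield each row with the unwanted key removed;
                # nothing runs until the caller iterates
                for row in rows:
                    cleaned = dict(row)            # copy so the input is untouched
                    cleaned.pop('thisthing', None)
                    yield cleaned

            gen = one_at_a_time([{'a': 1, 'thisthing': 'x'}, {'a': 2}])
            print(next(gen))  # → {'a': 1}
            print(next(gen))  # → {'a': 2}
            ```
            
            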
            qid & accept id: (25537262, 25569261) query: Set global constant cross all the view soup:

            Eventually, I took the way of Middleware. I wrote a custom middleware and set a variable in the middleware, something like,

            \n
            CONSTANT_NAME = None\n
            \n

            It is global.\nAnd a local thread:

            \n
            _thread_local = threading.local()\n
            \n

            which is also global.

            \n

            Then I have two methods in the middleware,

            \n
            def get_constant_value()\n    return getattr(_thread_local, 'CONSTANT_NAME', None)\n\ndef set_constant_value(value):\n    CONSTANT_NAME = value\n
            \n

            which can be called from any views.

            \n

            Then inside my middleware, I have

            \n
            def process_request(self, request):\n    _thread_local.CONSTANT_NAME = CONSTANT_NAME\n
            \n

            At this point, I call set and get this server-crossed variable from any view I want.

            \n

            The solution is not perfect (I believe). If anyone got a better idea, let me know please!

            \n

            Thanks!

            \n soup wrap:

            Eventually, I went with middleware. I wrote a custom middleware and set a variable in it, something like:

            CONSTANT_NAME = None
            

            It is global. And a thread-local:

            _thread_local = threading.local()
            

            which is also global.

            Then I have two methods in the middleware,

            def get_constant_value():
                return getattr(_thread_local, 'CONSTANT_NAME', None)

            def set_constant_value(value):
                _thread_local.CONSTANT_NAME = value
            

            which can be called from any views.

            Then inside my middleware, I have

            def process_request(self, request):
                _thread_local.CONSTANT_NAME = CONSTANT_NAME
            

            At this point, I can set and get this request-scoped variable from any view I want.

            The solution is not perfect (I believe). If anyone got a better idea, let me know please!

            Thanks!
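            The thread-local part of the pattern can be sketched on its own, outside Django (all names here are illustrative); each thread sees only the value it set itself:

            ```python
            import threading

            _thread_local = threading.local()

            def set_constant(value):
                _thread_local.value = value

            def get_constant():
                # default to None if this thread never called set_constant
                return getattr(_thread_local, 'value', None)

            results = {}

            def worker(name):
                set_constant(name)              # this thread's private copy
                results[name] = get_constant()  # unaffected by other threads

            threads = [threading.Thread(target=worker, args=(n,)) for n in ('a', 'b')]
            for t in threads:
                t.start()
            for t in threads:
                t.join()
            print(sorted(results.items()))  # → [('a', 'a'), ('b', 'b')]
            ```
            
            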

            qid & accept id: (25538578, 25538637) query: Python: How can I print out an object as a regular dictionary without reference? soup:

            How a class gets printed is determined by it's __str__ and __repr__ methods, so you can add these to Object1. Note that you should only do this if you're sure that you want the value of Object1 to be represented by it's d attribute:

            \n
            class Object1:\n    d = 1\n\n    def __str__(self):\n        return str(self.d)\n\n    def __repr__(self):\n        return str(self.d)\n
            \n

            Output:

            \n
            print b.__dict__\n{'a': 2, 'o': 2}\n
            \n soup wrap:

            How a class gets printed is determined by its __str__ and __repr__ methods, so you can add these to Object1. Note that you should only do this if you're sure that you want the value of Object1 to be represented by its d attribute:

            class Object1:
                d = 1
            
                def __str__(self):
                    return str(self.d)
            
                def __repr__(self):
                    return str(self.d)
            

            Output:

            print b.__dict__
            {'a': 2, 'o': 2}
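            A short illustrative sketch of the effect (the Point class is made up): once __repr__ returns the attribute dictionary, the object prints like a plain dict even when nested inside another container:

            ```python
            class Point:
                def __init__(self, x, y):
                    self.x = x
                    self.y = y

                def __repr__(self):
                    # containers use repr() for their elements, so this controls
                    # how a Point looks inside a dict or list as well
                    return repr(self.__dict__)

            d = {'p': Point(1, 2), 'n': 3}
            print(d)  # → {'p': {'x': 1, 'y': 2}, 'n': 3}
            ```
            
            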
            
            qid & accept id: (25541651, 25541782) query: How to print formatted python output for javascript? soup:

            Newline characters (\n) are not translated to new lines when rendered as HTML. You can use a

             tag (preformatted) to allow them to have meaning when being rendered.  

            \n
            ...\nprint """\n\n\n
            \n
            \n"""\nprint nova.servers.get_console_output(VMID)\n\nprint """\n
            \n\n"""\n
            \n

            Or you could replace newline characters with a
            , like so:

            \n
            print nova.servers.get_console_output(VMID).replace("\n", "
            ")\n
            \n

            Either one should do what you want.

            \n soup wrap:

            Newline characters (\n) are not translated to new lines when rendered as HTML. You can use a <pre> tag (preformatted) to allow them to have meaning when being rendered.

            ...
            print """
            <pre>
            """
            print nova.servers.get_console_output(VMID)

            print """
            </pre>
            """

            Or you could replace newline characters with a <br/> tag, like so:

            print nova.servers.get_console_output(VMID).replace("\n", "<br/>")

            Either one should do what you want.
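            One caveat worth adding: if the console output can itself contain <, > or &, it should be HTML-escaped before wrapping. A small stdlib sketch (to_html is a hypothetical helper name):

            ```python
            import html

            def to_html(text):
                # escape HTML metacharacters, then preserve line breaks via <pre>
                return '<pre>' + html.escape(text) + '</pre>'

            print(to_html('a < b\nc'))
            ```
            
            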

            qid & accept id: (25641980, 25650210) query: How to isolate group nodes in maya with python soup:

            As you alluded to, "group" nodes really are just transform nodes, with no real distinction.

            \n

            The clearest distinction I can think of however would be that its children must be comprised entirely of other transform nodes. Parenting a shape node under a "group" will no longer be considered a "group"

            \n
            \n

            First, your selection of transform nodes. I assume you already have something along these lines:

            \n
            selection = pymel.core.ls(selection=True, transforms=True)\n
            \n

            Next, a function to check if a given transform is itself a "group".

            \n

            Iterate over all the children of a given node, returning False if any of them aren't transform. Otherwise return True.

            \n
            def is_group(node):\n    children = node.getChildren()\n    for child in children:\n        if type(child) is not pymel.core.nodetypes.Transform:\n            return False\n    return True\n
            \n

            Now you just need to filter the selection, in one of the following two ways, depending on which style you find most clear:

            \n
            selection = filter(is_group, selection)\n
            \n

            or

            \n
            selection = [node for node in selection if is_group(node)]\n
            \n soup wrap:

            As you alluded to, "group" nodes really are just transform nodes, with no real distinction.

            The clearest distinction I can think of, however, would be that its children must be composed entirely of other transform nodes. Parenting a shape node under a "group" means it will no longer be considered a "group".


            First, your selection of transform nodes. I assume you already have something along these lines:

            selection = pymel.core.ls(selection=True, transforms=True)
            

            Next, a function to check if a given transform is itself a "group".

            Iterate over all the children of a given node, returning False if any of them aren't transforms. Otherwise return True.

            def is_group(node):
                children = node.getChildren()
                for child in children:
                    if type(child) is not pymel.core.nodetypes.Transform:
                        return False
                return True
            

            Now you just need to filter the selection, in one of the following two ways, depending on which style you find most clear:

            selection = filter(is_group, selection)
            

            or

            selection = [node for node in selection if is_group(node)]
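            The same check reads naturally with all(); the sketch below uses stand-in classes rather than the real pymel node types, so it only illustrates the shape of the test:

            ```python
            class Node:
                def __init__(self, children=None):
                    self._children = children or []

                def getChildren(self):
                    return self._children

            class Transform(Node):
                pass

            class Shape(Node):
                pass

            def is_group(node):
                # a "group" is a transform whose children are all transforms
                return all(isinstance(c, Transform) for c in node.getChildren())

            grp = Transform([Transform(), Transform()])   # only transform children
            geo = Transform([Transform(), Shape()])       # has a shape child
            print(is_group(grp), is_group(geo))  # → True False
            ```
            
            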
            
            qid & accept id: (25649412, 25650745) query: Exponential Decay on Python Pandas DataFrame soup:

            You can use the fact that when exponentials multiply their exponents add:

            \n

            eg:

            \n
            N(2) = N(2) + N(1) * exp(-0.05)\nN(3) = N(3) + (N(2) + N(1) * exp(-0.05))*exp(-0.05)\nN(3) = N(3) + N(2)*exp(-0.05) + N(1)*exp(-0.1)\nN(4) = ...and so on\n
            \n

            This can then be vectorized using numpy:

            \n
            dataset = pd.DataFrame(np.random.rand(1000,3), columns=["A", "B","C"])\n\nweightspace = np.exp(np.linspace(len(dataset), 0, num=len(dataset))*-0.05)\ndef rollingsum(array):\n    weights = weightspace[0-len(array):]\n    # Convolve the array and the weights to obtain the result\n    a = np.dot(array, weights).sum()\n    return a\n\n\na = pd.expanding_apply(dataset, rollingsum)\n
            \n

            pd.expanding_apply applies the rollingsum function backwards to each row, calling it len(dataset) times. np.linspace generates a dataset of size len(dataset) and calculates how many times each row is multiplied by exp(-0.05) for the current row.

            \n

            Because it is vectorized, it should be fast:

            \n
            %timeit a = pd.expanding_apply(dataset, rollingsum)\n10 loops, best of 3: 25.5 ms per loop\n
            \n

            This compares with (note I'm using python 3 and had to make a change to the behaviour on the first row...):

            \n
            def multipleApply(df):\n    for j, val in df.iteritems():\n        for i, row in enumerate(val):\n            if i == 0:\n                continue\n            df[j].iloc[i] = row + val[i-1]*np.exp(-0.05)\n
            \n

            This comes out as:

            \n
            In[68]: %timeit multipleApply(dataset)\n1 loops, best of 3: 414 ms per loop\n
            \n soup wrap:

            You can use the fact that when exponentials multiply, their exponents add:

            eg:

            N(2) = N(2) + N(1) * exp(-0.05)
            N(3) = N(3) + (N(2) + N(1) * exp(-0.05))*exp(-0.05)
            N(3) = N(3) + N(2)*exp(-0.05) + N(1)*exp(-0.1)
            N(4) = ...and so on
            

            This can then be vectorized using numpy:

            dataset = pd.DataFrame(np.random.rand(1000,3), columns=["A", "B","C"])
            
            weightspace = np.exp(np.linspace(len(dataset), 0, num=len(dataset))*-0.05)
            def rollingsum(array):
                weights = weightspace[0-len(array):]
                # Convolve the array and the weights to obtain the result
                a = np.dot(array, weights).sum()
                return a
            
            
            a = pd.expanding_apply(dataset, rollingsum)
            

            pd.expanding_apply applies the rollingsum function backwards to each row, calling it len(dataset) times. np.linspace generates a dataset of size len(dataset) and calculates how many times each row is multiplied by exp(-0.05) for the current row.

            Because it is vectorized, it should be fast:

            %timeit a = pd.expanding_apply(dataset, rollingsum)
            10 loops, best of 3: 25.5 ms per loop
            

            This compares with (note I'm using python 3 and had to make a change to the behaviour on the first row...):

            def multipleApply(df):
                for j, val in df.iteritems():
                    for i, row in enumerate(val):
                        if i == 0:
                            continue
                        df[j].iloc[i] = row + val[i-1]*np.exp(-0.05)
            

            This comes out as:

            In[68]: %timeit multipleApply(dataset)
            1 loops, best of 3: 414 ms per loop
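            The recurrence can be sanity-checked with a direct, non-vectorized NumPy version (decayed_cumsum is a hypothetical helper name, not a pandas API):

            ```python
            import numpy as np

            def decayed_cumsum(x, rate=0.05):
                # y[i] = sum over j <= i of x[j] * exp(-rate * (i - j))
                n = len(x)
                w = np.exp(-rate * np.arange(n))  # weights 1, exp(-rate), exp(-2*rate), ...
                return np.array([np.dot(x[i::-1], w[:i + 1]) for i in range(n)])

            x = np.array([1.0, 0.0, 0.0])
            print(decayed_cumsum(x))  # the initial impulse decays by exp(-0.05) each step
            ```

            Note that pd.expanding_apply was removed in later pandas releases; the modern equivalent is dataset.expanding().apply(rollingsum, raw=True).
            
            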
            
            qid & accept id: (25664682, 25666134) query: How to find cluster sizes in 2D numpy array? soup:

            it seems like a percolation problem.\nThe following link has your answer if you have scipy installed.

            \n

            http://dragly.org/2013/03/25/working-with-percolation-clusters-in-python/

            \n
            from pylab import *\nfrom scipy.ndimage import measurements\n\nz2 = array([[0,0,0,0,0,0,0,0,0,0],\n    [0,0,1,0,0,0,0,0,0,0],\n    [0,0,1,0,1,0,0,0,1,0],\n    [0,0,0,0,0,0,1,0,1,0],\n    [0,0,0,0,0,0,1,0,0,0],\n    [0,0,0,0,1,0,1,0,0,0],\n    [0,0,0,0,0,1,1,0,0,0],\n    [0,0,0,1,0,1,0,0,0,0],\n    [0,0,0,0,1,0,0,0,0,0],\n    [0,0,0,0,0,0,0,0,0,0]])\n
            \n

            This will identify the clusters:

            \n
            lw, num = measurements.label(z2)\nprint lw\narray([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],\n   [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],\n   [0, 0, 1, 0, 2, 0, 0, 0, 3, 0],\n   [0, 0, 0, 0, 0, 0, 4, 0, 3, 0],\n   [0, 0, 0, 0, 0, 0, 4, 0, 0, 0],\n   [0, 0, 0, 0, 5, 0, 4, 0, 0, 0],\n   [0, 0, 0, 0, 0, 4, 4, 0, 0, 0],\n   [0, 0, 0, 6, 0, 4, 0, 0, 0, 0],\n   [0, 0, 0, 0, 7, 0, 0, 0, 0, 0],\n   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])\n
            \n

            The following will calculate their area.

            \n
            area = measurements.sum(z2, lw, index=arange(lw.max() + 1))\nprint area\n[ 0.  2.  1.  2.  6.  1.  1.  1.]\n
            \n

            This gives what you expect, although I would think that you would have a cluster with 8 members by eye-percolation.

            \n soup wrap:

            It seems like a percolation problem. The following link has your answer if you have scipy installed.

            http://dragly.org/2013/03/25/working-with-percolation-clusters-in-python/

            from pylab import *
            from scipy.ndimage import measurements
            
            z2 = array([[0,0,0,0,0,0,0,0,0,0],
                [0,0,1,0,0,0,0,0,0,0],
                [0,0,1,0,1,0,0,0,1,0],
                [0,0,0,0,0,0,1,0,1,0],
                [0,0,0,0,0,0,1,0,0,0],
                [0,0,0,0,1,0,1,0,0,0],
                [0,0,0,0,0,1,1,0,0,0],
                [0,0,0,1,0,1,0,0,0,0],
                [0,0,0,0,1,0,0,0,0,0],
                [0,0,0,0,0,0,0,0,0,0]])
            

            This will identify the clusters:

            lw, num = measurements.label(z2)
            print lw
            array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
               [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
               [0, 0, 1, 0, 2, 0, 0, 0, 3, 0],
               [0, 0, 0, 0, 0, 0, 4, 0, 3, 0],
               [0, 0, 0, 0, 0, 0, 4, 0, 0, 0],
               [0, 0, 0, 0, 5, 0, 4, 0, 0, 0],
               [0, 0, 0, 0, 0, 4, 4, 0, 0, 0],
               [0, 0, 0, 6, 0, 4, 0, 0, 0, 0],
               [0, 0, 0, 0, 7, 0, 0, 0, 0, 0],
               [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
            

            The following will calculate their area.

            area = measurements.sum(z2, lw, index=arange(lw.max() + 1))
            print area
            [ 0.  2.  1.  2.  6.  1.  1.  1.]
            

            This gives what you expect, although judging by eye I would have thought you would have a cluster with 8 members.
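            If scipy is not available, the labelling-plus-area step can be sketched with a plain 4-connected flood fill (a small stand-in for measurements.label and measurements.sum):

            ```python
            def cluster_sizes(grid):
                """Sizes of 4-connected clusters of 1s in a 2D list (pure Python)."""
                rows, cols = len(grid), len(grid[0])
                seen = set()
                sizes = []
                for r in range(rows):
                    for c in range(cols):
                        if grid[r][c] == 1 and (r, c) not in seen:
                            # flood fill from this unvisited occupied cell
                            stack = [(r, c)]
                            seen.add((r, c))
                            size = 0
                            while stack:
                                y, x = stack.pop()
                                size += 1
                                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                                    ny, nx = y + dy, x + dx
                                    if (0 <= ny < rows and 0 <= nx < cols
                                            and grid[ny][nx] == 1 and (ny, nx) not in seen):
                                        seen.add((ny, nx))
                                        stack.append((ny, nx))
                            sizes.append(size)
                return sizes

            print(sorted(cluster_sizes([[1, 1, 0],
                                        [0, 1, 0],
                                        [0, 0, 1]])))  # → [1, 3]
            ```
            
            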

            qid & accept id: (25690354, 25690522) query: How to generalize a multiplication table for (n * m) soup:

            You can write your function as:

            \n
            def print_multiples(n, m = 10):\n    for i in range(0, m + 1):\n        print n * i,\n    print ""\n
            \n

            then

            \n
             print_multiples(2)\n
            \n

            will print

            \n
            0 2 4 6 8 10 12 14 16 18 20\n
            \n

            and

            \n

            print_multiples(2, 5)

            \n
            0 2 4 6 8 10\n
            \n

            Then with the function:

            \n
            def print_table(n = 10):\n    for i in range(1, n + 1):\n        print_multiples(i)\n
            \n

            you can:

            \n
             print_table()\n
            \n

            and this produces the output:

            \n
            0 1 2 3 4 5 6 7 8 9 10  \n0 2 4 6 8 10 12 14 16 18 20 \n0 3 6 9 12 15 18 21 24 27 30 \n...\n0 10 20 30 40 50 60 70 80 90 100 \n
            \n

            while

            \n
             print_table(2)\n
            \n

            produces:

            \n
            0 1 2 3 4 5 6 7 8 9 10 \n0 2 4 6 8 10 12 14 16 18 20 \n
            \n soup wrap:

            You can write your function as:

            def print_multiples(n, m = 10):
                for i in range(0, m + 1):
                    print n * i,
                print ""
            

            then

             print_multiples(2)
            

            will print

            0 2 4 6 8 10 12 14 16 18 20
            

            and

            print_multiples(2, 5)

            0 2 4 6 8 10
            

            Then with the function:

            def print_table(n = 10):
                for i in range(1, n + 1):
                    print_multiples(i)
            

            you can:

             print_table()
            

            and this produces the output:

            0 1 2 3 4 5 6 7 8 9 10  
            0 2 4 6 8 10 12 14 16 18 20 
            0 3 6 9 12 15 18 21 24 27 30 
            ...
            0 10 20 30 40 50 60 70 80 90 100 
            

            while

             print_table(2)
            

            produces:

            0 1 2 3 4 5 6 7 8 9 10 
            0 2 4 6 8 10 12 14 16 18 20 
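            The same tables can also be built as data instead of printed directly, which makes the (n × m) generalization explicit (multiplication_table is an illustrative name):

            ```python
            def multiplication_table(n, m):
                # row i holds the multiples 0*i .. m*i, for i = 1 .. n
                return [[i * j for j in range(m + 1)] for i in range(1, n + 1)]

            for row in multiplication_table(2, 5):
                print(' '.join(str(v) for v in row))
            # → 0 1 2 3 4 5
            #   0 2 4 6 8 10
            ```
            
            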
            
            qid & accept id: (25690778, 25690959) query: Jinja2 to put a whole element in

            How to pass the target - in your view you need to do this (I assume you know the basics of views in Flask)

                target = request.form['key_of_the_data_we_need'] # the value that should be selected
                mydict = {'5min': '5-Min', '1hour': 'Hour', '1day': 'Day'} # helper for the select
                return render_template('yourtemplate.html', target=target, mydict=mydict)
            

            This way, the data is sent to yourtemplate.html which contains the code discussed above, therefore selecting the desired option.

            Then in the javascript from the mpld3-flask example I use:

            \n
            $('.btn-primary').on('click', function(){\n  var qu = {"plot_type":$(this).find('input').attr('id')}\n  $(this).addClass('active').siblings().removeClass('active');\n  $.ajax({\n    type: "POST",\n    async:true,\n    contentType: "application/json; charset=utf-8",\n    url: "/query",\n    data: JSON.stringify(qu),\n    success: function (data) {\n     var graph = $("#container");\n     graph.html(data);\n     $("#container").show();\n     },\n   dataType: "html"\n  });  \n}); \n
            \n

            Now I have a radio-button bar, with the currently active plot button set to 'active', which only requires one click on one of the buttons to draw a new plot.

            \n

            EDIT: After learning more about flask and Jinja2, it is also easy to pass the mpld3-generated html to a a template, but there is a slight gotcha; it looks something like this

            \n

            In the return of your python routing function:

            \n

            return render_template('index.html', plot=mpld3_html)

            \n

            Then in the html template you can reference this html with

            \n

            {{plot|safe}}

            \n

            Hope this helps someone else.

            \n soup wrap:

            After working for a while, I have found a solution that seems to work and achieve the behavior I want. Note that I am using the twitter bootstrap css and javascript packages.

            Basically, I make a button group:

              

            Then in the javascript from the mpld3-flask example I use:

            $('.btn-primary').on('click', function(){
              var qu = {"plot_type":$(this).find('input').attr('id')}
              $(this).addClass('active').siblings().removeClass('active');
              $.ajax({
                type: "POST",
                async:true,
                contentType: "application/json; charset=utf-8",
                url: "/query",
                data: JSON.stringify(qu),
                success: function (data) {
                 var graph = $("#container");
                 graph.html(data);
                 $("#container").show();
                 },
               dataType: "html"
              });  
            }); 
            

            Now I have a radio-button bar, with the currently active plot button set to 'active', which only requires one click on one of the buttons to draw a new plot.

            EDIT: After learning more about flask and Jinja2, it is also easy to pass the mpld3-generated html to a template, but there is a slight gotcha; it looks something like this

            In the return of your python routing function:

            return render_template('index.html', plot=mpld3_html)

            Then in the html template you can reference this html with

            {{plot|safe}}

            Hope this helps someone else.

            qid & accept id: (25744399, 25744551) query: Switch between assignments to different variables in python? With ternary operator? soup:

            It's possible to conditionally assign on only one line, but I don't consider it "elegant".

            \n
            test = True\na = 23\nb = 42\na,b = (1,b) if test else (a,1)\nprint (a,b)\n
            \n

            Result:

            \n
            (1, 42)\n
            \n
            \n

            As an alternative approach, consider using a dictionary to store your a and b values.

            \n
            test = True\nd = {"a": 23, "b": 42}\nd["a" if test else "b"] = 1\nprint d\n#result: {'a': 1, 'b': 42}\n
            \n

            Or, if the names have no semantic value, store your numbers in a list.

            \n
            test = True\nseq = [42, 23]\nseq[test] = 1\nprint seq\n#result: [42, 1]\n
            \n soup wrap:

            It's possible to conditionally assign on only one line, but I don't consider it "elegant".

            test = True
            a = 23
            b = 42
            a,b = (1,b) if test else (a,1)
            print (a,b)
            

            Result:

            (1, 42)
            

            As an alternative approach, consider using a dictionary to store your a and b values.

            test = True
            d = {"a": 23, "b": 42}
            d["a" if test else "b"] = 1
            print d
            #result: {'a': 1, 'b': 42}
            

            Or, if the names have no semantic value, store your numbers in a list.

            test = True
            seq = [42, 23]
            seq[test] = 1
            print seq
            #result: [42, 1]
            
            qid & accept id: (25746147, 25746907) query: Pandas: Get value of mutliple sorting/grouping query soup:

            Tested on python 2.7 and pandas 0.14, but I am almost sure this should be identical for your environment. Using your example data frame:

            \n
              \n
            • Group the df by the column B and get the first and last element of each identical value in this group.

              \n
              df.groupby('B').head(1)\ndf.groupby('B').last()\n
            • \n
            • Group the df by the column C and get the first and last element of each identical value in this group: Use the same snippets as above, but replacing 'B' for 'C'

            • \n
            • Get also the value that are a multiple of 0.1 in column A

              \n
              df[df.A % 0.1 == 0]\n
            • \n
            \n soup wrap:

            Tested on python 2.7 and pandas 0.14, but I am almost sure this should be identical for your environment. Using your example data frame:

            • Group the df by the column B and get the first and last element of each identical value in this group.

              df.groupby('B').head(1)
              df.groupby('B').last()
              
            • Group the df by the column C and get the first and last element of each identical value in this group: Use the same snippets as above, but replacing 'B' with 'C'

            • Also get the values in column A that are a multiple of 0.1

              df[df.A % 0.1 == 0]
              
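            One caution on the last snippet: floating-point remainders are rarely exactly zero (for example, 0.3 % 0.1 is not 0.0), so an exact == 0 test can silently drop rows. A tolerance-based NumPy sketch:

            ```python
            import numpy as np

            a = np.array([0.1, 0.25, 0.3])
            rem = a % 0.1
            # a value is "a multiple of 0.1" if its remainder is near 0 or near 0.1
            mask = np.isclose(rem, 0) | np.isclose(rem, 0.1)
            print(mask)  # → [ True False  True]
            ```
            
            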
            qid & accept id: (25757650, 25757751) query: replacing appointed characters in a string in txt file soup:

            Using regular expressions, its a trivial task:

            \n
            >>> s = '''C  DesignerTEE edBore 1 1/42006\n... Cylinder SingleVerticalB DesignerHHJ e 1 1/8Cooling 1\n... EngineBore 11/16 DesignerTDT 8Length 3Width 3\n... EngineCy DesignerHEE Inline2008Bore 1\n... Height 4TheChallen DesignerTET e 1Stroke 1P 305\n... Height 8C 606Wall15ccG DesignerQBG ccGasEngineJ 142\n... Height DesignerEQE C 60150ccGas2007'''\n>>> import re\n>>> exp = 'Designer[A-Z]{3}'\n>>> re.findall(exp, s)\n['DesignerTEE', 'DesignerHHJ', 'DesignerTDT', 'DesignerHEE', 'DesignerTET', 'DesignerQBG', 'DesignerEQE']\n
            \n

            The regular expression is Designer[A-Z]{3} which means the letters Designer, followed by any letter from capital A to capital Z that appears 3 times, and only three times.

            \n

            So, it won't match DesignerABCD (4 letters), it also wont match Desginer123 (123 is not valid letters).

            \n

            It also won't match Designerabc (abc are small letters). To make it ignore the case, you can pass an optional flag re.I as a third argument; but this will also match designerabc (you have to be very specific with regular expressions).

            \n

            So, to make it so that it matches Designer followed by exactly 3 upper or lower case letters, you'd have to change the expression to Designer[Aa-zZ]{3}.

            \n

            If you want to search and replace, then you can use re.sub for substituting matches; so if I want to replace all matches with the word 'hello':

            \n
            >>> x = re.sub(exp, 'hello', s)\n>>> print(x)\nC  hello edBore 1 1/42006\nCylinder SingleVerticalB hello e 1 1/8Cooling 1\nEngineBore 11/16 hello 8Length 3Width 3\nEngineCy hello Inline2008Bore 1\nHeight 4TheChallen hello e 1Stroke 1P 305\nHeight 8C 606Wall15ccG hello ccGasEngineJ 142\nHeight hello C 60150ccGas2007\n
            \n
            \n
            \n

            and what if both before and after 'Designer', there are characters,\n and the length of character is not fixed. I tried\n '[Aa-zZ]Designer[Aa-zZ]{0~9}', but it doesn't work..

            \n
            \n

            For these things, there are special characters in regular expressions. Briefly summarized below:

            \n
              \n
            • When you want to say "1 or more, but at least 1", use +
            • \n
            • When you want to say "0 or any number, but there maybe none", use *
            • \n
            • When you want to say "none but if it exists, only repeats once" use ?
            • \n
            \n

            You use this after the expression you want to be modified with the "repetition" modifiers.

            \n

            For more on this, have a read through the documentation.

            \n

            Now your requirements is "there are characters but the length is not fixed", based on this, we have to use +.

            \n soup wrap:

            Using regular expressions, it's a trivial task:

            >>> s = '''C  DesignerTEE edBore 1 1/42006
            ... Cylinder SingleVerticalB DesignerHHJ e 1 1/8Cooling 1
            ... EngineBore 11/16 DesignerTDT 8Length 3Width 3
            ... EngineCy DesignerHEE Inline2008Bore 1
            ... Height 4TheChallen DesignerTET e 1Stroke 1P 305
            ... Height 8C 606Wall15ccG DesignerQBG ccGasEngineJ 142
            ... Height DesignerEQE C 60150ccGas2007'''
            >>> import re
            >>> exp = 'Designer[A-Z]{3}'
            >>> re.findall(exp, s)
            ['DesignerTEE', 'DesignerHHJ', 'DesignerTDT', 'DesignerHEE', 'DesignerTET', 'DesignerQBG', 'DesignerEQE']
            

            The regular expression is Designer[A-Z]{3} which means the letters Designer, followed by any letter from capital A to capital Z that appears 3 times, and only three times.

            So, it won't match DesignerABCD (4 letters), and it also won't match Designer123 (123 are not valid letters).

            It also won't match Designerabc (abc are small letters). To make it ignore the case, you can pass an optional flag re.I as a third argument; but this will also match designerabc (you have to be very specific with regular expressions).

            So, to make it match Designer followed by exactly 3 upper- or lower-case letters, you'd have to change the expression to Designer[A-Za-z]{3}.

            If you want to search and replace, then you can use re.sub for substituting matches; so if I want to replace all matches with the word 'hello':

            >>> x = re.sub(exp, 'hello', s)
            >>> print(x)
            C  hello edBore 1 1/42006
            Cylinder SingleVerticalB hello e 1 1/8Cooling 1
            EngineBore 11/16 hello 8Length 3Width 3
            EngineCy hello Inline2008Bore 1
            Height 4TheChallen hello e 1Stroke 1P 305
            Height 8C 606Wall15ccG hello ccGasEngineJ 142
            Height hello C 60150ccGas2007
            

            and what if both before and after 'Designer', there are characters, and the length of character is not fixed. I tried '[Aa-zZ]Designer[Aa-zZ]{0~9}', but it doesn't work..

            For these things, there are special characters in regular expressions. Briefly summarized below:

            • When you want to say "one or more (at least one)", use +
            • When you want to say "zero or more (possibly none)", use *
            • When you want to say "zero or one (optional)", use ?

            You put the modifier right after the expression whose repetition it should control.

            For more on this, have a read through the documentation.

            Now, your requirement is "there are characters but the length is not fixed"; based on this, we have to use +.
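
            To make the quantifiers concrete, here is a small sketch (the sample strings are invented for illustration) showing ?, * and +, plus a pattern along the lines of the requirement above:

```python
import re

# ? : zero or one, * : zero or more, + : one or more
assert re.findall(r'colou?r', 'color colour') == ['color', 'colour']
assert re.findall(r'ab*c', 'ac abc abbc') == ['ac', 'abc', 'abbc']
assert re.findall(r'ab+c', 'ac abc abbc') == ['abc', 'abbc']

# Letters of unspecified (but nonzero) length on both sides of 'Designer':
print(re.findall(r'[A-Za-z]+Designer[A-Za-z]+', 'xxDesignerABC 123Designer456'))
# -> ['xxDesignerABC']
```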

            qid & accept id: (25765631, 25768664) query: Need to parse a tool log file in python and then save the results in excel or csv soup:

            I'm not sure what the values enclosed by the inequality signs are, so I have replaced them with foo and bar. Something like this ought to do the trick:

            \n
            import re\nimport csv\n\nfiltered_messages = ['UpdatePlaybackStatusInfo', 'Assert']\nfieldnames = ['ticks', 'foo', 'type', 'bar', 'message']\n\nwith open('log.txt') as log:\n    with open('output.csv', 'w') as csv_file:\n        writer = csv.DictWriter(csv_file, delimiter=',', fieldnames=fieldnames)\n        writer.writerow(dict((fn,fn) for fn in fieldnames))\n        for line in log:\n            match = re.search(r'^Ticks = (?P<ticks>\d+)\s+<(?P<foo>\d+)> <(?P<type>\w+)> <(?P<bar>\d+)>\s+(?P<message>.+)$', line)\n            if match is not None and match.group('type') in filtered_messages:\n                writer.writerow(match.groupdict())\n
            \n

            Output (as CSV):

            \n
            ticks   foo type    bar message\n\n2408967 3360    UpdatePlaybackStatusInfo    0   Avg Prefetch(ms): 157.739, Avg Render(ms): 25.7375, Avg Display FPS: 27.3688\n\n3371181 3360    UpdatePlaybackStatusInfo    0   Frames dropped during playback: 0 / 219, Preroll(ms): 812.849\n\n3371181 3360    UpdatePlaybackStatusInfo    0   Avg Prefetch(ms): 17.1389, Avg Render(ms): 33.8339, Avg Display FPS: 29.5562\n\n3465531 10548   Assert  0   Debug Assert failed!\n\n3465531 10548   Assert  0   wglMakeCurrent failed: Error 0: The operation completed successfully.\n
            \n soup wrap:

            I'm not sure what the values enclosed in the angle brackets represent, so I have named them foo and bar. Something like this ought to do the trick:

            import re
            import csv
            
            filtered_messages = ['UpdatePlaybackStatusInfo', 'Assert']
            fieldnames = ['ticks', 'foo', 'type', 'bar', 'message']
            
            with open('log.txt') as log:
                with open('output.csv', 'w') as csv_file:
                    writer = csv.DictWriter(csv_file, delimiter=',', fieldnames=fieldnames)
                    writer.writerow(dict((fn,fn) for fn in fieldnames))
                    for line in log:
                        match = re.search(r'^Ticks = (?P<ticks>\d+)\s+<(?P<foo>\d+)> <(?P<type>\w+)> <(?P<bar>\d+)>\s+(?P<message>.+)$', line)
                        if match is not None and match.group('type') in filtered_messages:
                            writer.writerow(match.groupdict())
            

            Output (as CSV):

            ticks   foo type    bar message
            
            2408967 3360    UpdatePlaybackStatusInfo    0   Avg Prefetch(ms): 157.739, Avg Render(ms): 25.7375, Avg Display FPS: 27.3688
            
            3371181 3360    UpdatePlaybackStatusInfo    0   Frames dropped during playback: 0 / 219, Preroll(ms): 812.849
            
            3371181 3360    UpdatePlaybackStatusInfo    0   Avg Prefetch(ms): 17.1389, Avg Render(ms): 33.8339, Avg Display FPS: 29.5562
            
            3465531 10548   Assert  0   Debug Assert failed!
            
            3465531 10548   Assert  0   wglMakeCurrent failed: Error 0: The operation completed successfully.
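
            As a quick check, the named groups can be exercised on a single sample line. The group names follow the fieldnames list above; the exact spacing of the sample line is an assumption based on the output:

```python
import re

# Group names match the fieldnames list; the log line below is reconstructed
# from the sample output, so its exact spacing is an assumption.
pattern = r'^Ticks = (?P<ticks>\d+)\s+<(?P<foo>\d+)> <(?P<type>\w+)> <(?P<bar>\d+)>\s+(?P<message>.+)$'
line = 'Ticks = 3465531    <10548> <Assert> <0>    Debug Assert failed!'
match = re.search(pattern, line)
print(match.groupdict())
```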
            
            qid & accept id: (25838448, 25838514) query: Cycling through possible indentations in python.el in Emacs soup:

            Use C-h m in order to know that. It invokes the describe-mode function.

            \n

            You can also look at the python.el file and look for define-key:

            \n
            ;; Indent specific                                                                                                                         \n(define-key map "\177" 'python-indent-dedent-line-backspace)                                                                               \n(define-key map (kbd "") 'python-indent-dedent-line)                                                                              \n(define-key map "\C-c<" 'python-indent-shift-left)                                                                                         \n(define-key map "\C-c>" 'python-indent-shift-right)                                                                                        \n(define-key map ":" 'python-indent-electric-colon)      \n
            \n

            Or indent

            \n
            ;; Indentation: Automatic indentation with indentation cycling is                                                                              \n;; provided, it allows you to navigate different available levels of                                                                           \n;; indentation by hitting  several times.  Also when inserting a                                                                          \n;; colon the `python-indent-electric-colon' command is invoked and                                                                             \n;; causes the current line to be dedented automatically if needed. \n
            \n soup wrap:

            Use C-h m in order to know that. It invokes the describe-mode function.

            You can also look at the python.el file and look for define-key:

            ;; Indent specific
            (define-key map "\177" 'python-indent-dedent-line-backspace)
            (define-key map (kbd "<backtab>") 'python-indent-dedent-line)
            (define-key map "\C-c<" 'python-indent-shift-left)
            (define-key map "\C-c>" 'python-indent-shift-right)
            (define-key map ":" 'python-indent-electric-colon)
            

            Or indent

            ;; Indentation: Automatic indentation with indentation cycling is
            ;; provided, it allows you to navigate different available levels of
            ;; indentation by hitting <tab> several times.  Also when inserting a
            ;; colon the `python-indent-electric-colon' command is invoked and
            ;; causes the current line to be dedented automatically if needed.
            
            qid & accept id: (25859572, 25859700) query: Pandas: Change dataframe values based on dictionary and remove rows with no match soup:

            You can use isin to filter for valid rows, and then use replace to replace the values:

            \n
            import pandas as pd\nHashTable = {"chr1" : 1, "chr2" : 2, "chr3" : 3, "chr4" : 4, "chr5" : 5, "chr6" : 6, "chr7" : 7, "chr8" : 8, "chr9" : 9, "chr10" : 10, "chr11" : 11, "chr12" : 12, "chr13" : 13, "chr14" : 14, "chr15" : 15, "chr16" : 16, "chr17" : 17, "chr18" : 18, "chr19" : 19, "chrX" : 20, "chrY" : 21, "chrM" : 22, 'chrMT': 23}\n# A dummy DataFrame with all the valid chromosomes and one unknown chromosome\ndf = pd.DataFrame({"Chrom": HashTable.keys() + ["unknown_chr"]})\n# Filter for valid rows\ndf = df[df["Chrom"].isin(HashTable.keys())]\n# Replace the values according to dict\ndf["Chrom"].replace(HashTable, inplace=True)\nprint df\n
            \n

            Input (the dummy df above):

            \n
                      Chrom\n0         chrMT\n1          chrY\n2          chrX\n3         chr13\n4         chr12\n5         chr11\n6         chr10\n7         chr17\n8         chr16\n9         chr15\n10        chr14\n11        chr19\n12        chr18\n13         chrM\n14         chr7\n15         chr6\n16         chr5\n17         chr4\n18         chr3\n19         chr2\n20         chr1\n21         chr9\n22         chr8\n23  unknown_chr\n
            \n

            Output DataFrame:

            \n
               Chrom\n0     23\n1     21\n2     20\n3     13\n4     12\n5     11\n6     10\n7     17\n8     16\n9     15\n10    14\n11    19\n12    18\n13    22\n14     7\n15     6\n16     5\n17     4\n18     3\n19     2\n20     1\n21     9\n22     8\n
            \n

            If the resulting values are all integers, you change the above replace line to enforce the correct dtype:

            \n
            df["Chrom"] = df["Chrom"].replace(HashTable).astype(int)\n
            \n soup wrap:

            You can use isin to filter for valid rows, and then use replace to replace the values:

            import pandas as pd
            HashTable = {"chr1" : 1, "chr2" : 2, "chr3" : 3, "chr4" : 4, "chr5" : 5, "chr6" : 6, "chr7" : 7, "chr8" : 8, "chr9" : 9, "chr10" : 10, "chr11" : 11, "chr12" : 12, "chr13" : 13, "chr14" : 14, "chr15" : 15, "chr16" : 16, "chr17" : 17, "chr18" : 18, "chr19" : 19, "chrX" : 20, "chrY" : 21, "chrM" : 22, 'chrMT': 23}
            # A dummy DataFrame with all the valid chromosomes and one unknown chromosome
            df = pd.DataFrame({"Chrom": HashTable.keys() + ["unknown_chr"]})
            # Filter for valid rows
            df = df[df["Chrom"].isin(HashTable.keys())]
            # Replace the values according to dict
            df["Chrom"].replace(HashTable, inplace=True)
            print df
            

            Input (the dummy df above):

                      Chrom
            0         chrMT
            1          chrY
            2          chrX
            3         chr13
            4         chr12
            5         chr11
            6         chr10
            7         chr17
            8         chr16
            9         chr15
            10        chr14
            11        chr19
            12        chr18
            13         chrM
            14         chr7
            15         chr6
            16         chr5
            17         chr4
            18         chr3
            19         chr2
            20         chr1
            21         chr9
            22         chr8
            23  unknown_chr
            

            Output DataFrame:

               Chrom
            0     23
            1     21
            2     20
            3     13
            4     12
            5     11
            6     10
            7     17
            8     16
            9     15
            10    14
            11    19
            12    18
            13    22
            14     7
            15     6
            16     5
            17     4
            18     3
            19     2
            20     1
            21     9
            22     8
            

            If the resulting values are all integers, you can change the replace line above to enforce the correct dtype:

            df["Chrom"] = df["Chrom"].replace(HashTable).astype(int)
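
            An alternative sketch (not part of the original answer) is to map the column through the dict and drop the rows that found no match; shown here with an abbreviated stand-in for the full HashTable:

```python
import pandas as pd

# Abbreviated stand-in for the full HashTable above
HashTable = {"chr1": 1, "chr2": 2, "chrX": 20}
df = pd.DataFrame({"Chrom": ["chr1", "chrX", "unknown_chr"]})

# map() turns values missing from the dict into NaN; dropna() removes those rows
df["Chrom"] = df["Chrom"].map(HashTable)
df = df.dropna(subset=["Chrom"])
df["Chrom"] = df["Chrom"].astype(int)
print(df["Chrom"].tolist())  # -> [1, 20]
```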
            
            qid & accept id: (25882275, 25882453) query: iterate the range in for loop to satisfy the condition soup:

            Assuming you are just asking for a dynamic way to do this, use the following:

            \n
            set_mean = -10\n#calculated_mean = None\n\nenergy = []\ncalculated_mean = float('inf')\nx = 1\nwhile calculated_mean > set_mean:\n    for i in range(x, x+4):  # you can change step size here by passing it as last argument\n        energy.append(i-i*i)  \n        print(energy)\n        calculated_mean =  sum(energy[-2:])/2\n        print(calculated_mean)\n    x = x + 1\n
            \n

            Output is:

            \n
            [0]\n0\n[0, -2]\n-1\n[0, -2, -6]\n-4\n[0, -2, -6, -12]\n-9\n[0, -2, -6, -12, -2]\n-7\n[0, -2, -6, -12, -2, -6]\n-4\n[0, -2, -6, -12, -2, -6, -12]\n-9\n[0, -2, -6, -12, -2, -6, -12, -20]\n-16\n
            \n

            I think this is exactly what you want. Because the loop stops when you get -16

            \n soup wrap:

            Supposing you are just asking to do the dynamic way use the following:

            set_mean = -10
            #calculated_mean = None
            
            energy = []
            calculated_mean = float('inf')
            x = 1
            while calculated_mean > set_mean:
                for i in range(x, x+4):  # you can change step size here by passing it as last argument
                    energy.append(i-i*i)  
                    print(energy)
                    calculated_mean =  sum(energy[-2:])/2
                    print(calculated_mean)
                x = x + 1
            

            Output is:

            [0]
            0
            [0, -2]
            -1
            [0, -2, -6]
            -4
            [0, -2, -6, -12]
            -9
            [0, -2, -6, -12, -2]
            -7
            [0, -2, -6, -12, -2, -6]
            -4
            [0, -2, -6, -12, -2, -6, -12]
            -9
            [0, -2, -6, -12, -2, -6, -12, -20]
            -16
            

            I think this is exactly what you want: the loop stops once the mean of the last two values reaches -16, which is below the target of -10.
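
            The same stopping rule can also be written more directly, without the outer x loop; this is a simplified sketch, not the asker's exact setup:

```python
def energies(target_mean):
    """Append i - i*i for i = 1, 2, 3, ... and stop once the mean
    of the last two values drops to target_mean or below."""
    energy = []
    i = 1
    while True:
        energy.append(i - i * i)
        if len(energy) >= 2 and sum(energy[-2:]) / 2.0 <= target_mean:
            return energy
        i += 1

print(energies(-10))  # -> [0, -2, -6, -12, -20]; mean of the last two is -16
```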

            qid & accept id: (25883410, 25883985) query: How to sort through keys in a dictionary, adding the values and returning a list of keys if combined values equal a certain number soup:

            As long as the number of items isn't too large, you can brute-force this:

            \n
            import itertools\ndef matches(d, target):\n    # First try single items, then couples, then triplets etc.\n    for num in range(1,len(d)+1):\n        # Iterate over all possible combinations of length num\n        for com in itertools.combinations(d.items(), num):\n            # Does the sum of all second items per key/value pair match the target?\n            if sum(item[1] for item in com) == target:\n                # Yield one item at a time, so the caller can decide when to stop\n                yield com\n
            \n

            You can use it to iterate over all matches:

            \n
            >>> mydict = {'a':1, 'b':12, 'c':33, 'd':40, 'e':15, 'f':6, 'g':27}\n>>> for match in matches(mydict,55):\n...     print(match)\n...\n(('d', 40), ('e', 15))\n(('c', 33), ('e', 15), ('f', 6), ('a', 1))\n(('b', 12), ('e', 15), ('g', 27), ('a', 1))\n
            \n

            or add a break after the print() line to make your program stop at the first match.

            \n soup wrap:

            As long as the number of items isn't too large, you can brute-force this:

            import itertools
            def matches(d, target):
                # First try single items, then couples, then triplets etc.
                for num in range(1,len(d)+1):
                    # Iterate over all possible combinations of length num
                    for com in itertools.combinations(d.items(), num):
                        # Does the sum of all second items per key/value pair match the target?
                        if sum(item[1] for item in com) == target:
                            # Yield one item at a time, so the caller can decide when to stop
                            yield com
            

            You can use it to iterate over all matches:

            >>> mydict = {'a':1, 'b':12, 'c':33, 'd':40, 'e':15, 'f':6, 'g':27}
            >>> for match in matches(mydict,55):
            ...     print(match)
            ...
            (('d', 40), ('e', 15))
            (('c', 33), ('e', 15), ('f', 6), ('a', 1))
            (('b', 12), ('e', 15), ('g', 27), ('a', 1))
            

            or add a break after the print() line to make your program stop at the first match.
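
            Since matches() is a generator, another option (a small sketch restating the function above) is to take just the first result with next() rather than breaking out of the loop:

```python
import itertools

def matches(d, target):
    # Try combinations of increasing size; yield each whose values sum to target
    for num in range(1, len(d) + 1):
        for com in itertools.combinations(d.items(), num):
            if sum(item[1] for item in com) == target:
                yield com

mydict = {'a': 1, 'b': 12, 'c': 33, 'd': 40, 'e': 15, 'f': 6, 'g': 27}
first = next(matches(mydict, 55), None)  # None if no combination sums to 55
print(first)  # -> (('d', 40), ('e', 15)) on Python 3.7+, where dicts keep insertion order
```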

            qid & accept id: (25923521, 25923546) query: Compare list w/ sublist soup:

            You could turn lista into a set for fast membership testing, then just loop over listb to select any that are found in lista:

            \n
            lista_set = set(lista)\nfor item in listb:\n    if item[0] in lista_set:\n        print item\n
            \n

            The next step is turning listb into a dictionary:

            \n
            listb_dict = {item[0]: item[1:] for item in listb}\n
            \n

            Now you can use sets to pick out just the ones that are both in lista_set and listb_dict:

            \n
            for match in listb_dict.viewkeys() & lista_set:\n    print match, listb_dict[match]\n
            \n soup wrap:

            You could turn lista into a set for fast membership testing, then just loop over listb to select any that are found in lista:

            lista_set = set(lista)
            for item in listb:
                if item[0] in lista_set:
                    print item
            

            The next step is turning listb into a dictionary:

            listb_dict = {item[0]: item[1:] for item in listb}
            

            Now you can use sets to pick out just the ones that are both in lista_set and listb_dict:

            for match in listb_dict.viewkeys() & lista_set:
                print match, listb_dict[match]
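
            Note that viewkeys() is Python 2 only; on Python 3, keys() already returns a set-like view, so the same intersection looks like this (a sketch with made-up sample data):

```python
lista = ['a', 'c']
listb = [('a', 1, 2), ('b', 3, 4), ('c', 5, 6)]

lista_set = set(lista)
listb_dict = {item[0]: item[1:] for item in listb}

# dict.keys() is a set-like view in Python 3, so & works directly
for match in listb_dict.keys() & lista_set:
    print(match, listb_dict[match])
```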
            
            qid & accept id: (25932166, 25933055) query: Generic way to get primary key from declaratively defined instance in SQLAlchemy soup:

            You can use inspection for that purpose:

            \n

            http://docs.sqlalchemy.org/en/latest/core/inspection.html

            \n

            Passing an instance of a mapped object to inspect, returns an InstanceState, describing that object.\nThis state also contains the identity:

            \n
            Base = declarative_base()\n\nclass MyClass(Base):\n    __tablename__ = 'mytable'\n    key = Column(Integer, primary_key=True)\na = MyClass(key=1)\n\nfrom sqlalchemy.inspection import inspect    \npk = inspect(a).identity\nprint pk\n
            \n

            Will give:

            \n
            (1,)\n
            \n

            Since primary keys can consist of multiple columns, the identity in general is a tuple containing all the column values that are part of the primary key.\nIn your case, that's simply the key.

            \n soup wrap:

            You can use inspection for that purpose:

            http://docs.sqlalchemy.org/en/latest/core/inspection.html

            Passing an instance of a mapped object to inspect, returns an InstanceState, describing that object. This state also contains the identity:

            Base = declarative_base()
            
            class MyClass(Base):
                __tablename__ = 'mytable'
                key = Column(Integer, primary_key=True)
            a = MyClass(key=1)
            
            from sqlalchemy.inspection import inspect    
            pk = inspect(a).identity
            print pk
            

            Will give:

            (1,)
            

            Since primary keys can consist of multiple columns, the identity in general is a tuple containing all the column values that are part of the primary key. In your case, that's simply the key.

            qid & accept id: (25946008, 25946883) query: How to remove all "document.write(' ');" with beautifulsoup soup:

            Why don't you just use regexs to remove the parts you don't want and then parse it using beautifulsoup?

            \n
            import re\n\ndata = """document.write('');\ndocument.write('\n \n  \n  ');\ndocument.write('\n  \n \n ');\ndocument.write('
            \n \n some text\n \n \n \n 7.70.022\n \n
            ');"""\n\npattern = re.compile(r"document\.write\('\n?([^']*?)(?:\n\s*)?'\);")\ndata = pattern.sub('\g<1>', data)\nprint data\n
            \n

            Output

            \n
            \n \n  \n  \n \n
            \n \n some text\n \n \n \n 7.70.022\n \n
            \n
            \n soup wrap:

            Why don't you just use regexes to remove the parts you don't want and then parse the result using BeautifulSoup?

            import re
            
            data = """document.write('');
            document.write('
             
              
              ');
            document.write('
              
             
             ');
            document.write('
             
             some text
             
             
             
             7.70.022
             
            ');"""
            
            pattern = re.compile(r"document\.write\('\n?([^']*?)(?:\n\s*)?'\);")
            data = pattern.sub('\g<1>', data)
            print data

            Output

            some text 7.70.022
            qid & accept id: (25959543, 25961098) query: Match rows between two files and mark the matched strings soup:

            You give raw text and don't specify the kind of formatting you want to do. Leaving the formatting details out, yes you can replace text in FileA that is also in FileB with formatted content.

            \n
            import re\nwith open('fileA.txt') as A:\n    A_content=[x.strip() for x in A]\nwith open('fileB.txt') as B:\n    B_content=[x.strip() for x in B]\noutput=[]\nfor line_A in A_content:\n    for line_B in B_content:\n        #do whatever formatting you need on the text, \n        # I am just surrounding it with *'s here\n\n        replace = "**" + line_B + "**"\n\n        #use re.sub, \n        # details here: https://docs.python.org/2/library/re.html#re.sub\n\n        line_A = re.sub(line_B, replace , line_A)\n    #I am adding everything to the output array but you can check if it is \n    # different from the initial content. I leave that for you to do\n    output.append(line_A)\n
            \n

            output

            \n
            **NM_134083**  mmu-miR-96-5p   **NM_134083**       0.96213 -0.054\n**NM_177305**  mmu-miR-96-5p   **NM_177305**       0.95707 -0.099\nNM_026184  mmu-miR-93-3p   NM_026184       0.9552  -0.01\n
            \n soup wrap:

            You give raw text and don't specify what kind of formatting you want. Leaving the formatting details aside: yes, you can replace text in FileA that also appears in FileB with formatted content.

            import re
            with open('fileA.txt') as A:
                A_content=[x.strip() for x in A]
            with open('fileB.txt') as B:
                B_content=[x.strip() for x in B]
            output=[]
            for line_A in A_content:
                for line_B in B_content:
                    #do whatever formatting you need on the text, 
                    # I am just surrounding it with *'s here
            
                    replace = "**" + line_B + "**"
            
                    #use re.sub, 
                    # details here: https://docs.python.org/2/library/re.html#re.sub
            
                    line_A = re.sub(re.escape(line_B), replace, line_A)  # escape line_B so regex metacharacters in it are taken literally
                #I am adding everything to the output array but you can check if it is 
                # different from the initial content. I leave that for you to do
                output.append(line_A)
            

            output

            **NM_134083**  mmu-miR-96-5p   **NM_134083**       0.96213 -0.054
            **NM_177305**  mmu-miR-96-5p   **NM_177305**       0.95707 -0.099
            NM_026184  mmu-miR-93-3p   NM_026184       0.9552  -0.01
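
            Since the lines are matched as literal strings, plain str.replace is a simpler alternative to re.sub here: it needs no regex escaping at all. A sketch with sample lines taken from the output above:

```python
line_A = 'NM_134083  mmu-miR-96-5p   NM_134083       0.96213 -0.054'
line_B = 'NM_134083'

# str.replace does literal substitution, so characters like . or * are never special
marked = line_A.replace(line_B, '**' + line_B + '**')
print(marked)  # -> '**NM_134083**  mmu-miR-96-5p   **NM_134083**       0.96213 -0.054'
```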
            
            qid & accept id: (25996817, 25997002) query: python regex comma separated group soup:
            >>> import re\n>>> s = "/dev/mapper/ex_s-l_home /home  ext4    rw,exec,auto,nouser,async    1  2"\n>>> s2 = "/dev/mapper/ex_s-l_home /home  ext4    rw,exec,auto,nodev,nouser,async    1  2"\n>>> re.findall(r'(?<=\s)(?!.*nodev)(?=\S*,\S*)\S+', s)\n['rw,exec,auto,nouser,async']\n>>> re.findall(r'(?<=\s)(?!.*nodev)(?=\S*,\S*)\S+', s2)\n[]\n
            \n

            To append ,nodev:

            \n
            >>> re.sub(r'(?<=\s)(?!.*nodev)(?=\S*,\S*)\S+', r'\g<0>,nodev', s)\n'/dev/mapper/ex_s-l_home /home  ext4    rw,exec,auto,nouser,async,nodev    1  2'\n>>> re.sub(r'(?<=\s)(?!.*nodev)(?=\S*,\S*)\S+', r'\g<0>,nodev', s2)\n'/dev/mapper/ex_s-l_home /home  ext4    rw,exec,auto,nodev,nouser,async    1  2'\n
            \n
            \n

            pythex demo

            \n soup wrap:
            >>> import re
            >>> s = "/dev/mapper/ex_s-l_home /home  ext4    rw,exec,auto,nouser,async    1  2"
            >>> s2 = "/dev/mapper/ex_s-l_home /home  ext4    rw,exec,auto,nodev,nouser,async    1  2"
            >>> re.findall(r'(?<=\s)(?!.*nodev)(?=\S*,\S*)\S+', s)
            ['rw,exec,auto,nouser,async']
            >>> re.findall(r'(?<=\s)(?!.*nodev)(?=\S*,\S*)\S+', s2)
            []
            

            To append ,nodev:

            >>> re.sub(r'(?<=\s)(?!.*nodev)(?=\S*,\S*)\S+', r'\g<0>,nodev', s)
            '/dev/mapper/ex_s-l_home /home  ext4    rw,exec,auto,nouser,async,nodev    1  2'
            >>> re.sub(r'(?<=\s)(?!.*nodev)(?=\S*,\S*)\S+', r'\g<0>,nodev', s2)
            '/dev/mapper/ex_s-l_home /home  ext4    rw,exec,auto,nodev,nouser,async    1  2'
            

            pythex demo

            qid & accept id: (26059710, 26060703) query: Compare two files in python and save line differences in a new file soup:
            f1 = open('a', 'r').readlines()\nf2 = open('b', 'r').readlines()\nout = []\ncount = 1 \nfor i in f1:\n    flag = False\n    for j in f2:\n        if i == j:\n            flag = True\n    if not flag:\n        out.append(count)\n    count+=1\nfor o in out:\n    print o\n
            \n

            optimized one

            \n
            f1 = open('a', 'r').readlines()\nf2 = open('b', 'r').readlines()\nout = []\nindexa = 0\nindexb = 0\nout = []\nwhile(1):\n    try:\n        if f1[indexa][:-1] ==  f2[indexb][:-1]:\n            indexa +=1\n            indexb +=1\n        elif f1[indexa][:-1] > f2[indexb][:-1]:\n            indexb += 1\n        elif f1[indexa][:-1] < f2[indexb][:-1]:\n            out.append(indexa+1)\n            indexa += 1\n    except IndexError:\n        break\nfor i in out:\n    print i\n
            \n soup wrap:
            f1 = open('a', 'r').readlines()
            f2 = open('b', 'r').readlines()
            out = []
            count = 1 
            for i in f1:
                flag = False
                for j in f2:
                    if i == j:
                        flag = True
                if not flag:
                    out.append(count)
                count+=1
            for o in out:
                print o
            

            An optimized version (note: this two-pointer approach assumes both files are sorted):

            f1 = open('a', 'r').readlines()
            f2 = open('b', 'r').readlines()
            out = []
            indexa = 0
            indexb = 0
            out = []
            while(1):
                try:
                    if f1[indexa][:-1] ==  f2[indexb][:-1]:
                        indexa +=1
                        indexb +=1
                    elif f1[indexa][:-1] > f2[indexb][:-1]:
                        indexb += 1
                    elif f1[indexa][:-1] < f2[indexb][:-1]:
                        out.append(indexa+1)
                        indexa += 1
                except IndexError:
                    break
            for i in out:
                print i
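
            For unsorted files, a set gives the same "line numbers of a missing from b" answer in linear time; a sketch using in-memory lists in place of the files:

```python
# Stand-ins for open('a').readlines() / open('b').readlines()
f1 = ['x\n', 'y\n', 'z\n']
f2 = ['x\n', 'z\n']

f2_set = set(f2)
# 1-based line numbers of lines in f1 that never appear in f2
missing = [i for i, line in enumerate(f1, start=1) if line not in f2_set]
print(missing)  # -> [2]
```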
            
            qid & accept id: (26063269, 26080894) query: Simulating electron motion - differential equation with adaptive step size in python soup:

            As Warren Weckesser suggested, I can simply follow the Scipy cookbook for the coupled mass-spring system. First, I need to write my "right side" equations as:

            \n
            x'  = vx\ny'  = vy\nz'  = vz\nvx' = Ac*x/r\nvy' = Ac*y/r + q*E/m\nvz' = Ac*z/r \n
            \n

            where Ac = ke*q^2/(m*r^2) is the magnitude of the acceleration due to the Coulomb potential and E is the time-dependent electric field of the laser. Then, I can use scipy.integrate.odeint to find the solutions. This is faster and more reliable than the method that I was using previously.

            \n

            Here is what the electron trajectories look like with odeint. Now none of them fly away crazily:\nenter image description here

            \n

            And here is the code:

            \n
            import numpy as np\nimport matplotlib.pyplot as plt\nimport scipy.integrate\n\nq  = 1.602e-19    #Coulombs   Charge of electron\nc  = 3.0e8        #m/s        Speed of light\neo = 8.8541e-12   #C^2/(Nm^2) Permittivity of vacuum\nme = 9.109e-31    #kg         Mass of electron\nke = 8.985551e9   #N m^2 C-2  Coulomb's constant\n\ndef tunnel_position(tb,intensity,wavelength,pulseFWHM,Ip):\n    Ip = 15.13 * q  \n    Eb = E_laser(tb,intensity,wavelength,pulseFWHM) \n    return -Ip / (Eb*q) \n\ndef E_laser(t,intensity,wavelength,pulseFWHM):\n    w = c/wavelength * 2. * np.pi #Angular frequency of the laser\n    Eo = np.sqrt(2*intensity*10**4/(c*8.85e-12)) # Electric field in V/m\n    return Eo*np.sin(w*t) * np.exp(-t**2/(2*(pulseFWHM / 2.35482)**2))\n\ndef vectorfield(variables,t,params):\n    x,y,z,vx,vy,vz = variables\n    intensity,wavelength,pulseFWHM,tb = params\n    r = np.sqrt(x**2+y**2+z**2)\n    Ac = -ke*q**2/(r**2*me)\n    return [vx,vy,vz,\n         Ac*x/r,\n         Ac*y/r + q/me * E_laser((t-tb),intensity,wavelength,pulseFWHM),\n         Ac*z/r]\n\nIp  = 15.13   # Ionization potential of Argon eV\nintensity  = 2e14\nwavelength = 800e-9\npulseFWHM  = 40e-15\n\nperiod  = wavelength/c\nt = np.linspace(0,20*period,10000)\n\nbirth_times = np.linspace(0.01*period,0.999*period,50)\nmax_field   = np.max(np.abs(E_laser(birth_times,intensity,wavelength,pulseFWHM)))\n\nfor tb in birth_times:\n    x0  = 0 \n    y0  = tunnel_position(tb,intensity,wavelength,pulseFWHM,Ip)\n    z0  = 0\n    vx0 = 2e4\n    vy0 = 0\n    vz0 = 0\n\n    p = [intensity,wavelength,pulseFWHM,tb]\n    w0 = [x0,y0,z0,vx0,vy0,vz0]\n\n    solution,info = scipy.integrate.odeint(vectorfield,w0,t, args=(p,),full_output=True)\n    print 'Tb: %.2f fs - smallest step : %.05f attosec'%((tb*1e15),np.min(info['hu'])*1e18)\n\n    y = solution[:,1]\n\n    importance = (np.abs(E_laser(tb,intensity,wavelength,pulseFWHM))/max_field)\n    plt.plot(t,y,alpha=importance*0.8,lw=1)\n\nplt.xlabel('Time 
(sec)')\nplt.ylabel('Y-distance (meters)')\n\nplt.show()\n
            \n soup wrap:

            As Warren Weckesser suggested, I can simply follow the Scipy cookbook for the coupled mass-spring system. First, I need to write my "right side" equations as:

            x'  = vx
            y'  = vy
            z'  = vz
            vx' = Ac*x/r
            vy' = Ac*y/r + q*E/m
            vz' = Ac*z/r 
            

            where Ac=keq^2/(mr^2) is the magnitude of the acceleration due to the Coulomb potential and E is the time-dependent electric field of the laser. Then, I can use scipy.integrate.odeint to find the solutions. This is faster and more reliable than the method that I was using previously.

            Here is what the electron trajectories look like with odeint. Now none of them fly away crazily: [image: electron trajectories]

            And here is the code:

            import numpy as np
            import matplotlib.pyplot as plt
            import scipy.integrate
            
            q  = 1.602e-19    #Coulombs   Charge of electron
            c  = 3.0e8        #m/s        Speed of light
            eo = 8.8541e-12   #C^2/(Nm^2) Permittivity of vacuum
            me = 9.109e-31    #kg         Mass of electron
            ke = 8.985551e9   #N m^2 C-2  Coulomb's constant
            
            def tunnel_position(tb,intensity,wavelength,pulseFWHM,Ip):
                Ip = 15.13 * q  
                Eb = E_laser(tb,intensity,wavelength,pulseFWHM) 
                return -Ip / (Eb*q) 
            
            def E_laser(t,intensity,wavelength,pulseFWHM):
                w = c/wavelength * 2. * np.pi #Angular frequency of the laser
                Eo = np.sqrt(2*intensity*10**4/(c*8.85e-12)) # Electric field in V/m
                return Eo*np.sin(w*t) * np.exp(-t**2/(2*(pulseFWHM / 2.35482)**2))
            
            def vectorfield(variables,t,params):
                x,y,z,vx,vy,vz = variables
                intensity,wavelength,pulseFWHM,tb = params
                r = np.sqrt(x**2+y**2+z**2)
                Ac = -ke*q**2/(r**2*me)
                return [vx,vy,vz,
                     Ac*x/r,
                     Ac*y/r + q/me * E_laser((t-tb),intensity,wavelength,pulseFWHM),
                     Ac*z/r]
            
            Ip  = 15.13   # Ionization potential of Argon eV
            intensity  = 2e14
            wavelength = 800e-9
            pulseFWHM  = 40e-15
            
            period  = wavelength/c
            t = np.linspace(0,20*period,10000)
            
            birth_times = np.linspace(0.01*period,0.999*period,50)
            max_field   = np.max(np.abs(E_laser(birth_times,intensity,wavelength,pulseFWHM)))
            
            for tb in birth_times:
                x0  = 0 
                y0  = tunnel_position(tb,intensity,wavelength,pulseFWHM,Ip)
                z0  = 0
                vx0 = 2e4
                vy0 = 0
                vz0 = 0
            
                p = [intensity,wavelength,pulseFWHM,tb]
                w0 = [x0,y0,z0,vx0,vy0,vz0]
            
                solution,info = scipy.integrate.odeint(vectorfield,w0,t, args=(p,),full_output=True)
                print 'Tb: %.2f fs - smallest step : %.05f attosec'%((tb*1e15),np.min(info['hu'])*1e18)
            
                y = solution[:,1]
            
                importance = (np.abs(E_laser(tb,intensity,wavelength,pulseFWHM))/max_field)
                plt.plot(t,y,alpha=importance*0.8,lw=1)
            
            plt.xlabel('Time (sec)')
            plt.ylabel('Y-distance (meters)')
            
            plt.show()
            
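            The core pattern here is rewriting the second-order equation of motion as a first-order system for odeint. A minimal sketch of the same pattern, using a unit harmonic oscillator (x'' = -x) instead of the Coulomb + laser field:

```python
import numpy as np
from scipy.integrate import odeint

def rhs(state, t):
    x, v = state
    return [v, -x]          # x' = v, v' = -x

t = np.linspace(0, np.pi, 200)
sol = odeint(rhs, [1.0, 0.0], t)   # start at x=1, v=0 -> x(t) = cos(t)
print(round(float(sol[-1, 0]), 3))  # -1.0 at t = pi
```

            The six-component vectorfield above is the same idea with three positions and three velocities instead of one of each.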
            qid & accept id: (26126880, 26127036) query: Python Pandas DataFrame how to Pivot soup:

            I believe you want to pivot this using pd.pivot_table. See the examples on pivot tables to understand better how this works.

            \n

            The following should give you what you want.

            \n
            df_wanted = pd.pivot_table(\n    df_orig, \n    index=['AN', 'Bincode', 'BC_all'], \n    columns=['Treatment', 'Timepoint'], \n    values=['RIA_avg', 'sum14N_avg']\n)\n
            \n

            Note that the column names will not be transformed exactly as you stated in your output, but rather there will be a hierarchical index on both the columns and rows, which should be more convenient to work with.

            \n

            Getting rows/columns/values out from this format is possible by using .loc:

            \n
            df_wanted.loc['XYK987', :]\ndf_wanted.loc[:, ('sum14N_avg')]\ndf_wanted.loc['ALF234', ('RIA_avg', 'C', 24)]\n
            \n soup wrap:

            I believe you want to pivot this using pd.pivot_table. See the examples on pivot tables to understand better how this works.

            The following should give you what you want.

            df_wanted = pd.pivot_table(
                df_orig, 
                index=['AN', 'Bincode', 'BC_all'], 
                columns=['Treatment', 'Timepoint'], 
                values=['RIA_avg', 'sum14N_avg']
            )
            

            Note that the column names will not be transformed exactly as you stated in your output, but rather there will be a hierarchical index on both the columns and rows, which should be more convenient to work with.
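            If you do want flat column names instead of the hierarchy, the MultiIndex levels can be joined into single strings. A minimal sketch with made-up data (the frame and values here are illustrative, not the original df_orig):

```python
import pandas as pd

# Toy stand-in for df_orig; only 'AN', 'Treatment' and 'RIA_avg' are reused
df_orig = pd.DataFrame({
    'AN':        ['a', 'a', 'b', 'b'],
    'Treatment': ['C', 'T', 'C', 'T'],
    'RIA_avg':   [1.0, 2.0, 3.0, 4.0],
})

df_wanted = pd.pivot_table(df_orig, index=['AN'],
                           columns=['Treatment'], values=['RIA_avg'])

# Join each (value, Treatment) pair into one flat column name
df_wanted.columns = ['_'.join(map(str, col)) for col in df_wanted.columns]
print(df_wanted.columns.tolist())  # ['RIA_avg_C', 'RIA_avg_T']
```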

            Getting rows/columns/values out from this format is possible by using .loc:

            df_wanted.loc['XYK987', :]
            df_wanted.loc[:, ('sum14N_avg')]
            df_wanted.loc['ALF234', ('RIA_avg', 'C', 24)]
            
            qid & accept id: (26163563, 26163696) query: combination of two DF, pandas soup:

            Perform a 'left' merge in your case on column 'B':

            \n
            In [206]:\n\ndf.merge(df1, how='left', on='B')\nOut[206]:\n   A  B  C  D\n0  1  1  3  5\n1  1  1  2  5\n2  1  2  5  6\n3  2  2  7  6\n4  2  3  7  4\n
            \n

            Another method would be to set 'B' on your second df as the index and then call map:

            \n
            In [215]:\n\ndf1 = df1.set_index('B')\ndf['D'] = df['B'].map(df1['D'])\ndf\nOut[215]:\n   A  B  C  D\n0  1  1  3  5\n1  1  1  2  5\n2  1  2  5  6\n3  2  2  7  6\n4  2  3  7  4\n
            \n soup wrap:

            Perform a 'left' merge in your case on column 'B':

            In [206]:
            
            df.merge(df1, how='left', on='B')
            Out[206]:
               A  B  C  D
            0  1  1  3  5
            1  1  1  2  5
            2  1  2  5  6
            3  2  2  7  6
            4  2  3  7  4
            

            Another method would be to set 'B' on your second df as the index and then call map:

            In [215]:
            
            df1 = df1.set_index('B')
            df['D'] = df['B'].map(df1['D'])
            df
            Out[215]:
               A  B  C  D
            0  1  1  3  5
            1  1  1  2  5
            2  1  2  5  6
            3  2  2  7  6
            4  2  3  7  4
            
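            One difference worth knowing: with a 'left' merge (or with map), keys in df that have no match in the second frame come back as NaN rather than raising. A small sketch with made-up frames:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [1, 9]})   # B == 9 has no match below
df1 = pd.DataFrame({'B': [1], 'D': [5]})

# map looks up each B value in df1's index; misses become NaN
df['D'] = df['B'].map(df1.set_index('B')['D'])
print(df['D'].tolist())  # [5.0, nan]
```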
            qid & accept id: (26179639, 26179873) query: Python & Numpy - create dynamic, arbitrary subsets of ndarray soup:

            You could use numpy.all and index broadcasting for this

            \n
            filter_matrix = np.array(filterColumns)\ncombination_array = np.array(combination)\nbool_matrix = filter_matrix == combination_array[newaxis, :]   #not sure of the newaxis position\nsubset = raw_data[bool_matrix]\n
            \n

            There are however simpler ways of doing the same thing if your filters are within the matrix, notably through numpy argsort and numpy roll over an axis. First you roll axes until you've ordered your filters as the first columns, then you sort on them and slice the array vertically to get the rest of the matrix.

            \n

            In general, if a for loop can be avoided in Python, it is better to avoid it.

            \n

            Update:

            \n

            Here is the full code without a for loop:

            \n
            import numpy as np\n\n# select filtering indexes\nfilter_indexes = [1, 3]\n# generate the test data\nraw_data = np.random.randint(0, 4, size=(50,5))\n\n\n# create a column that we would use for indexing\nindex_columns = raw_data[:, filter_indexes]\n\n# sort the index columns by lexigraphic order over all the indexing columns\nargsorts = np.lexsort(index_columns.T)\n\n# sort both the index and the data column\nsorted_index = index_columns[argsorts, :]\nsorted_data = raw_data[argsorts, :]\n\n# in each indexing column, find if number in row and row-1 are identical\n# then group to check if all numbers in corresponding positions in row and row-1 are identical\nautocorrelation = np.all(sorted_index[1:, :] == sorted_index[:-1, :], axis=1)\n\n# find out the breakpoints: these are the positions where row and row-1 are not identical\nbreakpoints = np.nonzero(np.logical_not(autocorrelation))[0]+1\n\n# finally find the desired subsets \nsubsets = np.split(sorted_data, breakpoints)\n
            \n

            An alternative implementation would be to transform the indexing matrix into a string matrix, sum row-wise, get an argsort over the now unique indexing column and split as above.

            \n

            For convenience, it might be more interesting to first roll the indexing columns to the beginning of the matrix, so that the sorting done above is clear.

            \n soup wrap:

            You could use numpy.all and index broadcasting for this

            filter_matrix = np.array(filterColumns)
            combination_array = np.array(combination)
            bool_matrix = filter_matrix == combination_array[newaxis, :]   #not sure of the newaxis position
            subset = raw_data[bool_matrix]
            

            There are however simpler ways of doing the same thing if your filters are within the matrix, notably through numpy argsort and numpy roll over an axis. First you roll axes until you've ordered your filters as the first columns, then you sort on them and slice the array vertically to get the rest of the matrix.

            In general, if a for loop can be avoided in Python, it is better to avoid it.

            Update:

            Here is the full code without a for loop:

            import numpy as np
            
            # select filtering indexes
            filter_indexes = [1, 3]
            # generate the test data
            raw_data = np.random.randint(0, 4, size=(50,5))
            
            
            # create a column that we would use for indexing
            index_columns = raw_data[:, filter_indexes]
            
            # sort the index columns by lexigraphic order over all the indexing columns
            argsorts = np.lexsort(index_columns.T)
            
            # sort both the index and the data column
            sorted_index = index_columns[argsorts, :]
            sorted_data = raw_data[argsorts, :]
            
            # in each indexing column, find if number in row and row-1 are identical
            # then group to check if all numbers in corresponding positions in row and row-1 are identical
            autocorrelation = np.all(sorted_index[1:, :] == sorted_index[:-1, :], axis=1)
            
            # find out the breakpoints: these are the positions where row and row-1 are not identical
            breakpoints = np.nonzero(np.logical_not(autocorrelation))[0]+1
            
            # finally find the desired subsets 
            subsets = np.split(sorted_data, breakpoints)
            

            An alternative implementation would be to transform the indexing matrix into a string matrix, sum row-wise, get an argsort over the now unique indexing column and split as above.

            For convenience, it might be more interesting to first roll the indexing columns to the beginning of the matrix, so that the sorting done above is clear.

            qid & accept id: (26187054, 26187202) query: Extract html cell data XPath soup:

            Try this:

            \n

            Code:

            \n
            src = """\nOPEN\n80002\n\n\nACCY\n \n\n\n2001\n\n\n\n10\nIntro Financial Accounting\n3.00\n Ray, K\nMON 113\nMW
            12:45PM - 02:00PM\n08/25/14 - 12/06/14\n\n\n"""\n\nfrom lxml import html\n\ntree = html.fromstring(src)\ntds = tree.xpath("//td/descendant-or-self::*/text()[normalize-space()]")\n\nprint ", ".join([td.strip() for td in tds])\n
            \n

            Result:

            \n
            OPEN, 80002, ACCY, 2001, 10, Intro Financial Accounting, 3.00, Ray, K, MON, 113, MW, 12:45PM - 02:00PM, 08/25/14 - 12/06/14\n[Finished in 0.5s]\n
            \n

            Note that this gets all the text from inside all td tags, including the ones from inside the child node, ie. MON.

            \n

            Cleaning the result is up to you.

            \n soup wrap:

            Try this:

            Code:

            src = """
            OPEN
            80002
            
            
            ACCY
             
            
            
            2001
            
            
            
            10
            Intro Financial Accounting
            3.00
             Ray, K
            MON 113
            MW
            12:45PM - 02:00PM
            08/25/14 - 12/06/14
            
            
            """
            
            from lxml import html
            
            tree = html.fromstring(src)
            tds = tree.xpath("//td/descendant-or-self::*/text()[normalize-space()]")
            
            print ", ".join([td.strip() for td in tds])

            Result:

            OPEN, 80002, ACCY, 2001, 10, Intro Financial Accounting, 3.00, Ray, K, MON, 113, MW, 12:45PM - 02:00PM, 08/25/14 - 12/06/14
            [Finished in 0.5s]
            

            Note that this gets all the text from inside all td tags, including text from inside child nodes, e.g. MON.

            Cleaning the result is up to you.
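            The same descendant-text idea works with only the standard library, via itertext(). A sketch on a minimal, well-formed table (made up here; the original HTML is not reproduced):

```python
import xml.etree.ElementTree as ET

src = "<table><tr><td>OPEN</td><td>80002</td><td><b>MON</b> 113</td></tr></table>"
tree = ET.fromstring(src)

# itertext() collects text from the td and all its children, much like
# descendant-or-self::*/text() does in the lxml version
cells = [" ".join("".join(td.itertext()).split()) for td in tree.iter("td")]
print(", ".join(cells))  # OPEN, 80002, MON 113
```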

            qid & accept id: (26205922, 26206622) query: Calculate weighted average using a pandas/dataframe soup:

            I think I would do this with two groupbys.

            \n

            First to calculate the "weighted average":

            \n
            In [11]: g = df.groupby('Date')\n\nIn [12]: df.value / g.value.transform("sum") * df.wt\nOut[12]:\n0    0.125000\n1    0.250000\n2    0.416667\n3    0.277778\n4    0.444444\ndtype: float64\n
            \n

            If you set this as a column, you can groupby over it:

            \n
            In [13]: df['wa'] = df.value / g.value.transform("sum") * df.wt\n
            \n

            Now the sum of this column is the desired:

            \n
            In [14]: g.wa.sum()\nOut[14]:\nDate\n01/01/2012    0.791667\n01/02/2012    0.722222\nName: wa, dtype: float64\n
            \n

            or potentially:

            \n
            In [15]: g.wa.transform("sum")\nOut[15]:\n0    0.791667\n1    0.791667\n2    0.791667\n3    0.722222\n4    0.722222\nName: wa, dtype: float64\n
            \n soup wrap:

            I think I would do this with two groupbys.

            First to calculate the "weighted average":

            In [11]: g = df.groupby('Date')
            
            In [12]: df.value / g.value.transform("sum") * df.wt
            Out[12]:
            0    0.125000
            1    0.250000
            2    0.416667
            3    0.277778
            4    0.444444
            dtype: float64
            

            If you set this as a column, you can groupby over it:

            In [13]: df['wa'] = df.value / g.value.transform("sum") * df.wt
            

            Now the sum of this column is the desired:

            In [14]: g.wa.sum()
            Out[14]:
            Date
            01/01/2012    0.791667
            01/02/2012    0.722222
            Name: wa, dtype: float64
            

            or potentially:

            In [15]: g.wa.transform("sum")
            Out[15]:
            0    0.791667
            1    0.791667
            2    0.791667
            3    0.722222
            4    0.722222
            Name: wa, dtype: float64
            
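            The two groupbys can also be collapsed into a single groupby.apply. A sketch with made-up numbers, computing the same wa formula as above:

```python
import pandas as pd

df = pd.DataFrame({
    'Date':  ['01/01/2012'] * 3 + ['01/02/2012'] * 2,
    'value': [10, 20, 30, 40, 50],
    'wt':    [0.3, 0.3, 0.5, 0.5, 0.5],
})

# Same computation as the transform version, done per group in one pass
wa = df.groupby('Date').apply(
    lambda g: (g['value'] / g['value'].sum() * g['wt']).sum())
print(wa)
```

            The transform-based version in the answer is still handy when you want the per-group result broadcast back onto every row.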
            qid & accept id: (26222720, 26222860) query: Python: checking for the existence of a variable in globals() makes it invisible in the local context soup:

            Note that when you start dynamically inspecting globals like this, people start to wonder ... with that said, here's a working version of your code that works so long as you only "read" from the global variable.

            \n
            def calledfunction():\n  default_local = 'some default'\n  var = globalvar if 'globalvar' in globals() else default_local\n  print var\n\n# -----------------\n\nprint "calling function before the variable is defined"\nprint\ncalledfunction()\n\nglobalvar = "created outside the function"\n\nprint "calling function after the variable is defined"\nprint\ncalledfunction()\n
            \n

            Note that within a function, a variable name is either global or it's local (and python3.x adds nonlocal to the bunch). However, a name cannot switch between global and local depending on how the function is called.

            \n

            A better way is to just use keyword arguments:

            \n
            def calledfunction(var="Created within the function"):\n    print var\n\ncalledfunction()  # Created within the function\ncalledfunction(var="Created by the caller")  # Created by the caller\n
            \n

            There are some gotchas when you want to create new mutable objects this way, but they are well known and documented with work-arounds.

            \n soup wrap:

            Note that when you start dynamically inspecting globals like this, people start to wonder ... with that said, here's a working version of your code that works so long as you only "read" from the global variable.

            def calledfunction():
              default_local = 'some default'
              var = globalvar if 'globalvar' in globals() else default_local
              print var
            
            # -----------------
            
            print "calling function before the variable is defined"
            print
            calledfunction()
            
            globalvar = "created outside the function"
            
            print "calling function after the variable is defined"
            print
            calledfunction()
            

            Note that within a function, a variable name is either global or it's local (and python3.x adds nonlocal to the bunch). However, a name cannot switch between global and local depending on how the function is called; the distinction is fixed when the function is compiled.
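            A short demonstration of that rule (Python 3 syntax): an assignment anywhere in the body makes the name local for the whole function, even on lines before the assignment.

```python
x = "global"

def reader():
    return x            # no assignment in the body, so x is the global

def shadower():
    y = x               # raises UnboundLocalError: the assignment below
    x = "local"         # makes x local for the entire function body
    return y

print(reader())         # global
try:
    shadower()
except UnboundLocalError:
    print("UnboundLocalError")
```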

            A better way is to just use keyword arguments:

            def calledfunction(var="Created within the function"):
                print var
            
            calledfunction()  # Created within the function
            calledfunction(var="Created by the caller")  # Created by the caller
            

            There are some gotchas when you want to create new mutable objects this way, but they are well known and documented with work-arounds.

            qid & accept id: (26238723, 26242895) query: Largest weakly connected component in networkX soup:

            The NetworkX component functions return Python generators. You can create a list of items in the generator using the Python list function. Here is an example showing that and also finding the largest weakly connected component.

            \n
            In [1]: import networkx as nx\n\nIn [2]: G = nx.DiGraph()\n\nIn [3]: G.add_path([1,2,3,4])\n\nIn [4]: G.add_path([10,11,12])\n
            \n

            You can use e.g. list to turn the generator into a list of subgraphs:

            \n
            In [5]: list(nx.weakly_connected_component_subgraphs(G))\nOut[5]: \n[,\n ]\n
            \n

            The max operator takes a key argument which you can set to the Python function len which calls len(g) on each subgraph to compute the number of nodes. So to get the component with the largest number of nodes you can write

            \n
            In [6]: largest = max(nx.weakly_connected_component_subgraphs(G),key=len)\n\nIn [7]: largest.nodes()\nOut[7]: [1, 2, 3, 4]\n\nIn [8]: largest.edges()\nOut[8]: [(1, 2), (2, 3), (3, 4)]\n
            \n soup wrap:

            The NetworkX component functions return Python generators. You can create a list of items in the generator using the Python list function. Here is an example showing that and also finding the largest weakly connected component.

            In [1]: import networkx as nx
            
            In [2]: G = nx.DiGraph()
            
            In [3]: G.add_path([1,2,3,4])
            
            In [4]: G.add_path([10,11,12])
            

            You can use e.g. list to turn the generator into a list of subgraphs:

            In [5]: list(nx.weakly_connected_component_subgraphs(G))
            Out[5]: 
            [<networkx.classes.digraph.DiGraph object at 0x...>,
             <networkx.classes.digraph.DiGraph object at 0x...>]
            

            The max operator takes a key argument which you can set to the Python function len which calls len(g) on each subgraph to compute the number of nodes. So to get the component with the largest number of nodes you can write

            In [6]: largest = max(nx.weakly_connected_component_subgraphs(G),key=len)
            
            In [7]: largest.nodes()
            Out[7]: [1, 2, 3, 4]
            
            In [8]: largest.edges()
            Out[8]: [(1, 2), (2, 3), (3, 4)]
            
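            In newer NetworkX releases (2.4+), weakly_connected_component_subgraphs and G.add_path were removed. Assuming a current NetworkX, the same result can be built from the component node sets:

```python
import networkx as nx

G = nx.DiGraph()
nx.add_path(G, [1, 2, 3, 4])
nx.add_path(G, [10, 11, 12])

# Components are returned as node sets; take the biggest and slice it out
largest_nodes = max(nx.weakly_connected_components(G), key=len)
largest = G.subgraph(largest_nodes)
print(sorted(largest.nodes()))  # [1, 2, 3, 4]
```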
            qid & accept id: (26240228, 26240438) query: how to join multiple sorted files in Python alphabetically? soup:

            You can use heapq.merge:

            \n
            import heapq\nimport contextlib\n\nfiles = [open(fn) for fn in inFiles]\nwith contextlib.nested(*files):\n    with open('output', 'w') as f:\n        f.writelines(heapq.merge(*files))\n
            \n

            In Python 3.x (3.3+):

            \n
            import heapq\nimport contextlib\n\nwith contextlib.ExitStack() as stack:\n    files = [stack.enter_context(open(fn)) for fn in inFiles]\n    with open('output', 'w') as f:\n        f.writelines(heapq.merge(*files))\n
            \n soup wrap:

            You can use heapq.merge:

            import heapq
            import contextlib
            
            files = [open(fn) for fn in inFiles]
            with contextlib.nested(*files):
                with open('output', 'w') as f:
                    f.writelines(heapq.merge(*files))
            

            In Python 3.x (3.3+):

            import heapq
            import contextlib
            
            with contextlib.ExitStack() as stack:
                files = [stack.enter_context(open(fn)) for fn in inFiles]
                with open('output', 'w') as f:
                    f.writelines(heapq.merge(*files))
            
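            heapq.merge is lazy and only assumes each input is already sorted; it repeatedly yields the smallest head among the iterables, so the combined stream stays sorted without loading everything into memory. A quick sketch with in-memory lines standing in for the files:

```python
import heapq

a = ["apple\n", "cherry\n"]
b = ["banana\n", "date\n"]

merged = list(heapq.merge(a, b))
print("".join(merged), end="")  # apple, banana, cherry, date on separate lines
```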
            qid & accept id: (26265015, 26476526) query: How to get console output printed using kivy soup:

            Here is how I went about getting console command output.

            \n

            The python code first:

            \n
                from kivy.app import App\n    from kivy.uix.boxlayout import BoxLayout\n    from kivy.uix.popup import Popup\n    from kivy.properties import ObjectProperty\n    from kivy.uix.label import Label \n    import subprocess\n\n    class shellcommand(BoxLayout):\n        first=ObjectProperty()\n        second=ObjectProperty()\n        third=ObjectProperty()\n\n        def uname(self):\n            v=subprocess.check_output("uname -a",shell=True)\n            result=Popup(title="RESULT",content=Label(text="kernel is\n" + v))\n            result.open()\n        def date(self):\n            d=subprocess.check_output("date",shell=True)\n            res=Popup(title="DATE",content=Label(text="the date today is\n" + d))\n            res.open()\n        def last(self):\n            last=subprocess.check_output("w",shell=True)\n            ls=Popup(title="LOGIN",content=Label(text="logged in \n" + last))\n            ls.open()\n\n\n    class shellApp(App):\n        def build(self):\n            return shellcommand()\n\n    shellApp().run()\n
            \n

            And then the kivy file named shellapp.kv

            \n
            :\norientation: "vertical"\nfirst:one\nsecond:two\nthird:three\ncanvas:\n    Rectangle:\n        source: "snaps.png" #location of any picture\n        pos: self.pos\n        size: self.size\n\n\n\nBoxLayout:\n    orientation: "horizontal"\n    Button:\n        id:one\n        text: "UNAME"\n        background_color: 0,0,0,1\n        font_size:32\n        size_hint:1,None\n        on_press: root.uname()\n\n\n    Button:\n        id:two      \n        text: "DATE"\n        background_color: 1,1.5,0,1\n        font_size:32\n        size_hint:1,None\n        on_press: root.date()\n\n\n    Button:\n        id: three\n        text: "LOGGED IN"\n        background_color: 1,0,0,1\n        font_size:32\n        size_hint: 1,None\n        on_press: root.last()\n
            \n

            If there is a way to improve this code, please let me know. Thanks

            \n soup wrap:

            Here is how I went about getting console command output.

            The python code first:

                from kivy.app import App
                from kivy.uix.boxlayout import BoxLayout
                from kivy.uix.popup import Popup
                from kivy.properties import ObjectProperty
                from kivy.uix.label import Label 
                import subprocess
            
                class shellcommand(BoxLayout):
                    first=ObjectProperty()
                    second=ObjectProperty()
                    third=ObjectProperty()
            
                    def uname(self):
                        v=subprocess.check_output("uname -a",shell=True)
                        result=Popup(title="RESULT",content=Label(text="kernel is\n" + v))
                        result.open()
                    def date(self):
                        d=subprocess.check_output("date",shell=True)
                        res=Popup(title="DATE",content=Label(text="the date today is\n" + d))
                        res.open()
                    def last(self):
                        last=subprocess.check_output("w",shell=True)
                        ls=Popup(title="LOGIN",content=Label(text="logged in \n" + last))
                        ls.open()
            
            
                class shellApp(App):
                    def build(self):
                        return shellcommand()
            
                shellApp().run()
            

            And then the kivy file named shellapp.kv

            <shellcommand>:
                orientation: "vertical"
                first: one
                second: two
                third: three
                canvas:
                    Rectangle:
                        source: "snaps.png" #location of any picture
                        pos: self.pos
                        size: self.size
            
                BoxLayout:
                    orientation: "horizontal"
                    Button:
                        id: one
                        text: "UNAME"
                        background_color: 0,0,0,1
                        font_size: 32
                        size_hint: 1,None
                        on_press: root.uname()
            
                    Button:
                        id: two
                        text: "DATE"
                        background_color: 1,1.5,0,1
                        font_size: 32
                        size_hint: 1,None
                        on_press: root.date()
            
                    Button:
                        id: three
                        text: "LOGGED IN"
                        background_color: 1,0,0,1
                        font_size: 32
                        size_hint: 1,None
                        on_press: root.last()
            

            If there is a way to improve this code, please let me know. Thanks
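            One small improvement for Python 3: subprocess.check_output returns bytes there, so decode before handing the text to a Label. A minimal sketch, using sys.executable as a portable stand-in for a shell command:

```python
import subprocess
import sys

# bytes -> str; without .decode() the Label text would render as b'...'
out = subprocess.check_output([sys.executable, "-c", "print('hello')"])
text = out.decode()
print(text.strip())  # hello
```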

            qid & accept id: (26277322, 26311049) query: passing arrays with ctypes soup:

            The first argument's type is POINTER(POINTER(c_int16)) not POINTER(ARRAY(c_int16,size)).

            \n

            Here's a short example:

            \n

            x.c (compiled with cl /LD x.c:

            \n
            #include \n#include \n__declspec(dllexport) void read(int16_t** input, size_t size)\n{\n  int i;\n  int16_t* p = (int16_t*) malloc (size*sizeof(int16_t));\n  for(i=0;i
            \n

            x.py

            \n
            from ctypes import *\nx = CDLL('x')\nx.read.argtypes = [POINTER(POINTER(c_int16))]\nx.read.restype = None\nx.release.argtypes = [POINTER(c_int16)]\nx.release.restype = None\np = POINTER(c_int16)()\nx.read(p,5)\nfor i in range(5):\n    print(p[i])\nx.release(p)\n
            \n

            Output:

            \n
            0\n1\n2\n3\n4\n
            \n

            Note this leaves you with a potential memory leak if you don't remember to free the malloc'd buffer. A better way would be to allocate the buffer in Python and tell the C function the size:

            \n

            x.c

            \n
            #include \n#include \n__declspec(dllexport) void read(int16_t* input, size_t size)\n{\n  int i;\n  for(i=0;i
            \n

            x.py

            \n
            from ctypes import *\nx = CDLL('x')\nx.read.argtypes = [POINTER(c_int16)]\nx.read.restype = None\np = (c_int16*5)()\nx.read(p,len(p))\nprint(list(p))\n
            \n

            Output

            \n
            [0, 1, 2, 3, 4]\n
            \n soup wrap:

            The first argument's type is POINTER(POINTER(c_int16)) not POINTER(ARRAY(c_int16,size)).

            Here's a short example:

            x.c (compiled with cl /LD x.c):

            #include <stdint.h>
            #include <stdlib.h>
            __declspec(dllexport) void read(int16_t** input, size_t size)
            {
              int i;
              int16_t* p = (int16_t*) malloc (size*sizeof(int16_t));
              for(i=0;i<size;i++)
                p[i] = i;
              *input = p;
            }
            __declspec(dllexport) void release(int16_t* p)
            {
              free(p);
            }

            x.py

            from ctypes import *
            x = CDLL('x')
            x.read.argtypes = [POINTER(POINTER(c_int16))]
            x.read.restype = None
            x.release.argtypes = [POINTER(c_int16)]
            x.release.restype = None
            p = POINTER(c_int16)()
            x.read(p,5)
            for i in range(5):
                print(p[i])
            x.release(p)
            

            Output:

            0
            1
            2
            3
            4
            

            Note this leaves you with a potential memory leak if you don't remember to free the malloc'd buffer. A better way would be to allocate the buffer in Python and tell the C function the size:

            x.c

            #include <stdint.h>
            #include <stdlib.h>
            __declspec(dllexport) void read(int16_t* input, size_t size)
            {
              int i;
              for(i=0;i<size;i++)
                input[i] = i;
            }

            x.py

            from ctypes import *
            x = CDLL('x')
            x.read.argtypes = [POINTER(c_int16)]
            x.read.restype = None
            p = (c_int16*5)()
            x.read(p,len(p))
            print(list(p))
            

            Output

            [0, 1, 2, 3, 4]
            
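            The relationship between a ctypes array and POINTER(c_int16) can be seen without compiling any C; a pure-Python sketch:

```python
from ctypes import POINTER, c_int16, cast

buf = (c_int16 * 5)(*range(5))      # the array owns its 5 * 2 bytes
p = cast(buf, POINTER(c_int16))     # same memory viewed through a pointer

print([p[i] for i in range(5)])     # [0, 1, 2, 3, 4]
```

            This is why `(c_int16*5)()` can be passed directly where the C side expects `int16_t*`: ctypes converts the array to a pointer to its first element.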
            qid & accept id: (26279903, 26279972) query: Addition of multiple arrays in python soup:

            Stick with Numpy array and use its sum() method:

            \n
            >>> arr = np.array([[1,2,3,5,4,3], \n          [5,7,2,4,6,7],\n          [3,6,2,4,5,9]])\n>>> arr.sum(axis=0)\narray([ 9, 15,  7, 13, 15, 19])\n
            \n

            Of course you can do it with Python lists as well but it is going to be slow:

            \n
            >>> lst = [[1,2,3,5,4,3], \n          [5,7,2,4,6,7],\n          [3,6,2,4,5,9]]\n>>> map(sum, zip(*lst))\n[9, 15, 7, 13, 15, 19]\n
            \n soup wrap:

            Stick with Numpy array and use its sum() method:

            >>> arr = np.array([[1,2,3,5,4,3], 
                      [5,7,2,4,6,7],
                      [3,6,2,4,5,9]])
            >>> arr.sum(axis=0)
            array([ 9, 15,  7, 13, 15, 19])
            

            Of course you can do it with Python lists as well but it is going to be slow:

            >>> lst = [[1,2,3,5,4,3], 
                      [5,7,2,4,6,7],
                      [3,6,2,4,5,9]]
            >>> map(sum, zip(*lst))
            [9, 15, 7, 13, 15, 19]
            
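            The axis argument is the whole trick here: axis=0 collapses down the columns, axis=1 across the rows. A tiny check:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

print(arr.sum(axis=0).tolist())  # [5, 7, 9]   column sums
print(arr.sum(axis=1).tolist())  # [6, 15]     row sums
```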
            qid & accept id: (26286980, 26287045) query: create sublists within sublists in python soup:

            You can define your sublisting action as a function and apply it twice. This is probably inefficient since it will construct intermediate lists before constructing ones with the finest level of sublisting. But it is easier to read, since you're already familiar with the first step given that you used it when asking this question.

            \n
            def nest_in_pairs(original):\n    return [original[i:i+2] for i in range(0,len(original),2)]\n\nprint nest_in_pairs(nest_in_pairs(original))\n
            \n

            A more efficient way to do it would be to create a generator that yields a list of up to two items from the front of the list. Then chain them.

            \n
            from types import GeneratorType\n\ndef yield_next_two(seq):\n    if not isinstance(seq, GeneratorType):\n        for i in range(0, len(seq), 2):\n            yield seq[i:i+2]\n    else:\n        while True:\n            item1 = next(seq)\n            try:\n                item2 = next(seq)\n                yield [item1, item2]\n            except StopIteration:\n                yield [item1]\n\npair_generator = yield_next_two(original)\n\nquad_generator = yield_next_two(yield_next_two(original))\n\nnext(pair_generator)\n\nnext(quad_generator)\n
            \n

            and you can call list on pair_generator or quad_generator to get the whole set of contents.

            \n

            Here's an example of playing with this after pasting the above code into an IPython session:

            \n
            In [40]: quad_generator = yield_next_two(yield_next_two(original))\n\nIn [41]: list(quad_generator)\nOut[41]: [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]]\n\nIn [42]: nearly_eights_generator = yield_next_two(yield_next_two(yield_next_two(original)))\n\nIn [43]: list(nearly_eights_generator)\nOut[43]: [[[[1, 2], [3, 4]], [[5, 6], [7, 8]]], [[[9, 10], [11, 12]]]]\n
            \n soup wrap:

            You can define your sublisting action as a function and apply it twice. This is probably inefficient since it will construct intermediate lists before constructing ones with the finest level of sublisting. But it is easier to read, since you're already familiar with the first step: you used it when asking this question.

            def nest_in_pairs(original):
                return [original[i:i+2] for i in range(0,len(original),2)]
            
            print nest_in_pairs(nest_in_pairs(original))
            

            A more efficient way to do it would be to create a generator that yields a list of up to two items from the front of the list. Then chain them.

            from types import GeneratorType
            
            def yield_next_two(seq):
                if not isinstance(seq, GeneratorType):
                    for i in range(0, len(seq), 2):
                        yield seq[i:i+2]
                else:
                    while True:
                        item1 = next(seq)
                        try:
                            item2 = next(seq)
                            yield [item1, item2]
                        except StopIteration:
                            yield [item1]
            
            pair_generator = yield_next_two(original)
            
            quad_generator = yield_next_two(yield_next_two(original))
            
            next(pair_generator)
            
            next(quad_generator)
            

            and you can call list on pair_generator or quad_generator to get the whole set of contents.

            Here's an example of playing with this after pasting the above code into an IPython session:

            In [40]: quad_generator = yield_next_two(yield_next_two(original))
            
            In [41]: list(quad_generator)
            Out[41]: [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]]
            
            In [42]: nearly_eights_generator = yield_next_two(yield_next_two(yield_next_two(original)))
            
            In [43]: list(nearly_eights_generator)
            Out[43]: [[[[1, 2], [3, 4]], [[5, 6], [7, 8]]], [[[9, 10], [11, 12]]]]
            
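            The double application above can be run end to end in Python 3; a sketch assuming original = list(range(1, 13)), as in the IPython session:

```python
def nest_in_pairs(original):
    # group consecutive items into sublists of (up to) two
    return [original[i:i + 2] for i in range(0, len(original), 2)]

original = list(range(1, 13))
quads = nest_in_pairs(nest_in_pairs(original))
print(quads)  # [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]]
```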
            qid & accept id: (26317418, 26317611) query: Reading text file and returning most popular name for that year soup:

            soup wrap:

            Here is how you could do it (this is with 2.7.8; I do not have 3.x on this machine):

            from collections import defaultdict, Counter
            
            data = '''-,-,1970,John,-
            -,-,1970,John,-
            -,-,1970,Paul,-
            -,-,2014,Bob,-
            -,-,2014,Mary,-
            -,-,2014,Mary,-'''
            
            temp = defaultdict(list)
            
            for record in (line.split(',') for line in data.splitlines()):
              y = record[2]
              n = record[3]
              temp[y].append(n)
            
            results = [(k, Counter(v).most_common(1)) for k,v in temp.items()]
            

            [('2014', [('Mary', 2)]), ('1970', [('John', 2)])]

            for year, r in results:
                if int(year) in valid:  # valid: the set of years of interest, defined elsewhere
                    print('In {0} the name {1} occurred the most ({2} times)'.format(year, r[0][0], r[0][1]))
            

            In 2014 the name Mary occurred the most (2 times)

            In 1970 the name John occurred the most (2 times)

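            The same counting can be condensed in Python 3; a minimal sketch over the sample data, collecting each year's winner into a dict:

```python
from collections import Counter, defaultdict

data = '''-,-,1970,John,-
-,-,1970,John,-
-,-,1970,Paul,-
-,-,2014,Bob,-
-,-,2014,Mary,-
-,-,2014,Mary,-'''

names_by_year = defaultdict(list)
for line in data.splitlines():
    _, _, year, name, _ = line.split(',')
    names_by_year[year].append(name)

# most_common(1) returns [(name, count)] for the top name of each year
winners = {year: Counter(names).most_common(1)[0]
           for year, names in names_by_year.items()}
print(winners)  # {'1970': ('John', 2), '2014': ('Mary', 2)}
```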
            qid & accept id: (26321557, 26321599) query: Beautiful Soup - Class contains 'a' and not contains 'b' soup:

            soup wrap:

            The following will find every tr tag with the viewLicense class:

            soup.find_all("tr", class_="viewLicense")
            

            So it will work for the markup provided in the question:

            >>> soup.find_all("tr", class_="viewLicense")
            [, ]
            

            However, if a tr tag has both the viewLicense and viewLicenseDetails classes, then the following will find all tr tags with viewLicense and then filter out those with viewLicenseDetails:

            >>> both_tags = soup.find_all("tr", class_="viewLicense")
            >>> for tag in both_tags:
            ...     if 'viewLicenseDetails' not in tag.attrs['class']:
            ...             print tag
            
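            With a recent BeautifulSoup (4.7+, which uses the soupsieve CSS backend), the same include/exclude filter can be written as one CSS selector; a sketch over hypothetical markup:

```python
from bs4 import BeautifulSoup

html = """
<table>
  <tr class="viewLicense"><td>keep</td></tr>
  <tr class="viewLicense viewLicenseDetails"><td>drop</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# tr tags with class viewLicense but not viewLicenseDetails
rows = soup.select("tr.viewLicense:not(.viewLicenseDetails)")
print([tr.get_text(strip=True) for tr in rows])  # ['keep']
```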
            qid & accept id: (26427128, 26429390) query: How to make POS n-grams more effective? soup:

            soup wrap:

            Using a server with these specs from inxi -C:

            CPU(s): 2 Hexa core Intel Xeon CPU E5-2430 v2s (-HT-MCP-SMP-) cache: 30720 KB flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx) 
            Clock Speeds: 1: 2500.036 MHz
            

            Normally, the canonical answer is to use batch tagging with pos_tag_sents, but it doesn't seem to be any faster.

            Let's try to profile some of the steps before you get the POS tags (using just 1 core):

            import time
            
            from nltk.corpus import brown
            from nltk import sent_tokenize, word_tokenize, pos_tag
            from nltk import pos_tag_sents
            
            # Load brown corpus
            start = time.time()
            brown_corpus = brown.raw()
            loading_time = time.time() - start
            print "Loading brown corpus took",  loading_time
            
            # Sentence tokenizing corpus
            start = time.time()
            brown_sents = sent_tokenize(brown_corpus)
            sent_time = time.time() - start
            print "Sentence tokenizing corpus took", sent_time
            
            
            # Word tokenizing corpus
            start = time.time()
            brown_words = [word_tokenize(i) for i in brown_sents]
            word_time = time.time() - start
            print "Word tokenizing corpus took", word_time
            
            # Loading, sent_tokenize, word_tokenize all together.
            start = time.time()
            brown_words = [word_tokenize(s) for s in sent_tokenize(brown.raw())]
            tokenize_time = time.time() - start
            print "Loading and tokenizing corpus took", tokenize_time
            
            # POS tagging one sentence at a time took.
            start = time.time()
            brown_tagged = [pos_tag(word_tokenize(s)) for s in sent_tokenize(brown.raw())]
            tagging_time = time.time() - start
            print "Tagging sentence by sentence took", tagging_time
            
            
            # Using batch_pos_tag.
            start = time.time()
            brown_tagged = pos_tag_sents([word_tokenize(s) for s in sent_tokenize(brown.raw())])
            tagging_time = time.time() - start
            print "Tagging sentences by batch took", tagging_time
            

            [out]:

            Loading brown corpus took 0.154870033264
            Sentence tokenizing corpus took 3.77206301689
            Word tokenizing corpus took 13.982845068
            Loading and tokenizing corpus took 17.8847839832
            Tagging sentence by sentence took 1114.65085101
            Tagging sentences by batch took 1104.63432097
            

            Note: pos_tag_sents was previously called batch_pos_tag in versions before NLTK 3.0.

            In conclusion, I think you would need to consider another POS tagger to preprocess your data, or use threading to handle the POS tagging.

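            The threading suggestion can be sketched with concurrent.futures; tag_sentence below is a stand-in for the real nltk.pos_tag(word_tokenize(...)) call (note that for a pure-Python tagger threads won't beat the GIL, so the same pattern with ProcessPoolExecutor, or an external tagger, is what actually buys speed):

```python
from concurrent.futures import ThreadPoolExecutor

def tag_sentence(sent):
    # stand-in for nltk.pos_tag(word_tokenize(sent)); tags every token as 'NN'
    return [(tok, 'NN') for tok in sent.split()]

def tag_corpus(sentences, workers=4):
    # executor.map preserves input order, one task per sentence
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(tag_sentence, sentences))

tagged = tag_corpus(["the quick fox", "jumps over"])
print(tagged)
```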
            qid & accept id: (26430002, 26430091) query: how to goup items in a list of dictionaries by matching values ​​in python soup:

            soup wrap:

            You can use itertools.groupby. You can construct a lambda expression that looks up the value for the 'command' key, then takes elements [1] and [2] of the result of splitting it on the ';' character.

            d =[{'name': 'fire', 'command': '1;2;3;4'},
                {'name': 'brain', 'command': '2;2;3;4'},
                {'name': 'word', 'command': '1;3;4;5'},
                {'name': 'cellphone', 'command': '6;1;3;4'},
                {'name': 'ocean', 'command': '9;3;7;4'}]
            
            import itertools
            groups = itertools.groupby(d, lambda i: i['command'].split(';')[1:3])
            
            for key, group in groups:
                print(list(group))
            

            Output

            [{'name': 'fire', 'command': '1;2;3;4'}, {'name': 'brain', 'command': '2;2;3;4'}]
            [{'name': 'word', 'command': '1;3;4;5'}]
            [{'name': 'cellphone', 'command': '6;1;3;4'}]
            [{'name': 'ocean', 'command': '9;3;7;4'}]
            

            To find groups that had more than one member, you need one more step (note that itertools.groupby only groups consecutive items, so the input must be sorted by the same key if matching entries may not be adjacent):

            groups = itertools.groupby(d, lambda i: i['command'].split(';')[1:3])  # re-create: the generator above is exhausted
            for key, group in groups:
                groupList = list(group)
                if len(groupList) > 1:
                    print(groupList)
            
            [{'command': '1;2;3;4', 'name': 'fire'}, {'command': '2;2;3;4', 'name': 'brain'}]
            
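            Because itertools.groupby only merges consecutive items, sorting by the same key first makes the grouping robust when matching entries are scattered; a sketch:

```python
import itertools

d = [{'name': 'fire', 'command': '1;2;3;4'},
     {'name': 'word', 'command': '1;3;4;5'},
     {'name': 'brain', 'command': '2;2;3;4'}]  # matching items not adjacent

key = lambda i: i['command'].split(';')[1:3]
# sort first so equal keys become consecutive, then group
groups = [list(g) for _, g in itertools.groupby(sorted(d, key=key), key)]
multi = [g for g in groups if len(g) > 1]
print(multi)
```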
            qid & accept id: (26450673, 26455626) query: sklearn decomposition top terms soup:

            soup wrap:

            Assuming lsa = TruncatedSVD(n_components=k) for some k, the obvious way to get term weights makes use of the fact that LSA/SVD is a linear transformation, i.e., each row of lsa.components_ is a weighted sum of the input terms, and you can multiply that with the cluster centroids from k-means.

            Let's set some things up and train some models:

            >>> from sklearn.datasets import fetch_20newsgroups
            >>> from sklearn.feature_extraction.text import TfidfVectorizer
            >>> from sklearn.cluster import KMeans
            >>> from sklearn.decomposition import TruncatedSVD
            >>> data = fetch_20newsgroups()
            >>> vectorizer = TfidfVectorizer(min_df=3, max_df=.95, stop_words='english')
            >>> lsa = TruncatedSVD(n_components=10)
            >>> km = KMeans(n_clusters=3)
            >>> X = vectorizer.fit_transform(data.data)
            >>> X_lsa = lsa.fit_transform(X)
            >>> km.fit(X_lsa)
            

            Now multiply the LSA components and the k-means centroids:

            >>> X.shape
            (11314, 38865)
            >>> lsa.components_.shape
            (10, 38865)
            >>> km.cluster_centers_.shape
            (3, 10)
            >>> weights = np.dot(km.cluster_centers_, lsa.components_)
            >>> weights.shape
            (3, 38865)
            

            Then print; we need absolute values for the weights because of the sign indeterminacy in LSA:

            >>> features = vectorizer.get_feature_names()
            >>> weights = np.abs(weights)
            >>> for i in range(km.n_clusters):
            ...     top5 = np.argsort(weights[i])[-5:]
            ...     print(zip([features[j] for j in top5], weights[i, top5]))
            ...     
            [(u'escrow', 0.042965734662740895), (u'chip', 0.07227072329320372), (u'encryption', 0.074855609122467345), (u'clipper', 0.075661844826553887), (u'key', 0.095064798549230306)]
            [(u'posting', 0.012893125486957332), (u'article', 0.013105911161236845), (u'university', 0.0131617377000081), (u'com', 0.023016036009601809), (u'edu', 0.034532489348082958)]
            [(u'don', 0.02087448155525683), (u'com', 0.024327099321009758), (u'people', 0.033365757270264217), (u'edu', 0.036318114826463417), (u'god', 0.042203130080860719)]
            

            Mind you, you really need a stop word filter for this to work. The stop words tend to end up in every single component, and get a high weight in every cluster centroid.

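            The shape bookkeeping of the centroid-times-components product can be checked with toy arrays (hypothetical sizes: 3 clusters, 4 components, 6 vocabulary terms):

```python
import numpy as np

rng = np.random.default_rng(0)
components = rng.normal(size=(4, 6))   # stands in for lsa.components_
centroids = rng.normal(size=(3, 4))    # stands in for km.cluster_centers_

# per-cluster term weights; abs() because of LSA's sign indeterminacy
weights = np.abs(centroids @ components)
top2 = np.argsort(weights, axis=1)[:, -2:]  # top-2 term indices per cluster
print(weights.shape, top2.shape)  # (3, 6) (3, 2)
```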
            qid & accept id: (26453595, 26454180) query: Build slice objetcs from subscript notation soup:

            soup wrap:

            If you must use subscript notation, then your current solution is the most compact besides maybe a dynamic class created with type:

            >>> Slice = type('', (), {'__getitem__': lambda _, x: x})()
            >>> Slice[1:2]
            slice(1, 2, None)
            >>> Slice[1:2:3]
            slice(1, 2, 3)
            >>>
            

            But code like this is usually hard to understand/maintain/extend/etc.

            Instead, I would recommend that you use slice, which allows you to create slice objects directly:

            >>> slice(1, 2)
            slice(1, 2, None)
            >>> slice(1, 2, 3)
            slice(1, 2, 3)
            >>>
            

            The built-in was made specifically for this purpose (well, that and a few others such as type-checking with isinstance) and is therefore very portable as well as pythonic.

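            A slice object built this way can be stored, passed around, and applied anywhere subscript notation works; for instance:

```python
s = slice(1, 7, 2)       # equivalent to the literal [1:7:2]
data = list(range(10))
print(data[s])           # [1, 3, 5]
print('abcdefghij'[s])   # 'bdf'
```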
            qid & accept id: (26487617, 26500132) query: Managing Processes from Python multiprocessing module soup:

            soup wrap:

            You're misunderstanding the way apply_async works. It doesn't call the function you pass to it in every process in the Pool. It just calls the function one time, in one of the worker processes. So the results you're seeing are to be expected. You have a couple of options to get the behavior you want:

            from multiprocessing import Pool                                                                                   
            import time
            import random
            
            SOME_LIST = []
            
            def myfunc():
                a = random.randint(0,3)
                time.sleep(a)
                return a
            
            def cb(retval):
                SOME_LIST.append(retval)
            
            print("Starting...")
            
            p = Pool(processes=8)
            for _ in range(p._processes):
                p.apply_async(myfunc, callback=cb)
            p.close()
            p.join()
            
            print("Stopping...")
            print(SOME_LIST)
            

            Or

            from multiprocessing import Pool                                                                                      
            import time
            import random
            
            
            def myfunc():
                a = random.randint(0,3)
                time.sleep(a)
                return a
            
            print("Starting...")
            
            p = Pool(processes=8)
            SOME_LIST = p.map(myfunc, range(p._processes))
            p.close()
            p.join()
            
            print("Stopping...")
            print(SOME_LIST)
            

            Note that you could also call apply_async or map for more than the number of processes in the pool. The idea of the Pool is that it guarantees exactly num_processes processes will be running for the entire lifetime of the Pool, no matter how many tasks you submit. So if you create a Pool(8) and call apply_async once, one of your eight workers will get a task, and the other seven will be idle. If you create a Pool(8) and call apply_async 80 times, the 80 tasks will get distributed to your eight workers, with no more than eight of the tasks actually being processed at once.

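            The many-tasks-over-few-workers behavior is easy to see with the thread-backed multiprocessing.dummy.Pool, which shares the Pool API but avoids pickling concerns in a quick sketch:

```python
from multiprocessing.dummy import Pool  # thread-based drop-in for Pool

def work(x):
    return x * x

p = Pool(8)                        # 8 workers for the whole pool lifetime
results = p.map(work, range(80))   # 80 tasks, at most 8 running at once
p.close()
p.join()
print(len(results), results[:4])   # 80 [0, 1, 4, 9]
```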
            qid & accept id: (26497396, 26497481) query: How to change a DateTimeIndex in a pandas dataframe to all the same year? soup:
            soup wrap:
            import pandas as pd
            df = pd.read_table('data', sep='\s{2,}').set_index('observation_date')
            df.index = pd.DatetimeIndex(df.index)
            df.index = df.index + pd.DateOffset(year=2013)
            print(df)
            

            yields

                         Charge 1  Charge 2
            2013-01-31  35.535318  0.073390
            2013-02-28  27.685739  0.050302
            2013-01-31  27.671290  0.296882
            2013-02-28  26.647262  0.225714
            2013-03-31  21.495699  0.362151
            
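            pd.DateOffset(year=...) with the singular keyword replaces the year component while keeping month and day; a small sketch with made-up dates:

```python
import pandas as pd

idx = pd.DatetimeIndex(['2011-01-31', '2012-06-15', '1999-12-01'])
shifted = idx + pd.DateOffset(year=2013)  # year replaced, month/day kept
print(shifted)
```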
            qid & accept id: (26498100, 26558290) query: How to Define Google Endpoints API File Download Message Endpoint soup:

            If you use the blobstore, use the get_serving_url function to read the images from a URL in the client, or use messages.BytesField in the ResourceContainer and serialize the image with base64.b64decode

            #the returned class
            class Img(messages.Message):
                message = messages.BytesField(1)

            #The api class
            @endpoints.api(name='helloImg', version='v1')
            class HelloImgApi(remote.Service):

                ID_RESOURCE = endpoints.ResourceContainer(
                    message_types.VoidMessage,
                    id=messages.StringField(1, variant=messages.Variant.STRING))

                @endpoints.method(ID_RESOURCE, Img,
                                  path='serveimage/{id}', http_method='GET',  # id is the blobstore key
                                  name='greetings.getImage')
                def image_get(self, request):
                    try:
                        blob_reader = blobstore.BlobReader(request.id)  # request.id holds the blob key
                        value = blob_reader.read()
                        return Img(message=value)
                    except:
                        raise endpoints.NotFoundException('image %s not found.' %
                                                          (request.id,))

            APPLICATION = endpoints.api_server([HelloImgApi])

            And this is the response (save it in the client with the proper format)

            {
              "message": "/9j/4AAQSkZJRgABAQAAAQABAAD... (base64-encoded JPEG data, truncated)"
            }
7/Q968Hp8UjwyrJG7I6nKspwQa6qGOq03q7o7cPmVak7N3R9OClrhfAnjP+2YRp984F9GPlY/8tB/jXc179KrGpHmifT0a0K0FOAtFJS1oaiUhIUEk4ApTXA/ErxZ/ZGnf2baSYvLlfmI6onr+NZ1KipxcmZVqsaUHORxvxG8WnWdROnWkn+hW7ckHiR/X6CuJij8x/8AZHJNRjLN6k1bACJtH4185WqucnJnyWIryqTc31FY54HAHApKKK5zlCkpaSgA+ld78NfC/wDaN9/a93Hm1t2xEpHDv6/Qfzrk9D0efXdXgsIAQZD87f3V7mvoTTdPg0vT4LK2QJFEoVRXpZfhueXtJbI9fK8J7SftJbL8y3RS0V7p9KFFFFABRRRQAUUUUAFFFFABRRRQAlLRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUlFV72CS5spoIpmhd0KrIo5U+tJiZwfj3x8NKV9L0qQNesMSyjkQj0H+1/KvI0Rp3aaZ2bJyzE5LGtjxD4W1LQb9xfK0kTNlbgch8+/rWUTkY6AdBXzuLrVJztLQ+Tx9erOpaenkK7lgABhR0AptFFcR54UUUUAFFFFAEttczWdzHc27lJY2DIw7GvffCniGLxFo0d0uFmX5Jk/ut/h3r59rpfBPiFtA11DIx+yXGI5h2Ho34V3YHEOlOz2Z6WW4t0anLLZnvNLTVYOoZTkEZBFBIAya+iPqzO17WbfQdIn1C5b5I1+Ve7N2A+pr5w1TVLjWNTnv7pt0srbj6AdgPYV1HxJ8V/wBuax9htpM2NoxAweJH7t/QVxkKGR/bvXiY2vzy5Vsj53McT7SXKtkWIEwN569qlo4xxRXlt3Z4zd2FFFFIQUdeBRXYfD7w3/bWsC6nTNpakMc9GbsK1pUnUmoo2oUZVqihE7v4e+GRouk/a7hP9MugGbI5RewrtKAAAAOlFfT06apxUUfZUaUaUFCPQWikpa0NAooooAKKKKACiiigAooooAKKSloAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooASloooAKSlooAr3dnBfW7QXMSyxMMFWGa8y8RfDF0L3GjtuXr5LdR9DXqtJWFbD06ytJHNiMJSrq00fM9zaz2czQ3EbRyKcFWGMVDXt/jTwjHrdm91axgX0YyB08wen19DXis0LQuykEYJBBGCD6H3rwMThZUJeR8vjMHPDSs9V3IqKWiuU4xKKKKACiiigD2v4da/wD2roQtZnzc2mEOTyy9j/SoPiZ4rGgaKbO3fF7dgquDyidz/SvNvCmv/wDCO65HeOW+zkFZlHda5bxT4lm8SeILi/lJCs22JP7qDoK9qlinOhbrsfRUca54bl+1sVlYu/qTWnDH5ceO561S02LcPNYfStGvLqy1seLXlryoSiiisTAKKKKAJrO0mvryG1t0LzTMEUD1NfQnh7RYdB0aCxiAyoy7f3m7muF+F3hwBX125Tk5jtgR27t/T869Pr3svw/JD2j3Z9NleF9nD2kt3+QUtFFekesFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABSUtFABRRRQAlLRRQAlFLRQAlFLRQAleX/Evwx5W7X7OLKHAvI1HbtIPcd69QqOeGO4geGVA8bqVZSOCDWValGrBxZjiKEa1NwkfM7LtIwcqeQfUUlbHiXQn8O69NprZNu+ZLVz/AHT/AA/hWMRivmKtN05OLPja1J0puEgooorMyCiiigA
61zlzZFdV8sfdc7hXR0xokaVZCPmXoa1pVeRs3oVvZthEgijVB0Ap9FFZN3MW7u4lFFFABWn4f0eXXdagsIwdrnMjD+FB1NZley/DXw9/ZukHUJ0xc3YyMjlU7CuvB0PbVEuiO3AYb29VJ7Lc7O0tYrK0itoECRRKFUDsBU9FLX0iVtEfXJWVkFFFFMYUUUUAFFFFABRRRQAUUUlAC0UUUAFFFFABRRSUALRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUlAC0UUUAFFFFABRRSUALSUtJQBx/xE8Of274eeWFP9MtP3sRHU+orxFX86MSdG6MPQ19PMAykHoa8B8b6J/wAI94qlVVxZ3v7yP0BPUfnXlZjQuvaI8TNsNzL2qOfooIIJB60V4h86FFFFABRRSUDFpKKKACiijNOwG/4P0Jtf8QQwMD5EZ8yY/wCyO34178iLHGqIAFUYAHYVyPw90D+x9AWeVMXN1iR8jkL2FdhX0eCoeyp67s+sy7DexpK+7ClpM0V2HoC0UlFAC0UlFAC0UlFAC0UmaKAClpKKAFopKM0ALRSZozQAtFJmigBaKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigBKWiigAooooAKKKKACiiigAooooAKKKKACiiigBK474keH/AO2/DEskSZubTMseOpA6j8v5V2OaawDKQRkHqDUTipxcWRUgpxcX1PmGKTzoFc/eHyt9adWr4s0Y+HPF11ZgYtbj95Ce20nj8jkVknivmK1Nwm0z4zEUnTqOLFpKKM1iYhRSZpKYDs0maTNJmgY7NdF4J0M694khidc20H72Y9sDoPxP9a5vNe3/AA60P+yPDi3Eq4ubzEr56hf4R+XP412YKj7Sqr7I78vw/tqyvsjsRhQABgClzTM0Zr6I+sH5ozTM0ZoAfmjNMzRmgB9JTc0ZoAdmjNNzRmmA/NGaZmlzQA7NGabmjNADs0U3NGaAHZopM0ZoAWikzRmgB1FNpc0ALRSUUALRmkzRmgBc0uabS0AFLSUUALRSUUALRSUUALRSZpaACiiigApKWigAopKWgAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiikoAWikzRmgBaKbmjNAC0ZpM0maQDqSm5o3UAOzSZpu6k3UAcD8WdB/tHw6upQrm4sG3HHUxn735cH8DXkCSiaJJO54P1r6YuIo7m3kglUNHIpVlPcEV816jp76F4gvtIlziNz5ZPdeqn8q8rMKN/fR4ea0L2qIZmjNMzRurx7Hg2HZpM03dSbqdh2HZozTN1a+geHNQ8R3YhtI8RA/vJm+6g/x9quEHJ2ii4U5TfLFalzwfoEmv65FGUJtYmDzt2wO34174uEUKowAMAVi+HtBtPDumraW3zMeZJD1c1rb69/CYf2MLPdn1GBwvsKdnu9yXNGai30b66jtJc0ZqLdRuoAlzRmo91G6gZJmjNR7qXdTAkzRmo80uaAH5ozTM0uaAH5ozTM0uaAHZpc0zNLmgB2aM03NGaAHZpc03NGaAHZopKM0AOopuaWgBc0ZpM0uaAClpKKAFopM0tABS0lFAC0UmaWgAooooAKKKKAFooooAKKSloAKKKKACiiigAooooAKKKKACiiigAopKKAFpKKSgBc0UmaTNAC5ozTc0maQDs0hNNzSFqAHk0m6oy1IWoAk3Um6oi1NL0CJS1IXqEyUwyUATl6QyVWMvvUZmHrQBbMleSfGDSvLmstchXn/Uykfmp/mK9NM/vWH4rsl1rw3fWRALNGWT2Ycisq0OeDRhXh7Sm4nh/mBwHHRhmk3VQsZiY3hfh4z0/nVkvXz86fLKx8tUpcsmiXdSFqiDFmCgEknAA716B
4U8FoDHf6uv8AtR25/m3+FXSoSqOyNaGFnWlaJT8J+CLnW3W6vd0FiDnOMNJ9P8a9fsbW1020S1s4VhhQYCr/ADNU1uFVQq4VQMADoKcLkHvXt0MPGitNz6HDYWFBabmn5lHmVni4HrThN71udRe30u+qYl96eJPegC1vpd1Vw9ODUAT7qN1RBqUGmMlzS5qPNKDQBJmlzUeaXNAEmaM0zNLmgB+aXNMzS0AOzS5plLQA7NLTaWgBaXNNzS0ALS5ptLQAtLSUUALmlpKKAFzS0lFAC0UlLQAUtJRQAtFJS0AFFFFAC0UlLQAtFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFJRRQAUlFFABSZopKACkzRSGgAJpM0E000ABNNJpTTTSAQmmlqDTTQAhamF6GqJjQKwM9RNJQxqu5NAWHNNUD3GO9RSM1UZnftQFi292B3qpLqKL1cD8azLkzHOM1iXkFy+fvUCscZ4s0iHTNZOoWThrW4Y+YmeY2P9KxtzFtoGTXU6jo9xcIylGYHtXM3HhvVzN8ofb2rjq4RTlc4a2BjUlc6rw6unaawurh1luv4R1Cf/XrrU8Swt0avObLw3qPG8NXRWfh+4XG4muqFOMFaJ2U6UaceWKOsj11W6Grceqbqw7bR3XGTWrBp+3FWaGlHfFqspck1TitdtXI4cUAWUlJqdXNQJHU6rQBKrGpQaiUVKKAJAacDTBTxQA4GnA00U4UAOpRTRS0AOpaQUtAC0tJS0ALS0lLQAUtIKWgBaBRQKAFooooAWlpKBQAtFFFAC0UUUALRSUtABS0lLQAUUUUALRSUtABRRRQA6iiigBKWiigAooooAKKKKACiiigApM0UUAFJS0lABSGlpDQAUlFFACUlLSUANNIadSUANppp+KaRQAwimEVIRSEUAQsKjK1YK0wrQBVZKiaOrpSmGOgDPaEHtUL2wPatQx0wxe1IRkNZqe1RNYqf4RW0Yfak8n2pCMM2Cf3R+VNOnIf4B+VbvkD0pPIHpQBg/2Yn90U9dPA/hrc8gelHke1AGQtmB2qVbUDtWn5A9KcIfagDPW39qlWH2q6IvanCL2oAqLFipBHVkR04R0wIAlPC1KEpwSgZEFpwFSbKXbQAwClAp+2l20ANApcU7FLigY3FLilxS4oATFLilxRigBKXFLilxTASilxS4oASloxRQAUUtFABS0UUAFFFLQAUUUUAApaKKAClpKWgAooooAKWkpaACiiigB1FFFABRRRQAUUUUAFFFFABSUtFACUUUUAFJS0UAJRRRQAlJTqTFACUlOpMUANpMU6jFADMUmKfikxSAZikxT8UYoAj20m2pMUYoEQ7aQrU2KTbQBDspNlT7aTbQBBspPLqxto20AV/Lo8urG2jbQBX8ul8up9tG2gCDZS7Km20baAsRbKNlTbaNtAEWyl21JtpcUAR7aXbT8UYoAZtpcU/FGKAGYpcU7FGKAG4pcU7FGKAG4oxTsUYoATFGKdijFAxMUYpcUtADcUtLijFACYpaMUtMBKKWigBKWjFFABRS0UAFFFGKACjFLRQAUUUUAFFFLQAUUUUAFFFFADqKKKACiiigAooooAKKKKACiiigAooooATFFLRQAlFFFACUUtFACUlOpKQCYpMU7FFADcUmKdijFADcUmKfijFAhmKMU7FGKAG4pMU/FGKAGYoxT8UmKAG4pMU/FGKAGYoxT8UYoAZilxTsUYoAbijFOxS4oAZijFPxSYoAbijFPxRigBmKMU/FGKBjcUYp2KMUANxRinYoxQIbijFOxRigY3FGKdRigBMUYpaKAExRS0UAJRilooASlpaKAEopaKYCUYpaKAExS0UYoAKKKWgBMUUtFABRRRQAUUUUAFFLRQAmKWiigBaKKKAEooooAWiiigAooooAKKKKACiiigAooooAKKKKACiiigBKKKKADFFFFABRRRSATFGKKKADFFFFABijFFFABijFFFABiiiigAxRiiigA
xRiiigAxRiiigAxRiiimAYoxRRQAUUUUAFFFFACUuKKKQCUtFFACYoxRRQAYoxRRQAUUUUALSUUUAFLRRQAUUUUAFFFFABRRRQAUUUUwCiiigAxRRRQAtFFFABRRRQAUUUUAf/9k="\n}\n
            \n

            On the client you can do this (in Python, for continuity):

            \n
            import base64\n\nmyFile = open("mock.jpg", "wb")\nimg = base64.b64decode(value)  #value is the returned string\nmyFile.write(img)\nmyFile.close()\n
            \n soup wrap:

            If you use the blobstore, use the get_serving_url function to read the images from a URL in the client, or use a messages.BytesField in the ResourceContainer and decode the image on the client with base64.b64decode.

            #the returned class
            class Img(messages.Message):
                message = messages.BytesField(1)
            
            #The api class
            @endpoints.api(name='helloImg', version='v1')
            class HelloImgApi(remote.Service):
            
                ID_RESOURCE = endpoints.ResourceContainer(
                        message_types.VoidMessage,
                        id=messages.StringField(1, variant=messages.Variant.STRING))
            
                @endpoints.method(ID_RESOURCE, Img,
                                  path='serveimage/{id}', http_method='GET', #id is the blobstore key
                                  name='greetings.getImage')
                def image_get(self, request):
                    try:
                        blob_reader = blobstore.BlobReader(request.id)
                        value = blob_reader.read()
                        return Img(message=value)
                    except Exception:
                        raise endpoints.NotFoundException('image %s not found.' %
                                                          (request.id,))
            
            APPLICATION = endpoints.api_server([HelloImgApi])
            

            And this is the response (save it on the client in the appropriate format):

            {
              "message": "/9j/4AAQSkZJRgABAQAAAQABAAD//gA+Q1JFQVRPUjogZ2QtanBlZyB2MS4wICh1c2luZyBJSkcgSlBFRyB2NjIpLCBkZWZhdWx0IHF1YWxpdHkK/9sAQwAIBgYHBgUIBwcHCQkICgwUDQwLCwwZEhMPFB0aHx4dGhwcICQuJyAiLCMcHCg3KSwwMTQ0NB8nOT04MjwuMzQy/9sAQwEJCQkMCwwYDQ0YMiEcITIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIy/8AAEQgBZwKAAwEiAAIRAQMRAf/EAB8AAAEFAQEBAQEBAAAAAAAAAAABAgMEBQYHCAkKC//EALUQAAIBAwMCBAMFBQQEAAABfQECAwAEEQUSITFBBhNRYQcicRQygZGhCCNCscEVUtHwJDNicoIJChYXGBkaJSYnKCkqNDU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6g4SFhoeIiYqSk5SVlpeYmZqio6Slpqeoqaqys7S1tre4ubrCw8TFxsfIycrS09TV1tfY2drh4uPk5ebn6Onq8fLz9PX29/j5+v/EAB8BAAMBAQEBAQEBAQEAAAAAAAABAgMEBQYHCAkKC//EALURAAIBAgQEAwQHBQQEAAECdwABAgMRBAUhMQYSQVEHYXETIjKBCBRCkaGxwQkjM1LwFWJy0QoWJDThJfEXGBkaJicoKSo1Njc4OTpDREVGR0hJSlNUVVZXWFlaY2RlZmdoaWpzdHV2d3h5eoKDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uLj5OXm5+jp6vLz9PX29/j5+v/aAAwDAQACEQMRAD8A9/ooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACkpaKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAoopKACkZgilmIAHUntWTr3iTTfD1qZr2YBiPkjHLN9BXjniTxzqniSRreItb2ZOBDGeWH+0e/wBK5a+KhRWu/Y48TjadBa6vsei6z8TNF0udreDzL2VeD5ONgPpuP9M1zF18U9VuVZrWzt7OH/npKS5/DpmvP9kVqMyYkl/uDoPrUEs0kzbnbPoOwryZ46tPZ2PDqZlXm9HZHp/g74g3V54gFlqU2+Kf5Y3YBcN+HTNeqCvlqORopFkRiroQykdiK+k/D+oDVNBsrwHJkiUn645rvwFdzTjLdHpZZiZVIuEndo06KKK9E9YKKKKACikpaACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACkpaKAEoopaACkpaKACkpaKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigApKWoLu7gsrZ7i5lSKJBlnc4AFF7CbS1ZKSAMk1wPi74j22lb7PSytxeDhn6pH/ia5bxh8RrjVTJY6SzwWf3Wl6PJ/gK4iOAbfOnJVP1b6V5WJx9vdp/eeLjMzt7lL7ya4uLzWLt7q7naRzy0jngVE9wsSmO34z1c9TTZpzLhQNkY6KKhryG23dnhtuTuxOvWilPSu68DeA
pdbdNR1JGj04HKJ0M3+C+/etaVKVWXLE2oUZ1pcsEUvB/ge68STC4nDQ6ep5fHMnsv+Ne4afYW2mWMVnaRiOGJcKoqWCCK2hWGGNY40GFVRgAVLXv4fDxoxstz6fC4WGHjZbhRRRXQdQUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUlFAC0UUlAC0UUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFJRmsLxN4psfDViZbhg07D91CDyx/wqZSUVdkznGEeaT0Les65Y6FYvdXsoRR91e7H0Arw3xT4wvvE9zhyYrNT+7gU8fVvU1Q1zXr/wAQ37XN3IWOcJGPuoPQCqyqtoMsA03Ydl/+vXh4rGOp7sdj5vG5hKr7sNI/mIkSQKJJhlj91P8AGopZWlfcx+ntTWYuxZjkmkrhPNCg0V3PgPwO+tzLqN+hWwQ5RT/y1P8AhWlKlKrLlibUKEq0+WJJ4E8Btq8ialqcZWxU5SM8GU+/t/OvZo41iRURQqqMAAYAFEcSRRrHGoVFGAoHAFPr6GhQjRjZH1WGw0KEOWIUUUVudAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRSUALRRRQAlFLRQAUUUUAFJS0UAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUlFcn4y8Z23hq1MUZWW/kHyR5+77n2qJzjCPNIipUjTjzSehL4u8Y2nhizxxNfSD91CD+p9BXheo6lea1qD3V5K0s8h/AewHYUy8vLrVL6S5uZGmuJTksf5fSnZW2XauDKerf3fpXgYrFSrO3Q+YxmNlXlboHy2gwMGY9T2X/wCvVckkkk5JoPNFchwBSUtb/hLwvceJtTES5S0jOZpfQeg9zV04SnLliaUqcqklGO5f8D+DZfEd4Lm5Vk02Fvnb/nof7o/rXuUEEVtCkMKKkaDaqqMACo7Cxt9OsorS1jWOGJdqqKsV9FhsPGjGy3Pq8JhY4eFlv1FoooroOoKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigBKWiigApKWkoAWiiigAopKWgAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigApM0ZrlvGXjC38M2O1dsl9KD5UWen+0faonNQjzSIqVI04uUthnjTxlb+G7QxRFZL+Qfu4/7v+0a8Muru51K9e4uJGlnlbJJ5JNF5eXOpXsl1dStLPK2WY08AWy4HMp6n+7XgYnEyrS8j5fGYyVeXkHFspVcGU/eb+77CoOtFFchwCUUtSWtrNe3UdtbxtJNIwVVHUmhJt2RSTbsi5omjXWvapFY2q/MxyzdkXuTX0BoWiWug6ZFZWq4VR8zd2Pcms7wd4Wg8NaWqEK15KA00nv6D2FdJ3r6DB4VUo3e7PqMBg1QjzS+JgKWiiu09AKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKAEopaKAEopaKAEopaKAEopaSgBaKKKAEpaKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKSsjxH4htfDulveXLAt0jjB5dvQUpSUVdkykormlsVfFnim28NaaZXIe5cYhizyx9T7V4HqOoXWrX8t5dyGSeQ5JPb2HtU2s6zd67qUl7dyFnY8L2UegqCNBAokcfOfuqe3vXz+LxTqystj5jHY
x15WXwoVVFuuT/AK0/+O1ETk5pSSxJPU0lcR51wpKWg0AJ1r2T4deDhpdqNVvo/wDTZl/dow/1Sn+prm/hz4Q/tK5XV76P/RYW/cow/wBYw7/QV7GABwK9nAYW372XyPfyzB2/fT+QUUUYr1T2xaKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAopKWgAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAopKiuLiK1t3nncJFGpZmY4AFFxN2K+q6pa6Np8t7dyBIoxnnqT6D3r5+8S+I7rxLqjXU5KxDiGLPCL/jV/xt4ul8S6iViZl0+EkRJ/eP8AeNc5BEGy7/cH614eNxXtHyx2PnMwxvtXyR+FfiPhjCL5sgz/AHV9TSMxdixOSaV3Ltk8DsPSm15tzyb3EopaSgQVveE/Dc3iTV1gAK2yfNM/oPT6msmxsp9RvYbO2QvNK21QK+gfDHh+Dw7pEdpEAZD80sn95q7cFhvayu9kell+E9vPml8KNO1tYbK1itreMJFGoVVHYCpqMUV9ClY+oSsrIWiiigYUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAJS0UUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFJRQAhOASTwK8a+IvjM6ncNpFhJ/okTfvXU/6xh2+grofiR4y/s+3bR7CT/SpV/eup/1ant9TXjyqXYKOSa8rHYq37uPzPEzLGW/dQfqOiiMr46KOSfQVM7g4VRhF6ChsRp5aHj+I+ppleM2eA3cKKKKQgpDS123w88K/2zqI1C7TNlbNkAjiR/T6CtaNJ1ZqKNqFGVaahE634ceE/wCy7Iarex4vLhfkVhzGn+JrvhQAAMDpS19NSpRpwUYn2FGjGjBQj0CiiitDUKKKKACiiigAooooAKSlooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKAEpaSigBaSiigBaKYzqv3mA+tNE8RbAlQn0zSuhXRLRSUUxi0UlLQAUUUUAFFJRQAVzfjPxRD4Z0hpAQ13KCsEfqfX6CtnU9St9J0+a9u3CQxLlj6+w96+d/EevXHiPWJb64JCn5Yo88IvYVx4vEeyjZbs4Mdi1QhZfEzPubia8uZLidzJNIxZmPUmpUXyU5++36Co4EAHmN/wEe9PJJ5718/KVz5acrsSiiioMxKKWgKWIABJPAAosNK5oaHo8+u6tDY24OXOXbsq9zX0HpWm2+k6dDZWyhY41AHv71zvgLwuNB0oT3Cf6dcANJkcoOy119fQ4LDeyhzPdn1OXYT2NPml8TFoooruPSCiiigAooooAKKKKAEpaKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAopKKACiiuJ8Y+P7bQVezsys+oYwR/DH9ff2rOpUjTjzSZnVqwpR5ps6PWde07QrYz31wqD+FOrN9BXmOt/FPULxjDpEIto+0jjc5/DoK4m6vL3Wbt7q9naRicl3PA9hTfMWIbYRg93PU14uIzCcnaGiPncVmlSbtT0Rcu7/U79vM1DUZj6B3P8hVdZkhkDpLcGQchw+0iqxJJyTk0lcDqSbu2eY6k27tnZaR8RdU00qkrNdQjtMcsPxr0bw9430zX3ECMYbnH+rk7/Q
968Hp8UjwyrJG7I6nKspwQa6qGOq03q7o7cPmVak7N3R9OClrhfAnjP+2YRp984F9GPlY/8tB/jXc179KrGpHmifT0a0K0FOAtFJS1oaiUhIUEk4ApTXA/ErxZ/ZGnf2baSYvLlfmI6onr+NZ1KipxcmZVqsaUHORxvxG8WnWdROnWkn+hW7ckHiR/X6CuJij8x/8AZHJNRjLN6k1bACJtH4185WqucnJnyWIryqTc31FY54HAHApKKK5zlCkpaSgA+ld78NfC/wDaN9/a93Hm1t2xEpHDv6/Qfzrk9D0efXdXgsIAQZD87f3V7mvoTTdPg0vT4LK2QJFEoVRXpZfhueXtJbI9fK8J7SftJbL8y3RS0V7p9KFFFFABRRRQAUUUUAFFFFABRRRQAlLRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUlFV72CS5spoIpmhd0KrIo5U+tJiZwfj3x8NKV9L0qQNesMSyjkQj0H+1/KvI0Rp3aaZ2bJyzE5LGtjxD4W1LQb9xfK0kTNlbgch8+/rWUTkY6AdBXzuLrVJztLQ+Tx9erOpaenkK7lgABhR0AptFFcR54UUUUAFFFFAEttczWdzHc27lJY2DIw7GvffCniGLxFo0d0uFmX5Jk/ut/h3r59rpfBPiFtA11DIx+yXGI5h2Ho34V3YHEOlOz2Z6WW4t0anLLZnvNLTVYOoZTkEZBFBIAya+iPqzO17WbfQdIn1C5b5I1+Ve7N2A+pr5w1TVLjWNTnv7pt0srbj6AdgPYV1HxJ8V/wBuax9htpM2NoxAweJH7t/QVxkKGR/bvXiY2vzy5Vsj53McT7SXKtkWIEwN569qlo4xxRXlt3Z4zd2FFFFIQUdeBRXYfD7w3/bWsC6nTNpakMc9GbsK1pUnUmoo2oUZVqihE7v4e+GRouk/a7hP9MugGbI5RewrtKAAAAOlFfT06apxUUfZUaUaUFCPQWikpa0NAooooAKKKKACiiigAooooAKKSloAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooASloooAKSlooAr3dnBfW7QXMSyxMMFWGa8y8RfDF0L3GjtuXr5LdR9DXqtJWFbD06ytJHNiMJSrq00fM9zaz2czQ3EbRyKcFWGMVDXt/jTwjHrdm91axgX0YyB08wen19DXis0LQuykEYJBBGCD6H3rwMThZUJeR8vjMHPDSs9V3IqKWiuU4xKKKKACiiigD2v4da/wD2roQtZnzc2mEOTyy9j/SoPiZ4rGgaKbO3fF7dgquDyidz/SvNvCmv/wDCO65HeOW+zkFZlHda5bxT4lm8SeILi/lJCs22JP7qDoK9qlinOhbrsfRUca54bl+1sVlYu/qTWnDH5ceO561S02LcPNYfStGvLqy1seLXlryoSiiisTAKKKKAJrO0mvryG1t0LzTMEUD1NfQnh7RYdB0aCxiAyoy7f3m7muF+F3hwBX125Tk5jtgR27t/T869Pr3svw/JD2j3Z9NleF9nD2kt3+QUtFFekesFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABSUtFABRRRQAlLRRQAlFLRQAlFLRQAleX/Evwx5W7X7OLKHAvI1HbtIPcd69QqOeGO4geGVA8bqVZSOCDWValGrBxZjiKEa1NwkfM7LtIwcqeQfUUlbHiXQn8O69NprZNu+ZLVz/AHT/AA/hWMRivmKtN05OLPja1J0puEgooorMyCiiigA61z
lzZFdV8sfdc7hXR0xokaVZCPmXoa1pVeRs3oVvZthEgijVB0Ap9FFZN3MW7u4lFFFABWn4f0eXXdagsIwdrnMjD+FB1NZley/DXw9/ZukHUJ0xc3YyMjlU7CuvB0PbVEuiO3AYb29VJ7Lc7O0tYrK0itoECRRKFUDsBU9FLX0iVtEfXJWVkFFFFMYUUUUAFFFFABRRRQAUUUlAC0UUUAFFFFABRRSUALRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUlAC0UUUAFFFFABRRSUALSUtJQBx/xE8Of274eeWFP9MtP3sRHU+orxFX86MSdG6MPQ19PMAykHoa8B8b6J/wAI94qlVVxZ3v7yP0BPUfnXlZjQuvaI8TNsNzL2qOfooIIJB60V4h86FFFFABRRSUDFpKKKACiijNOwG/4P0Jtf8QQwMD5EZ8yY/wCyO34178iLHGqIAFUYAHYVyPw90D+x9AWeVMXN1iR8jkL2FdhX0eCoeyp67s+sy7DexpK+7ClpM0V2HoC0UlFAC0UlFAC0UlFAC0UmaKAClpKKAFopKM0ALRSZozQAtFJmigBaKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigBKWiigAooooAKKKKACiiigAooooAKKKKACiiigBK474keH/AO2/DEskSZubTMseOpA6j8v5V2OaawDKQRkHqDUTipxcWRUgpxcX1PmGKTzoFc/eHyt9adWr4s0Y+HPF11ZgYtbj95Ce20nj8jkVknivmK1Nwm0z4zEUnTqOLFpKKM1iYhRSZpKYDs0maTNJmgY7NdF4J0M694khidc20H72Y9sDoPxP9a5vNe3/AA60P+yPDi3Eq4ubzEr56hf4R+XP412YKj7Sqr7I78vw/tqyvsjsRhQABgClzTM0Zr6I+sH5ozTM0ZoAfmjNMzRmgB9JTc0ZoAdmjNNzRmmA/NGaZmlzQA7NGabmjNADs0U3NGaAHZopM0ZoAWikzRmgB1FNpc0ALRSUUALRmkzRmgBc0uabS0AFLSUUALRSUUALRSUUALRSZpaACiiigApKWigAopKWgAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiikoAWikzRmgBaKbmjNAC0ZpM0maQDqSm5o3UAOzSZpu6k3UAcD8WdB/tHw6upQrm4sG3HHUxn735cH8DXkCSiaJJO54P1r6YuIo7m3kglUNHIpVlPcEV816jp76F4gvtIlziNz5ZPdeqn8q8rMKN/fR4ea0L2qIZmjNMzRurx7Hg2HZpM03dSbqdh2HZozTN1a+geHNQ8R3YhtI8RA/vJm+6g/x9quEHJ2ii4U5TfLFalzwfoEmv65FGUJtYmDzt2wO34174uEUKowAMAVi+HtBtPDumraW3zMeZJD1c1rb69/CYf2MLPdn1GBwvsKdnu9yXNGai30b66jtJc0ZqLdRuoAlzRmo91G6gZJmjNR7qXdTAkzRmo80uaAH5ozTM0uaAH5ozTM0uaAHZpc0zNLmgB2aM03NGaAHZpc03NGaAHZopKM0AOopuaWgBc0ZpM0uaAClpKKAFopM0tABS0lFAC0UmaWgAooooAKKKKAFooooAKKSloAKKKKACiiigAooooAKKKKACiiigAopKKAFpKKSgBc0UmaTNAC5ozTc0maQDs0hNNzSFqAHk0m6oy1IWoAk3Um6oi1NL0CJS1IXqEyUwyUATl6QyVWMvvUZmHrQBbMleSfGDSvLmstchXn/Uykfmp/mK9NM/vWH4rsl1rw3fWRALNGWT2Ycisq0OeDRhXh7Sm4nh/mBwHHRhmk3VQsZiY3hfh4z0/nVkvXz86fLKx8tUpcsmiXdSFqiDFmCgEknAA716B4U8
FoDHf6uv8AtR25/m3+FXSoSqOyNaGFnWlaJT8J+CLnW3W6vd0FiDnOMNJ9P8a9fsbW1020S1s4VhhQYCr/ADNU1uFVQq4VQMADoKcLkHvXt0MPGitNz6HDYWFBabmn5lHmVni4HrThN71udRe30u+qYl96eJPegC1vpd1Vw9ODUAT7qN1RBqUGmMlzS5qPNKDQBJmlzUeaXNAEmaM0zNLmgB+aXNMzS0AOzS5plLQA7NLTaWgBaXNNzS0ALS5ptLQAtLSUUALmlpKKAFzS0lFAC0UlLQAUtJRQAtFJS0AFFFFAC0UlLQAtFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFJRRQAUlFFABSZopKACkzRSGgAJpM0E000ABNNJpTTTSAQmmlqDTTQAhamF6GqJjQKwM9RNJQxqu5NAWHNNUD3GO9RSM1UZnftQFi292B3qpLqKL1cD8azLkzHOM1iXkFy+fvUCscZ4s0iHTNZOoWThrW4Y+YmeY2P9KxtzFtoGTXU6jo9xcIylGYHtXM3HhvVzN8ofb2rjq4RTlc4a2BjUlc6rw6unaawurh1luv4R1Cf/XrrU8Swt0avObLw3qPG8NXRWfh+4XG4muqFOMFaJ2U6UaceWKOsj11W6Grceqbqw7bR3XGTWrBp+3FWaGlHfFqspck1TitdtXI4cUAWUlJqdXNQJHU6rQBKrGpQaiUVKKAJAacDTBTxQA4GnA00U4UAOpRTRS0AOpaQUtAC0tJS0ALS0lLQAUtIKWgBaBRQKAFooooAWlpKBQAtFFFAC0UUUALRSUtABS0lLQAUUUUALRSUtABRRRQA6iiigBKWiigAooooAKKKKACiiigApM0UUAFJS0lABSGlpDQAUlFFACUlLSUANNIadSUANppp+KaRQAwimEVIRSEUAQsKjK1YK0wrQBVZKiaOrpSmGOgDPaEHtUL2wPatQx0wxe1IRkNZqe1RNYqf4RW0Yfak8n2pCMM2Cf3R+VNOnIf4B+VbvkD0pPIHpQBg/2Yn90U9dPA/hrc8gelHke1AGQtmB2qVbUDtWn5A9KcIfagDPW39qlWH2q6IvanCL2oAqLFipBHVkR04R0wIAlPC1KEpwSgZEFpwFSbKXbQAwClAp+2l20ANApcU7FLigY3FLilxS4oATFLilxRigBKXFLilxTASilxS4oASloxRQAUUtFABS0UUAFFFLQAUUUUAApaKKAClpKWgAooooAKWkpaACiiigB1FFFABRRRQAUUUUAFFFFABSUtFACUUUUAFJS0UAJRRRQAlJTqTFACUlOpMUANpMU6jFADMUmKfikxSAZikxT8UYoAj20m2pMUYoEQ7aQrU2KTbQBDspNlT7aTbQBBspPLqxto20AV/Lo8urG2jbQBX8ul8up9tG2gCDZS7Km20baAsRbKNlTbaNtAEWyl21JtpcUAR7aXbT8UYoAZtpcU/FGKAGYpcU7FGKAG4pcU7FGKAG4oxTsUYoATFGKdijFAxMUYpcUtADcUtLijFACYpaMUtMBKKWigBKWjFFABRS0UAFFFGKACjFLRQAUUUUAFFFLQAUUUUAFFFFADqKKKACiiigAooooAKKKKACiiigAooooATFFLRQAlFFFACUUtFACUlOpKQCYpMU7FFADcUmKdijFADcUmKfijFAhmKMU7FGKAG4pMU/FGKAGYoxT8UmKAG4pMU/FGKAGYoxT8UYoAZilxTsUYoAbijFOxS4oAZijFPxSYoAbijFPxRigBmKMU/FGKBjcUYp2KMUANxRinYoxQIbijFOxRigY3FGKdRigBMUYpaKAExRS0UAJRilooASlpaKAEopaKYCUYpaKAExS0UYoAKKKWgBMUUtFABRRRQAUUUUAFFLRQAmKWiigBaKKKAEooooAWiiigAooooAKKKKACiiigAooooAKKKKACiiigBKKKKADFFFFABRRRSATFGKKKADFFFFABijFFFABijFFFABiiiigAxRiiigAxRi
iigAxRiiigAxRiiimAYoxRRQAUUUUAFFFFACUuKKKQCUtFFACYoxRRQAYoxRRQAUUUUALSUUUAFLRRQAUUUUAFFFFABRRRQAUUUUwCiiigAxRRRQAtFFFABRRRQAUUUUAf/9k="
            }
            

            On the client you can do this (in Python, for continuity):

            import base64
            
            img = base64.b64decode(value)  #value is the returned string
            with open("mock.jpg", "wb") as myFile:
                myFile.write(img)
            
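            As a quick sanity check, the decode step above can be exercised end to end with a made-up payload; the `value` string here is only a stand-in for the string actually returned by the API:

```python
import base64

# Stand-in for the API response: base64-encode some known bytes ourselves
original = b"\xff\xd8\xff\xe0 fake JPEG bytes"
value = base64.b64encode(original).decode("ascii")

# The client-side step from the answer: decode back to the raw bytes
img = base64.b64decode(value)
assert img == original
```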
            qid & accept id: (26498794, 26498838) query: How to calculate the value of the sum of squares defined as 1^2 + 2^2 + 3^2 + ... +n2 until a user specified sum has been reached soup:

            If you don't know the bounds of the range in advance, one solution is to just use while True:, and use a break when you've reached the target:

            \n
            while True:\n    x = x + 1\n    y = x ** 2\n    total = total + y\n    if total >= n:\n        break\nprint(total)\n
            \n
            \n

            If you want to get clever, though, you can think of this in terms of iterator pipelines. Like this:

            \n
            numbers = itertools.count(1) # all positive integers\nsquares = (x**2 for x in numbers) # all squares of positive integers\ntotals = itertools.accumulate(squares) # all running totals of squares of ...\nbigtotals = itertools.dropwhile(lambda total: total < n, totals) # all ... starting >= n\ntotal = next(bigtotals) # first ... starting >= n\n
            \n soup wrap:

            If you don't know the bounds of the range in advance, one solution is to just use while True:, and use a break when you've reached the target:

            x = 0
            total = 0
            # n is the user-specified target sum
            while True:
                x = x + 1
                y = x ** 2
                total = total + y
                if total >= n:
                    break
            print(total)
            
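            For testing purposes, the loop can also be wrapped in a small function (the name `sum_squares_until` is made up here); it assumes `x` and `total` both start at zero:

```python
def sum_squares_until(n):
    """Return the first running total 1**2 + 2**2 + ... that is >= n."""
    x = 0
    total = 0
    while True:
        x = x + 1
        y = x ** 2
        total = total + y
        if total >= n:
            break
    return total

# Running totals of squares are 1, 5, 14, 30, 55, ...
print(sum_squares_until(15))  # -> 30
```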

            If you want to get clever, though, you can think of this in terms of iterator pipelines. Like this:

            import itertools
            
            numbers = itertools.count(1) # all positive integers
            squares = (x**2 for x in numbers) # all squares of positive integers
            totals = itertools.accumulate(squares) # all running totals of squares of ...
            bigtotals = itertools.dropwhile(lambda total: total < n, totals) # all ... starting >= n
            total = next(bigtotals) # first ... starting >= n
            
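            The pipeline runs as written on Python 3 (itertools.accumulate requires 3.2+) once itertools is imported and n is given a value; for example, with a hypothetical target of n = 15:

```python
import itertools

n = 15  # example target; the running totals of squares are 1, 5, 14, 30, ...

numbers = itertools.count(1)                 # all positive integers
squares = (x**2 for x in numbers)            # all squares of positive integers
totals = itertools.accumulate(squares)       # all running totals of squares
bigtotals = itertools.dropwhile(lambda total: total < n, totals)
total = next(bigtotals)                      # first running total >= n
print(total)  # -> 30
```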
            qid & accept id: (26499216, 26499299) query: Scraping website that uses javascript soup:

            There is no need for scraping HTML, or using a high-level selenium approach.

            \n

            Simulate the underlying XHR request(s) going to the server and returning the JSON data that is used to fill up the table on the page.

            \n

            Here's an example using requests:

            \n
            import requests\n\nurl = 'http://stats.nba.com/stats/playergamelog'\n\nparams = {\n    'Season': '2013-14',\n    'SeasonType': 'Regular Season',\n    'LeagueID': '00',\n    'PlayerID': '2544',\n    'pageNo': '1',\n    'rowsPerPage': '100'\n}\nresponse = requests.post(url, data=params)\n\nprint response.json()\n
            \n

            Prints the JSON structure containing the player game logs:

            \n
            {u'parameters': {u'LeagueID': u'00',\n                 u'PlayerID': 2544,\n                 u'Season': u'2013-14',\n                 u'SeasonType': u'Regular Season'},\n u'resource': u'playergamelog',\n u'resultSets': [{u'headers': [u'SEASON_ID',\n                               u'Player_ID',\n                               u'Game_ID',\n                               u'GAME_DATE',\n                               u'MATCHUP',\n                               u'WL',\n                               u'MIN',\n                               u'FGM',\n                               u'FGA',\n                               u'FG_PCT',\n                               u'FG3M',\n                               u'FG3A',\n                               u'FG3_PCT',\n                               u'FTM',\n                               u'FTA',\n                               u'FT_PCT',\n                               u'OREB',\n                               u'DREB',\n                               u'REB',\n                               u'AST',\n                               u'STL',\n                               u'BLK',\n                               u'TOV',\n                               u'PF',\n                               u'PTS',\n                               u'PLUS_MINUS',\n                               u'VIDEO_AVAILABLE'],\n                  u'name': u'PlayerGameLog',\n                  u'rowSet': [[u'22013',\n                               2544,\n                               u'0021301192',\n                               u'APR 12, 2014',\n                               u'MIA @ ATL',\n                               u'L',\n                               37,\n                               10,\n                               22,\n                               0.455,\n                               3,\n                               7,\n                               0.429,\n                               4,\n                               8,\n                
               0.5,\n                               3,\n                               5,\n                               8,\n                               5,\n                               0,\n                               1,\n                               3,\n                               2,\n                               27,\n                               -13,\n                               1],\n                              [u'22013',\n                               2544,\n                               u'0021301180',\n                               u'APR 11, 2014',\n                               u'MIA vs. IND',\n                               u'W',\n                               35,\n                               11,\n                               20,\n                               0.55,\n                               2,\n                               4,\n                               0.5,\n                               12,\n                               13,\n                               0.923,\n                               1,\n                               5,\n                               6,\n                               1,\n                               1,\n                               1,\n                               2,\n                               1,\n                               36,\n                               13,\n                               1],\n                              [u'22013',\n                               2544,\n                               u'0021301167',\n                               u'APR 09, 2014',\n                               u'MIA @ MEM',\n                               u'L',\n                               41,\n                               14,\n                               23,\n                               0.609,\n                               3,\n                               5,\n                               0.6,\n                               6,\n                           
    7,\n                               0.857,\n                               1,\n                               5,\n                               6,\n                               5,\n                               2,\n                               0,\n                               5,\n                               1,\n                               37,\n                               -8,\n                               1],\n    ...\n}\n
            \n
            \n

            Alternative solution would be to use an NBA API, see several options here:

            \n\n soup wrap:

            There is no need to scrape the HTML or to use a high-level Selenium approach.

            Simulate the underlying XHR request(s) that go to the server and return the JSON data used to fill the table on the page.

            Here's an example using requests:

            import requests
            
            url = 'http://stats.nba.com/stats/playergamelog'
            
            params = {
                'Season': '2013-14',
                'SeasonType': 'Regular Season',
                'LeagueID': '00',
                'PlayerID': '2544',
                'pageNo': '1',
                'rowsPerPage': '100'
            }
            response = requests.post(url, data=params)
            
            print response.json()
            

            Prints the JSON structure containing the player game logs:

            {u'parameters': {u'LeagueID': u'00',
                             u'PlayerID': 2544,
                             u'Season': u'2013-14',
                             u'SeasonType': u'Regular Season'},
             u'resource': u'playergamelog',
             u'resultSets': [{u'headers': [u'SEASON_ID',
                                           u'Player_ID',
                                           u'Game_ID',
                                           u'GAME_DATE',
                                           u'MATCHUP',
                                           u'WL',
                                           u'MIN',
                                           u'FGM',
                                           u'FGA',
                                           u'FG_PCT',
                                           u'FG3M',
                                           u'FG3A',
                                           u'FG3_PCT',
                                           u'FTM',
                                           u'FTA',
                                           u'FT_PCT',
                                           u'OREB',
                                           u'DREB',
                                           u'REB',
                                           u'AST',
                                           u'STL',
                                           u'BLK',
                                           u'TOV',
                                           u'PF',
                                           u'PTS',
                                           u'PLUS_MINUS',
                                           u'VIDEO_AVAILABLE'],
                              u'name': u'PlayerGameLog',
                              u'rowSet': [[u'22013',
                                           2544,
                                           u'0021301192',
                                           u'APR 12, 2014',
                                           u'MIA @ ATL',
                                           u'L',
                                           37,
                                           10,
                                           22,
                                           0.455,
                                           3,
                                           7,
                                           0.429,
                                           4,
                                           8,
                                           0.5,
                                           3,
                                           5,
                                           8,
                                           5,
                                           0,
                                           1,
                                           3,
                                           2,
                                           27,
                                           -13,
                                           1],
                                          [u'22013',
                                           2544,
                                           u'0021301180',
                                           u'APR 11, 2014',
                                           u'MIA vs. IND',
                                           u'W',
                                           35,
                                           11,
                                           20,
                                           0.55,
                                           2,
                                           4,
                                           0.5,
                                           12,
                                           13,
                                           0.923,
                                           1,
                                           5,
                                           6,
                                           1,
                                           1,
                                           1,
                                           2,
                                           1,
                                           36,
                                           13,
                                           1],
                                          [u'22013',
                                           2544,
                                           u'0021301167',
                                           u'APR 09, 2014',
                                           u'MIA @ MEM',
                                           u'L',
                                           41,
                                           14,
                                           23,
                                           0.609,
                                           3,
                                           5,
                                           0.6,
                                           6,
                                           7,
                                           0.857,
                                           1,
                                           5,
                                           6,
                                           5,
                                           2,
                                           0,
                                           5,
                                           1,
                                           37,
                                           -8,
                                           1],
                ...
            }
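Once you have this JSON back, the useful part is pairing each `headers` entry with each row of `rowSet`. A minimal sketch of that post-processing, with the nested structure hard-coded from a few fields of the sample output above rather than fetched live:

```python
# Hedged sketch: turn the 'headers' and 'rowSet' entries of the response
# into a list of dicts. The data below is a trimmed, hard-coded subset of
# the printed sample, not a live API call.
data = {
    'resultSets': [{
        'name': 'PlayerGameLog',
        'headers': ['SEASON_ID', 'MATCHUP', 'WL', 'PTS'],
        'rowSet': [
            ['22013', 'MIA @ ATL', 'L', 27],
            ['22013', 'MIA vs. IND', 'W', 36],
        ],
    }]
}

result = data['resultSets'][0]
# zip each row with the column names to get one dict per game
games = [dict(zip(result['headers'], row)) for row in result['rowSet']]
print(games[1]['PTS'])  # 36
```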
            

            An alternative solution would be to use an NBA API; see several options here:

            qid & accept id: (26511109, 26514056) query: Filtering for row-wise patterns in columns with a sequence of 0 and 1 soup:
            res = {}\nt = df - df.shift(1)\nfor col in df.columns:\n    res[col] = t[col][t[col] != 0]\n
            \n

            When the value for a particular column is 1, it means the time frame has started; when it's -1, it means it's over.

            \n

            also, you could use a dict comprehension instead:

            \n
            res = {col: t[col][t[col] != 0] for col in df.columns}\n
            \n soup wrap:
            res = {}
            t = df - df.shift(1)
            for col in df.columns:
                res[col] = t[col][t[col] != 0]
            

            When the value for a particular column is 1, it means the time frame has started; when it's -1, it means it's over.

            also, you could use a dict comprehension instead:

            res = {col: t[col][t[col] != 0] for col in df.columns}
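A minimal sketch of the idea with a made-up 0/1 column: subtracting the shifted frame yields 1 where a run of 1s starts and -1 where it ends. The `dropna()` added here discards the NaN that shifting introduces in the first row.

```python
import pandas as pd

# Made-up example column; the 'a' data is an assumption for illustration.
df = pd.DataFrame({'a': [0, 0, 1, 1, 1, 0, 0]})
t = df - df.shift(1)
# keep only the transition points; dropna() removes the first-row NaN
res = {col: t[col][t[col] != 0].dropna() for col in df.columns}
print(res['a'])  # 1.0 at index 2 (run starts), -1.0 at index 5 (run ends)
```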
            
            qid & accept id: (26552266, 26552515) query: I need to generate x random numbers in an interval from 1 to x but each number have to occur only once soup:

            Use numpy's random.permutation function, which, if given a single scalar argument x, will return a random permutation of the numbers from 0 to x - 1. For instance:

            \n
            np.random.permutation(10)\n
            \n

            Gives:

            \n
            array([3, 2, 8, 7, 0, 9, 6, 4, 5, 1])\n
            \n

            So, in particular, np.random.permutation(70128) + 1 does precisely what you'd like.

            \n soup wrap:

            Use numpy's random.permutation function, which, if given a single scalar argument x, will return a random permutation of the numbers from 0 to x - 1. For instance:

            np.random.permutation(10)
            

            Gives:

            array([3, 2, 8, 7, 0, 9, 6, 4, 5, 1])
            

            So, in particular, np.random.permutation(70128) + 1 does precisely what you'd like.
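A quick sketch of the shift-by-one trick, using x = 10 instead of the question's 70128: `permutation(x)` covers 0..x-1, so adding 1 gives each of 1..x exactly once.

```python
import numpy as np

# permutation(10) is some ordering of 0..9; +1 shifts it to 1..10
nums = np.random.permutation(10) + 1
print(sorted(nums))  # always [1, 2, ..., 10], in some random order before sorting
```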

            qid & accept id: (26562487, 26562586) query: saving the number into the variable in every run of cycle python soup:

            I modified your code:

            \n
            def try_parse(string):\n    string2 = ""\n    for c in string:\n        if not c.isdigit() and c != '.':\n            break\n        string2 += c\n    return string2\n
            \n

            You can see that now I use string2 as a string and not an int (when + is applied to ints it adds them; applied to strings, it concatenates them).

            \n

            Also, I used a more readable if condition.

            \n


            Update:

            \n

            Now the condition is ignoring the '.'.

            \n

            Tests:

            \n
            >>> try_parse('123')\n'123'\n>>> try_parse('12n3')\n'12'\n>>> try_parse('')\n''\n>>> try_parse('4.13n3')\n'4.13'\n
            \n


            Note

            \n

            The return type is a string; you can use the float() function on it wherever you like :)

            \n soup wrap:

            I modified your code:

            def try_parse(string):
                string2 = ""
                for c in string:
                    if not c.isdigit() and c != '.':
                        break
                    string2 += c
                return string2
            

            You can see that now I use string2 as a string and not an int (when + is applied to ints it adds them; applied to strings, it concatenates them).

            Also, I used a more readable if condition.


            Update:

            Now the condition is ignoring the '.'.

            Tests:

            >>> try_parse('123')
            '123'
            >>> try_parse('12n3')
            '12'
            >>> try_parse('')
            ''
            >>> try_parse('4.13n3')
            '4.13'
            


            Note

            The return type is a string; you can use the float() function on it wherever you like :)
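A small usage sketch of that last point: feeding the string result to float(), with a guard for the empty-string case (float('') would raise a ValueError).

```python
def try_parse(string):
    # collect leading digits and dots; stop at the first other character
    string2 = ""
    for c in string:
        if not c.isdigit() and c != '.':
            break
        string2 += c
    return string2

value = try_parse('4.13n3')
number = float(value) if value else None  # guard against ''
print(number)  # 4.13
```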

            qid & accept id: (26563683, 26564119) query: Adding a constant to a closure expression soup:

            You can create a new function that wraps the closure function.

            \n
            def make_subtract(fn, val):\n    def wrapper(x):\n        return fn(x) - val\n    return wrapper\n
            \n

            Then call it like this:

            \n
            new_a = make_subtract(a, a(100))\n
            \n soup wrap:

            You can create a new function that wraps the closure function.

            def make_subtract(fn, val):
                def wrapper(x):
                    return fn(x) - val
                return wrapper
            

            Then call it like this:

            new_a = make_subtract(a, a(100))
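To make this concrete, here is a self-contained sketch where `a` is a made-up closure (x → x²) standing in for the question's function; `new_a(x)` then computes `a(x) - a(100)`.

```python
def make_subtract(fn, val):
    # wrapper captures fn and val, subtracting the constant on every call
    def wrapper(x):
        return fn(x) - val
    return wrapper

a = lambda x: x ** 2          # stand-in closure for illustration
new_a = make_subtract(a, a(100))  # new_a(x) == a(x) - a(100)
print(new_a(100))  # 0
print(new_a(101))  # 201
```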
            
            qid & accept id: (26613341, 26613672) query: Python 3.x.x one variable spread across multiple .py files soup:

            If you want to use the variable just once, you can do this:

            \n
            # part2.py\n\ndef scream():\n print(sound)  \n\n# part1.py\nimport part2\n\nif __name__=="__main__":\n    part2.sound = "Yoooo"\n    part2.scream()\n\n#Output:\nYoooo\n
            \n

            If you want to be able to change the variable later, you can create a property
            \nOr simply do this:

            \n
            # part2.py\n# gvars is defined later\ndef scream():\n print(gvars.sound)\n\n\n# part1.py\nimport part2\n\nclass GameVariables:\n    pass\n\nif __name__=="__main__":\n    gvars = GameVariables()\n    part2.gvars = gvars\n    gvars.sound = "Yooo"\n    part2.scream()\n    gvars.sound = "Whaa"\n    part2.scream()\n\n#output\nYooo\nWhaa\n
            \n soup wrap:

            If you want to use the variable just once, you can do this:

            # part2.py
            
            def scream():
             print(sound)  
            
            # part1.py
            import part2
            
            if __name__=="__main__":
                part2.sound = "Yoooo"
                part2.scream()
            
            #Output:
            Yoooo
            

            If you want to be able to change the variable later, you can create a property, or simply do this:

            # part2.py
            # gvars is defined later
            def scream():
             print(gvars.sound)
            
            
            # part1.py
            import part2
            
            class GameVariables:
                pass
            
            if __name__=="__main__":
                gvars = GameVariables()
                part2.gvars = gvars
                gvars.sound = "Yooo"
                part2.scream()
                gvars.sound = "Whaa"
                part2.scream()
            
            #output
            Yooo
            Whaa
            
            qid & accept id: (26629789, 26629812) query: Creating multiple copies of list elements soup:

            Using a list comprehension with a nested loop:

            \n
            unrolled = [c for c, count in weighted for _ in range(count)]\n
            \n

            If you are using Python 2 you could use xrange() instead.

            \n

            If you like itertools, you can use itertools.chain.from_iterable() to make this into a lazy iterable:

            \n
            from itertools import chain\n\nchain.from_iterable([c] * count for c, count in weighted)\n
            \n

            Demo:

            \n
            >>> weighted = [ ("a", 3), ("b", 1), ("c", 4) ]\n>>> [c for c, count in weighted for _ in range(count)]\n['a', 'a', 'a', 'b', 'c', 'c', 'c', 'c']\n>>> from itertools import chain\n>>> list(chain.from_iterable([c] * count for c, count in weighted))\n['a', 'a', 'a', 'b', 'c', 'c', 'c', 'c']\n
            \n

            I used list() to turn the chain iterator back into a sequence.

            \n soup wrap:

            Using a list comprehension with a nested loop:

            unrolled = [c for c, count in weighted for _ in range(count)]
            

            If you are using Python 2 you could use xrange() instead.

            If you like itertools, you can use itertools.chain.from_iterable() to make this into a lazy iterable:

            from itertools import chain
            
            chain.from_iterable([c] * count for c, count in weighted)
            

            Demo:

            >>> weighted = [ ("a", 3), ("b", 1), ("c", 4) ]
            >>> [c for c, count in weighted for _ in range(count)]
            ['a', 'a', 'a', 'b', 'c', 'c', 'c', 'c']
            >>> from itertools import chain
            >>> list(chain.from_iterable([c] * count for c, count in weighted))
            ['a', 'a', 'a', 'b', 'c', 'c', 'c', 'c']
            

            I used list() to turn the chain iterator back into a sequence.

            qid & accept id: (26644002, 26644203) query: How to create a dictionary with columns given as keys and values soup:

            For people who just want to create a simple dictionary from columns without recurring key values, this should work:

            \n
             edges = open('romEdges.txt')\n dict = {line[:1]:line[1:] for line in edges}\n print dict\n edges.close()\n
            \n

            The values may still contain whitespace or newline characters; you can strip those out with split():

            \n
             edges = open('romEdges.txt')\n dict = {line[:1]:line[1:].split()[0] for line in edges}\n print dict\n edges.close()\n
            \n

            If you have multiple columns and you want a list of the following columns as the value for that key:

            \n
             edges = open('romEdges.txt')\n dict = {line[:1]:line[1:].split() for line in edges}\n print dict\n edges.close()\n
            \n soup wrap:

            For people who just want to create a simple dictionary from columns without recurring key values, this should work:

             edges = open('romEdges.txt')
             dict = {line[:1]:line[1:] for line in edges}
             print dict
             edges.close()
            

            The values may still contain whitespace or newline characters; you can strip those out with split():

             edges = open('romEdges.txt')
             dict = {line[:1]:line[1:].split()[0] for line in edges}
             print dict
             edges.close()
            

            If you have multiple columns and you want a list of the following columns as the value for that key:

             edges = open('romEdges.txt')
             dict = {line[:1]:line[1:].split() for line in edges}
             print dict
             edges.close()
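A minimal sketch of that last variant on in-memory lines instead of romEdges.txt (the lines here are made up): split() both separates the columns and strips the trailing newline, and the first token becomes the key.

```python
# assumed sample lines standing in for the contents of romEdges.txt
lines = ["a b c\n", "b d\n"]
edges = {}
for line in lines:
    key, *rest = line.split()  # first token is the key, the rest the value list
    edges[key] = rest
print(edges)  # {'a': ['b', 'c'], 'b': ['d']}
```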
            
            qid & accept id: (26666329, 26666388) query: Convert a hashcode to its binary representation soup:

            You can do this via the bin() builtin function.

            \n

            So for your number this would look like this:

            \n
            bin(260768607) # Result: '0b1111100010110000001101011111'\n
            \n

            Hope this helped, good luck!

            \n

            Edit: If you need to remove the 0b prefix, you can use this code:

            \n
            temp[2:]\n
            \n soup wrap:

            You can do this via the bin() builtin function.

            So for your number this would look like this:

            bin(260768607) # Result: '0b1111100010110000001101011111'
            

            Hope this helped, good luck!

            Edit: If you need to remove the 0b prefix, you can use this code:

            temp[2:]  # bin() already returns a string, so slicing off '0b' is enough
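Putting both pieces together: slicing off the first two characters of the bin() string removes the prefix while keeping it a binary string, and int(..., 2) round-trips it back to the original number.

```python
n = 260768607
b = bin(n)[2:]     # bin() returns a string, so slicing removes the '0b' prefix
print(b)           # '1111100010110000001101011111'
print(int(b, 2))   # 260768607 — parse the binary string back to an int
```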
            
            qid & accept id: (26683175, 26683552) query: How can I keep the indentation between lines? soup:
            import re\n\nINDENT_RE = re.compile(r'^\s*$')\n\ndef matching_indent(line, pattern):\n    """\n    Returns indent if line matches pattern, else returns None.\n    """\n    if line.endswith(pattern):\n        indent = line[:-len(pattern)]\n        if INDENT_RE.match(indent):\n            return indent\n    return None\n\ndef replace_line(lines, pattern, replacements):\n    for line in lines:\n        indent = matching_indent(line, pattern)\n        if indent is None:\n            yield line\n        else:\n            for replacement in replacements:\n                yield indent + replacement\n
            \n

            You can use it like this:

            \n
            code = '''line 1\n    __LINE TO CHANGE__\nline 3'''\n\nprint('\n'.join(replace_line(\n    code.split('\n'),                           # one string per line\n    '__LINE TO CHANGE__',                       # the string to replace\n    ["added code line a", "added code line b"]  # the strings to replace with\n)))\n
            \n

            The output:

            \n
            line 1\n    added code line a\n    added code line b\nline 3\n
            \n

            You can also use this with a file, by doing something like:

            \n
            with open("input") as f:\n    print(''.join(replace_line(f, 'some pattern\n', ['foo\n', 'bar\n'])))\n
            \n

            Note that here I've added a '\n' to the ends of the pattern and replacements. If you intend on using this with the output of readlines (which includes a \n at the end of each line) then you may want to adjust the function to expect them and do this for you.

            \n soup wrap:
            import re
            
            INDENT_RE = re.compile(r'^\s*$')
            
            def matching_indent(line, pattern):
                """
                Returns indent if line matches pattern, else returns None.
                """
                if line.endswith(pattern):
                    indent = line[:-len(pattern)]
                    if INDENT_RE.match(indent):
                        return indent
                return None
            
            def replace_line(lines, pattern, replacements):
                for line in lines:
                    indent = matching_indent(line, pattern)
                    if indent is None:
                        yield line
                    else:
                        for replacement in replacements:
                            yield indent + replacement
            

            You can use it like this:

            code = '''line 1
                __LINE TO CHANGE__
            line 3'''
            
            print('\n'.join(replace_line(
                code.split('\n'),                           # one string per line
                '__LINE TO CHANGE__',                       # the string to replace
                ["added code line a", "added code line b"]  # the strings to replace with
            )))
            

            The output:

            line 1
                added code line a
                added code line b
            line 3
            

            You can also use this with a file, by doing something like:

            with open("input") as f:
                print(''.join(replace_line(f, 'some pattern\n', ['foo\n', 'bar\n'])))
            

            Note that here I've added a '\n' to the ends of the pattern and replacements. If you intend on using this with the output of readlines (which includes a \n at the end of each line) then you may want to adjust the function to expect them and do this for you.

            qid & accept id: (26694500, 26694599) query: How to convert text from a file into a list in Python? soup:

            It seems like you've just copied and pasted the textual representation (within Python) of your data into a text file, and now you want to take that text and convert it (from text) back into a Python data structure. This is an overly complicated way of doing that, I think. Parsers and the grammar involved in parsing are fairly complicated, and it would be very difficult for you to reproduce that. Plus I just don't think it's a valuable use of your time.

            \n

            That leaves you with two choices:

            \n
              \n
            1. If you actually want to serialize your python types to disk, I would strongly recommend trying to use the pickle library. This is a little complicated, but once you get the hang of it you should just be able to pickle and unpickle objects of totally arbitrary complexity, rather than trying to parse them from plain old text.

            2. \n
            3. If you don't do that, you should find a better, more predictable, more easily-parsed way of saving the data to text.

            4. \n
            5. Within your program itself, you should create classes to more easily encapsulate your data - you have a list of tuples of strings and integers and lists of integers. It's a little much to walk through absent any object-oriented structure.

            6. \n
            \n

            For example, if you were to use a different textual representation that's not tied to the way python types look:

            \n
            name:Zara highscore:9 averagescore:6 attempt1:3 attempt2:9 attempt3:6\nname:Albert highscore:6 averagescore:2 attempt1:6 attempt2:0 attempt3:0\n
            \n

            Or if you were to use XML, you could save your document something like this:

            \n
            <players>\n    <player name="Zara">\n        <attempt>3</attempt>\n        <attempt>9</attempt>\n        <attempt>6</attempt>\n    </player>\n    ...\n</players>\n
            \n

            And you could use xml.etree.ElementTree to walk through the nodes and pick out each piece of information you needed.

            \n

            I guess the biggest question about this, though, isn't why you're storing data the way you are but why you're storing a lot of it in the first place. 40% of your data - all high scores and average scores - has no reason at all to be stored. These figures are trivially calculated if you have access to the three attempts, and create so, so much more work for you than just using (one + two + three) / 3 or max([one, two, three]).

            \n soup wrap:

            It seems like you've just copied and pasted the textual representation (within Python) of your data into a text file, and now you want to take that text and convert it (from text) back into a Python data structure. This is an overly complicated way of doing that, I think. Parsers and the grammar involved in parsing are fairly complicated, and it would be very difficult for you to reproduce that. Plus I just don't think it's a valuable use of your time.

            That leaves you with two choices:

            1. If you actually want to serialize your python types to disk, I would strongly recommend trying to use the pickle library. This is a little complicated, but once you get the hang of it you should just be able to pickle and unpickle objects of totally arbitrary complexity, rather than trying to parse them from plain old text.

            2. If you don't do that, you should find a better, more predictable, more easily-parsed way of saving the data to text.

            3. Within your program itself, you should create classes to more easily encapsulate your data - you have a list of tuples of strings and integers and lists of integers. It's a little much to walk through absent any object-oriented structure.

            For example, if you were to use a different textual representation that's not tied to the way python types look:

            name:Zara highscore:9 averagescore:6 attempt1:3 attempt2:9 attempt3:6
            name:Albert highscore:6 averagescore:2 attempt1:6 attempt2:0 attempt3:0
            

            Or if you were to use XML, you could save your document something like this:

            <players>
                <player name="Zara">
                    <attempt>3</attempt>
                    <attempt>9</attempt>
                    <attempt>6</attempt>
                </player>
                ...
            </players>

            And you could use xml.etree.ElementTree to walk through the nodes and pick out each piece of information you needed.

            I guess the biggest question about this, though, isn't why you're storing data the way you are but why you're storing a lot of it in the first place. 40% of your data - all high scores and average scores - has no reason at all to be stored. These figures are trivially calculated if you have access to the three attempts, and create so, so much more work for you than just using (one + two + three) / 3 or max([one, two, three]).
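Option 1 above can be sketched in a few lines. This uses pickle.dumps/loads for a self-contained round-trip; pickle.dump/load on an open binary file work the same way when you actually want the data on disk. The scores structure here is made up from the sample data in the answer.

```python
import pickle

# assumed structure: (name, [attempt1, attempt2, attempt3]) pairs
scores = [('Zara', [3, 9, 6]), ('Albert', [6, 0, 0])]

blob = pickle.dumps(scores)      # serialize to bytes (dump(f) writes to a file)
restored = pickle.loads(blob)    # deserialize back to the original structure
print(restored == scores)  # True
```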

            qid & accept id: (26700006, 26700033) query: Print tree without recursion soup:

            You'd use a queue to track nodes still to process, adding to it as you process them:

            \n
            def print_nonrec_breathfirst(node):\n    queue = [node]\n    while queue:\n        node, queue = queue[0], queue[1:]\n        print node\n        for c in node.children:\n            queue.append(c)\n
            \n

            or you could use a stack, processing children first:

            \n
            def print_nonrec_depthfirst(node):\n    stack = [node]\n    while stack:\n        node = stack.pop()\n        print node\n        for c in node.children:\n            stack.append(c)\n
            \n

            Either way, you keep track of what nodes you haven't yet printed, and as you process nodes you also figure out what child nodes you still need to process.

            \n

            Demo:

            \n
            >>> print_nonrec_breathfirst(n1)\n1\n2\n3\n4\n5\n6\n7\n>>> print_nonrec_depthfirst(n1)\n1\n4\n3\n6\n7\n5\n2\n
            \n soup wrap:

            You'd use a queue to track nodes still to process, adding to it as you process them:

            def print_nonrec_breathfirst(node):
                queue = [node]
                while queue:
                    node, queue = queue[0], queue[1:]
                    print node
                    for c in node.children:
                        queue.append(c)
            

            or you could use a stack, processing children first:

            def print_nonrec_depthfirst(node):
                stack = [node]
                while stack:
                    node = stack.pop()
                    print node
                    for c in node.children:
                        stack.append(c)
            

            Either way, you keep track of what nodes you haven't yet printed, and as you process nodes you also figure out what child nodes you still need to process.

            Demo:

            >>> print_nonrec_breathfirst(n1)
            1
            2
            3
            4
            5
            6
            7
            >>> print_nonrec_depthfirst(n1)
            1
            4
            3
            6
            7
            5
            2
            
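A note on the queue version: `queue[0], queue[1:]` rebuilds the list on every step, which is O(n) per pop. A hedged variant using collections.deque keeps the same breadth-first order with O(1) pops; the Node class here is a minimal stand-in for the question's node type.

```python
from collections import deque

class Node:
    # minimal stand-in node with a data value and a list of children
    def __init__(self, data, children=()):
        self.data = data
        self.children = list(children)

def bfs_order(node):
    order = []
    queue = deque([node])
    while queue:
        node = queue.popleft()      # O(1), unlike queue[0], queue[1:]
        order.append(node.data)
        queue.extend(node.children)
    return order

tree = Node(1, [Node(2), Node(3, [Node(4)])])
print(bfs_order(tree))  # [1, 2, 3, 4]
```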
            qid & accept id: (26705235, 26705643) query: How do I implement SelectionSort and InsertionSort on a linked list in Python? soup:

            I tried to implement insertionSort; the code is readable. SelectionSort should be similar, so try to implement it yourself.

            \n
            def insertionSort(h):\n    if h == None:\n        return None\n    #Make the first node the start of the sorted list.\n    sortedList= h\n    h=h.next\n    sortedList.next= None\n    while h != None:\n        curr= h\n        h=h.next\n        if curr.data < sortedList.data:\n            #current goes at the start of the sorted list.\n            curr.next= sortedList\n            sortedList= curr\n        else:\n            search= sortedList\n            while search.next != None and curr.data > search.next.data:\n                search= search.next\n            #current goes after search.\n            curr.next= search.next\n            search.next= curr\n    return sortedList\n\ndef printList(d):\n    s=''\n    while d:\n        s+=str(d.data)+"->"\n        d=d.next\n    print s[:-2]\n\nl= unorderedList()\nl.add(10)\nl.add(12)\nl.add(1)\nl.add(4)\nh= l.head\nprintList(h)\n\nresult= insertionSort(l.head)\nd= result\nprintList(d)\n
            \n

            Output:

            \n
            4->1->12->10\n1->4->10->12\n
            \n soup wrap:

            I tried to implement insertionSort; the code is readable. SelectionSort should be similar, so try to implement it yourself.

            def insertionSort(h):
                if h == None:
                    return None
                #Make the first node the start of the sorted list.
                sortedList= h
                h=h.next
                sortedList.next= None
                while h != None:
                    curr= h
                    h=h.next
                    if curr.data < sortedList.data:
                        #current goes at the start of the sorted list.
                        curr.next= sortedList
                        sortedList= curr
                    else:
                        search= sortedList
                        while search.next != None and curr.data > search.next.data:
                            search= search.next
                        #current goes after search.
                        curr.next= search.next
                        search.next= curr
                return sortedList
            
            def printList(d):
                s=''
                while d:
                    s+=str(d.data)+"->"
                    d=d.next
                print s[:-2]
            
            l= unorderedList()
            l.add(10)
            l.add(12)
            l.add(1)
            l.add(4)
            h= l.head
            printList(h)
            
            result= insertionSort(l.head)
            d= result
            printList(d)
            

            Output:

            4->1->12->10
            1->4->10->12
            
            qid & accept id: (26716576, 26716739) query: Auto Incrementing natural keys with django / postgres soup:

            If you want true composed primary keys, you might want to use django-compositepks, but that is not ideal. You might be better off breaking DRY and recording the number (see the category_auto_key field and default).

            \n

            Transactions will solve it this way:

            \n
            from django.db import transaction\n\nclass Group(models.Model):\n    # your fields\n    img_count = models.IntegerField()\n\n    @transaction.atomic\n    def next_sku(self):\n        self.img_count += 1\n        self.save()\n        return self.img_count\n\nclass Photo(models.Model):\n    # your fields\n    category_auto_key = models.IntegerField(editable=False)\n\n    def category_image(self):\n        return self.group.abbv+"-"+str(self.category_auto_key)\n\n    def save(self, *args, **kwargs):\n        if not self.category_auto_key:\n            self.category_auto_key = self.group.next_sku()\n        super(Photo, self).save(*args, **kwargs)\n
            \n

            When you need this in your templates, just enclose it in double brackets:

            \n
            {{ photo.category_image }}\n
            \n soup wrap:

            If you want true composed primary keys, you might want to use django-compositepks, but that is not ideal. You might be better off breaking DRY and recording the number (see the category_auto_key field and default).

            Transactions will solve it this way:

            from django.db import transaction
            
            class Group(models.Model):
                # your fields
                img_count = models.IntegerField()
            
                @transaction.atomic
                def next_sku(self):
                    self.img_count += 1
                    self.save()
                    return self.img_count
            
            class Photo(models.Model):
                # your fields
                category_auto_key = models.IntegerField(editable=False)
            
                def category_image(self):
                    return self.group.abbv+"-"+str(self.category_auto_key)
            
                def save(self, *args, **kwargs):
                    if not self.category_auto_key:
                        self.category_auto_key = self.group.next_sku()
                    super(Photo, self).save(*args, **kwargs)
            

            When you need this in your templates, just enclose it in double brackets:

            {{ photo.category_image }}
            
            qid & accept id: (26725669, 26733999) query: How to get a list of datatypes on Django model? soup:

            soup wrap:

            You can use:

            raw_list = c._meta.get_fields_with_model()    
            

            When you do raw_list = c._meta.get_fields_with_model(), raw_list contains tuples of (field, model), something like:

            ((<django.db.models.fields.AutoField: id>, None), (<django.db.models.fields.TextField: ...>, None), etc...
            

            To get a "parsed" list that contains only the names of the datatypes, we can do:

            [item[0].__class__.__name__ for item in raw_list]
            

            or using get_internal_type:

            [item[0].get_internal_type() for item in raw_list]
            

            Either way, you'll get a list like:

            ['AutoField', 'TextField', 'TextField', 'FloatField', 'CharField', 'BooleanField', 'IntegerField', 'ImageField', 'BooleanField'...
            

            Just the code:

            raw_list = c._meta.get_fields_with_model()
            parsed_list = [item[0].__class__.__name__ for item in raw_list]
            
            qid & accept id: (26742987, 26743328) query: Calling/Passing dictionary objects in python soup:

            soup wrap:

            Python has dictionary literals, which are both faster and much nicer to read than constructing it by setting keys individually.

            def dict_function(self):
                return {
                    'sky': 'blue',
                    'clouds': 'white',
                    'grass': 'green'
                }
            

            As mentioned in the comments, if the dictionary is static like this, there is no reason to have a function, just store the dictionary and access it.

            The only reason you would want to recreate the dictionary each time would be if you intend to mutate the dictionary each time you use it, without that affecting future uses. It sounds like your use case, however, just requires you access the dictionary in other functions.

            class Something:
                def __init__(self):
                    self.colours = {
                        'sky': 'blue',
                        'clouds': 'white',
                        'grass': 'green'
                    }
            
                def draw(self):
                    screen.rect(100, 100, 200, 300, self.colours["sky"])
                    ...
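
            To see why sharing one stored dictionary differs from rebuilding it, here is a small self-contained sketch (names are illustrative, not from the original question):

```python
# A shared dictionary: mutations persist across uses.
colours = {'sky': 'blue', 'clouds': 'white', 'grass': 'green'}

def darken(palette):
    palette['sky'] = 'navy'      # mutates the caller's dictionary in place
    return palette

darken(colours)
print(colours['sky'])            # 'navy' -- the shared dict was changed

# Rebuilding (or copying) gives each use an independent dictionary.
fresh = dict(colours)
fresh['sky'] = 'black'
print(colours['sky'])            # still 'navy' -- the copy did not affect it
```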
            
            qid & accept id: (26787410, 26787543) query: to delete records from a file in python soup:
            soup wrap:
            with open(your_f) as f:
                lines = f.readlines()
                for ind, line in enumerate(lines): 
                    if your condition: # if line contains a match 
                        lines[ind] ="" # set line to empty string
                with open(your_f,"w") as f: # reopen with w to overwrite
                    f.writelines(lines) # write updated lines
            

            For example removing a line from a txt file that starts with 55:

            with open("in.txt") as f:
                lines = f.readlines()
                for ind, line in enumerate(lines):
                    if line.startswith("55"):
                        lines[ind] = ""
                with open("in.txt","w") as f:
                    f.writelines(lines)
            

            input:

            foo
            bar
            55 foobar
            44 foo
            

            output:

            foo
            bar
            44 foo
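
            An equivalent approach drops the matching lines entirely instead of writing empty strings. This sketch builds its own throwaway input file with tempfile so it is runnable as-is (Python 3 syntax):

```python
import os
import tempfile

# Create a throwaway input file for the demo.
path = os.path.join(tempfile.mkdtemp(), "in.txt")
with open(path, "w") as f:
    f.write("foo\nbar\n55 foobar\n44 foo\n")

# Keep only the lines that do not match, then overwrite the file.
with open(path) as f:
    kept = [line for line in f if not line.startswith("55")]
with open(path, "w") as f:
    f.writelines(kept)

with open(path) as f:
    print(f.read())  # foo / bar / 44 foo
```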
            
            qid & accept id: (26812595, 26812635) query: How can I properly join these strings together (by column then row)? soup:

            soup wrap:

            If you want to use join:

            print '\n'.join( ''.join('#' for column in range(10)) for row in range(10))
            

            but much easier would be:

            print ('#'*10 + '\n')*10
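
            One subtle difference between the two (shown here in Python 3 syntax): the multiplication version ends with a trailing newline, while the join version does not:

```python
# Both build the same 10x10 grid of '#' characters.
grid_join = '\n'.join(''.join('#' for column in range(10)) for row in range(10))
grid_mult = ('#' * 10 + '\n') * 10

# grid_mult has a final newline; grid_join joins 10 rows with 9 newlines.
print(grid_mult == grid_join + '\n')  # True
```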
            
            qid & accept id: (26826838, 26827176) query: How to match phone number prefixes? soup:
            soup wrap:
            >>> data = [['Country', 'Destination', 'Country Code', 'Destination Code', 'Remarks'],
            ... ['AAA', 'Some Mobile', '111', '12, 23, 34, 46','Some remarks'],
            ... ['AAA', 'Some city A', '111', '55, 56, 57, 51', 'Some more remarks'],
            ... ['BBB', 'Some city B', '222', '234, 345, 456', 'Other remarks']]
            >>> 
            >>> op=[data[0]]
            >>> for i in data[1:]:
            ...    for j in i.pop(3).split(','):
            ...       op.append([k+j.strip() if i.index(k)==2 else k for k in i])
            ... 
            
            >>> for i in op:
            ...    print i
            ... 
            ['Country', 'Destination', 'Country Code', 'Destination Code', 'Remarks']
            ['AAA', 'Some Mobile', '11112', 'Some remarks']
            ['AAA', 'Some Mobile', '11123', 'Some remarks']
            ['AAA', 'Some Mobile', '11134', 'Some remarks']
            ['AAA', 'Some Mobile', '11146', 'Some remarks']
            ['AAA', 'Some city A', '11155', 'Some more remarks']
            ['AAA', 'Some city A', '11156', 'Some more remarks']
            ['AAA', 'Some city A', '11157', 'Some more remarks']
            ['AAA', 'Some city A', '11151', 'Some more remarks']
            ['BBB', 'Some city B', '222234', 'Other remarks']
            ['BBB', 'Some city B', '222345', 'Other remarks']
            ['BBB', 'Some city B', '222456', 'Other remarks']
            

            Solution for your updated problem:

            >>> data = [['Country', 'Destination', 'Country Code', 'Destination Code', 'Remarks'],
            ...  ['AAA', 'Some Mobile', '111', '12, 23, 34, 46','Some remarks'],
            ...  ['AAA', 'Some city A', '111', '55, 56, 57, 51', 'Some more remarks'],
            ...  ['BBB', 'Some city B', '222', '234, 345, 456', 'Other remarks']]
            >>>  
            >>> op=[data[0]]
            >>> for i in data[1:]:
            ...    for id,j in enumerate(i.pop(3).split(',')):
            ...       k=i[:]
            ...       k.insert(3,i[2]+j.strip())
            ...       op.append(k)
            ... 
            >>> for i in op:
            ...    print i
            ... 
            ['Country', 'Destination', 'Country Code', 'Destination Code', 'Remarks']
            ['AAA', 'Some Mobile', '111', '11112', 'Some remarks']
            ['AAA', 'Some Mobile', '111', '11123', 'Some remarks']
            ['AAA', 'Some Mobile', '111', '11134', 'Some remarks']
            ['AAA', 'Some Mobile', '111', '11146', 'Some remarks']
            ['AAA', 'Some city A', '111', '11155', 'Some more remarks']
            ['AAA', 'Some city A', '111', '11156', 'Some more remarks']
            ['AAA', 'Some city A', '111', '11157', 'Some more remarks']
            ['AAA', 'Some city A', '111', '11151', 'Some more remarks']
            ['BBB', 'Some city B', '222', '222234', 'Other remarks']
            ['BBB', 'Some city B', '222', '222345', 'Other remarks']
            ['BBB', 'Some city B', '222', '222456', 'Other remarks']
            
            qid & accept id: (26830752, 26830867) query: Making new column in pandas DataFrame based on filter soup:

            soup wrap:

            One way would be to create a column of boolean values like this:

            >>> df['filter'] = (df['a'] >= 20) & (df['b'] >= 20)
                a   b   c filter
            0   1  50   1  False
            1  10  60  30  False
            2  20  55   1   True
            3   3   0   0  False
            4  10   0   0  False
            

            You can then change the boolean values to 'pass' or 'fail' using replace:

            >>> df['filter'].astype(object).replace({False: 'fail', True: 'pass'})
            0    fail
            1    fail
            2    pass
            3    fail
            4    fail
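
            If you want the 'pass'/'fail' labels in a single step, numpy.where does the boolean test and the replacement together. A sketch, reconstructing the same example frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 10, 20, 3, 10],
                   'b': [50, 60, 55, 0, 0],
                   'c': [1, 30, 1, 0, 0]})

# np.where picks 'pass' where the condition holds, 'fail' elsewhere.
df['filter'] = np.where((df['a'] >= 20) & (df['b'] >= 20), 'pass', 'fail')
print(df['filter'].tolist())  # ['fail', 'fail', 'pass', 'fail', 'fail']
```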
            

            You can extend this to more columns using all. For example, to find rows across the columns with entries greater than 0:

            >>> cols = ['a', 'b', 'c'] # a list of columns to test
            >>> df[cols] > 0 
                  a      b      c
            0  True   True   True
            1  True   True   True
            2  True   True   True
            3  True  False  False
            4  True  False  False
            

            Using all across axis 1 of this DataFrame creates the new column:

            >>> (df[cols] > 0).all(axis=1)
            0     True
            1     True
            2     True
            3    False
            4    False
            dtype: bool
            
            qid & accept id: (26856378, 26860485) query: Python thread-safe access without blocking or uncontrolled queue growth? soup:

            soup wrap:

            Normally I would try to use the built-in Queue class, but this situation may be too complicated for this.

            If I understand correctly, you need to store exactly three seconds worth of data, regardless of the number of requests made by the user. You can solve this problem by making use of the fact that Python always accesses objects through references. Assigning a reference is an atomic action in Python, so you can safely use a singleton to store the latest result. There are two conditions:

            1. There must be no assignments to members of the result after it has been published
            2. The users of the result must retrieve the results once and only once during a request: if the results are retrieved a second time, the result will probably have changed.

            So use a singleton to enforce this behaviour:

            class Buffer:
                ''' A simple buffer that stores exactly one value '''
                latest = None
                @staticmethod
                def onNewReading(*args, **kwds): # or any parameters one fancies
                    # Pack the results in a single object and store it
                    Buffer.latest = (args, kwds)
                @staticmethod
                def onUserRequest():
                    return Buffer.latest
            

            Writing a value is as follows:

            def process():
                # Do the measurement
                # Do the calculation
                Buffer.onNewReading(data1, data2, etc)
            

            Using the value:

            def handleRequest():
                results = Buffer.onUserRequest()
                # Format a response, using the results
                if not results:
                    return 'No data available'  # or any other useful error message
                return response(results[0][0], results[0][1])
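
            A runnable end-to-end sketch of the pattern (method names adapted to snake_case, and the writer made a thread to mimic the background measurement; the specifics are illustrative):

```python
import threading

class Buffer:
    '''A simple buffer that stores exactly one value.'''
    latest = None

    @staticmethod
    def on_new_reading(*args, **kwds):
        # A single reference assignment -- atomic in CPython.
        Buffer.latest = (args, kwds)

    @staticmethod
    def on_user_request():
        return Buffer.latest

def writer():
    # Background thread publishing a stream of results.
    for i in range(1000):
        Buffer.on_new_reading(i, squared=i * i)

t = threading.Thread(target=writer)
t.start()
t.join()

args, kwds = Buffer.on_user_request()
print(args, kwds)  # (999,) {'squared': 998001}
```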
            
            qid & accept id: (26871083, 26872889) query: How can I vectorize the averaging of 2x2 sub-arrays of numpy array? soup:

            soup wrap:

            If we form the reshaped matrix y = x.reshape(2,2,3,2), then the (i,j) 2x2 submatrix is given by y[i,:,j,:]. E.g.:

            In [340]: x
            Out[340]: 
            array([[  0.,   1.,   2.,   3.,   4.,   5.],
                   [  6.,   7.,   8.,   9.,  10.,  11.],
                   [ 12.,  13.,  14.,  15.,  16.,  17.],
                   [ 18.,  19.,  20.,  21.,  22.,  23.]])
            
            In [341]: y = x.reshape(2,2,3,2)
            
            In [342]: y[0,:,0,:]
            Out[342]: 
            array([[ 0.,  1.],
                   [ 6.,  7.]])
            
            In [343]: y[1,:,2,:]
            Out[343]: 
            array([[ 16.,  17.],
                   [ 22.,  23.]])
            

            To get the mean of the 2x2 submatrices, use the mean method, with axis=(1,3):

            In [344]: y.mean(axis=(1,3))
            Out[344]: 
            array([[  3.5,   5.5,   7.5],
                   [ 15.5,  17.5,  19.5]])
            

            If you are using an older version of numpy that doesn't support using a tuple for the axis, you could do:

            In [345]: y.mean(axis=1).mean(axis=-1)
            Out[345]: 
            array([[  3.5,   5.5,   7.5],
                   [ 15.5,  17.5,  19.5]])
            

            See the link given by @dashesy in a comment for more background on the reshaping "trick".


            To generalize this to a 2-d array with shape (m, n), where m and n are even, use

            y = x.reshape(x.shape[0]/2, 2, x.shape[1]/2, 2)
            

            y can then be interpreted as an array of 2x2 arrays. The first and third index slots of the 4-d array act as the indices that select one of the 2x2 blocks. To get the upper left 2x2 block, use y[0, :, 0, :]; to get the block in the second row and third column of blocks, use y[1, :, 2, :]; and in general, to access block (j, k), use y[j, :, k, :].

            To compute the reduced array of averages of these blocks, use the mean method, with axis=(1, 3) (i.e. average over axes 1 and 3):

            avg = y.mean(axis=(1, 3))
            

            Here's an example where x has shape (8, 10), so the array of averages of the 2x2 blocks has shape (4, 5):

            In [10]: np.random.seed(123)
            
            In [11]: x = np.random.randint(0, 4, size=(8, 10))
            
            In [12]: x
            Out[12]: 
            array([[2, 1, 2, 2, 0, 2, 2, 1, 3, 2],
                   [3, 1, 2, 1, 0, 1, 2, 3, 1, 0],
                   [2, 0, 3, 1, 3, 2, 1, 0, 0, 0],
                   [0, 1, 3, 3, 2, 0, 3, 2, 0, 3],
                   [0, 1, 0, 3, 1, 3, 0, 0, 0, 2],
                   [1, 1, 2, 2, 3, 2, 1, 0, 0, 3],
                   [2, 1, 0, 3, 2, 2, 2, 2, 1, 2],
                   [0, 3, 3, 3, 1, 0, 2, 0, 2, 1]])
            
            In [13]: y = x.reshape(x.shape[0]/2, 2, x.shape[1]/2, 2)
            

            Take a look at a couple of the 2x2 blocks:

            In [14]: y[0, :, 0, :]
            Out[14]: 
            array([[2, 1],
                   [3, 1]])
            
            In [15]: y[1, :, 2, :]
            Out[15]: 
            array([[3, 2],
                   [2, 0]])
            

            Compute the averages of the blocks:

            In [16]: avg = y.mean(axis=(1, 3))
            
            In [17]: avg
            Out[17]: 
            array([[ 1.75,  1.75,  0.75,  2.  ,  1.5 ],
                   [ 0.75,  2.5 ,  1.75,  1.5 ,  0.75],
                   [ 0.75,  1.75,  2.25,  0.25,  1.25],
                   [ 1.5 ,  2.25,  1.25,  1.5 ,  1.5 ]])
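
            Putting the whole recipe into one self-contained snippet (Python 3, where integer division is spelled //):

```python
import numpy as np

x = np.arange(24.0).reshape(4, 6)

# View x as a grid of 2x2 blocks, then average over the two block axes.
y = x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2)
avg = y.mean(axis=(1, 3))
print(avg)
# [[ 3.5  5.5  7.5]
#  [15.5 17.5 19.5]]
```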
            
            qid & accept id: (26889734, 26889941) query: Using fancy indexing to get one value from each column of a numpy matrix soup:

            soup wrap:

            Getting:

            >>> a = np.arange(25).reshape((5,5))
            >>> a
            array([[ 0,  1,  2,  3,  4],
                   [ 5,  6,  7,  8,  9],
                   [10, 11, 12, 13, 14],
                   [15, 16, 17, 18, 19],
                   [20, 21, 22, 23, 24]])
            >>> rows = b = np.array([0,4,2,3,3])
            >>> cols = np.arange(len(b))
            >>> [a[b[i], i] for i in range(5)]
            [0, 21, 12, 18, 19]
            >>> a[rows, cols]
            array([ 0, 21, 12, 18, 19])
            

            Setting:

            >>> a[rows, cols] = 69
            >>> a
            array([[69,  1,  2,  3,  4],
                   [ 5,  6,  7,  8,  9],
                   [10, 11, 69, 13, 14],
                   [15, 16, 17, 69, 69],
                   [20, 69, 22, 23, 24]])
            >>> a[rows, cols] = np.array([-111, -111, -222, -333, -666])
            >>> a
            array([[-111,    1,    2,    3,    4],
                   [   5,    6,    7,    8,    9],
                   [  10,   11, -222,   13,   14],
                   [  15,   16,   17, -333, -666],
                   [  20, -111,   22,   23,   24]])
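
            A common use of this pattern is picking one entry per column by some rule, e.g. the per-column maxima via argmax (a small sketch, not part of the original question):

```python
import numpy as np

a = np.arange(25).reshape(5, 5)
rows = a.argmax(axis=0)          # row index of the largest value in each column
cols = np.arange(a.shape[1])
print(a[rows, cols])             # [20 21 22 23 24]
```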
            
            qid & accept id: (26897708, 26897740) query: Selecting a random value from a dictionary in python soup:

            soup wrap:

            If you have a list lst, then you can simply do:

            random_word = random.choice(lst)
            

            to get a random entry of the list. So here, you will want something like:

            return random.choice(max(result.items(), key=lambda kv: len(kv[1]))[1])
            #      ^^^^^^^^^^^^^^                                              ^^^^
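
            Spelled out with a concrete dictionary (the data here is invented for the example): max picks the item whose value list is longest, and random.choice then draws one word from it.

```python
import random

result = {'short': ['a'], 'long': ['x', 'y', 'z']}

# Find the value (a list) with the most entries, then pick from it.
longest = max(result.items(), key=lambda kv: len(kv[1]))[1]
word = random.choice(longest)
print(word in longest)  # True
```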
            
            qid & accept id: (26911426, 26914116) query: Python Pandas: Eliminate a row from a dataframe if a value in a any preceding row in a groupby meets a certain criteria soup:

            soup wrap:

            This should work:

            df.groupby(['Country', 'Product']).apply(lambda sdf: sdf[(sdf.Week.diff(1).fillna(1) != 1).astype('int').cumsum() == 0]).reset_index(drop=True)
            

            Another method, that might be more readable (i.e. generate a set of consecutive weeks and check against the observed week)

            df['expected_week'] = df.groupby(['Country', 'Product']).Week.transform(lambda s: range(s.min(), s.min() + s.size))
            df[df.Week == df.expected_week]
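
            A small end-to-end sketch of the second method (the frame is invented for the example; list(...) is used so transform receives a concrete sequence of the group's length):

```python
import pandas as pd

df = pd.DataFrame({'Country': ['US'] * 4,
                   'Product': ['A'] * 4,
                   'Week': [1, 2, 4, 5]})

# The week each row would have if the sequence were unbroken,
# counting up from each group's minimum week.
df['expected_week'] = df.groupby(['Country', 'Product']).Week.transform(
    lambda s: list(range(s.min(), s.min() + s.size)))
out = df[df.Week == df.expected_week]
print(out.Week.tolist())  # [1, 2]
```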
            
            qid & accept id: (26922284, 26924713) query: Filling gaps for cumulative sum with Pandas soup:

            soup wrap:

            I think you want to use pivot_table:

            In [11]: df.pivot_table(values="incoming", index="month", columns="goods", aggfunc="sum")
            Out[11]:
            goods   a   b   c
            month
            1       0  30 NaN
            2      30 NaN  10
            3     NaN  70 NaN
            5     NaN  40  50
            6      20 NaN NaN
            

            To get the filled in months, you can reindex (this feels a little hacky, there may be a neater way):

            In [12]: res.reindex(np.arange(res.index[0], res.index[-1] + 1))
            Out[12]:
            goods   a   b   c
            1       0  30 NaN
            2      30 NaN  10
            3     NaN  70 NaN
            4     NaN NaN NaN
            5     NaN  40  50
            6      20 NaN NaN
            

            One issue here is that month is independent of year; it may be preferable to have a period index:

            In [21]: df.pivot_table(values="incoming", index=pd.DatetimeIndex(df.date).to_period("M"), columns="goods", aggfunc="sum")
            Out[21]:
            goods     a   b   c
            2014-01   0  30 NaN
            2014-02  30 NaN  10
            2014-03 NaN  70 NaN
            2014-05 NaN  40  50
            2014-06  20 NaN NaN
            

            and then you can reindex by the period range:

            In [22]: res2.reindex(pd.period_range(res2.index[0], res2.index[-1], freq="M"))
            Out[22]:
            goods     a   b   c
            2014-01   0  30 NaN
            2014-02  30 NaN  10
            2014-03 NaN  70 NaN
            2014-04 NaN NaN NaN
            2014-05 NaN  40  50
            2014-06  20 NaN NaN
            

            Which is to say, you can do the same with your dfg:

            In [31]: dfg.pivot_table(["incoming", "level"], "month", "goods")
            Out[31]:
                  incoming         level
            goods        a   b   c     a    b   c
            month
            1            0  30 NaN     0   30 NaN
            2           30 NaN  10    30  NaN  10
            3          NaN  70 NaN   NaN  100 NaN
            5          NaN  40  50   NaN  140  60
            6           20 NaN NaN    50  NaN NaN
            

            and reindex.

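The pivot-then-reindex pattern above can be tried end to end on toy data (the month/goods values here are hypothetical); the missing month surfaces as a row of NaN:

```python
import pandas as pd
import numpy as np

# Hypothetical data with month 4 missing entirely.
df = pd.DataFrame({'month': [1, 2, 3, 5],
                   'goods': ['a', 'b', 'a', 'b'],
                   'incoming': [10, 20, 30, 40]})

res = df.pivot_table(values='incoming', index='month',
                     columns='goods', aggfunc='sum')
# Reindex over the full month range so the gap appears as NaN.
res = res.reindex(np.arange(res.index[0], res.index[-1] + 1))
print(res)
```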
            qid & accept id: (26934349, 27019388) query: How to assign scipy.sparse matrix to NumPy array via indexing? soup:

            As mentioned in a comment to my question, the sequence interface won't work for sparse matrices, because they don't lose a dimension when indexed with a single number.\nTo try it anyway, I created a very limited quick-and-dirty sparse array class in pure Python, which, when indexed with a single number, returns a "row" class (which holds a view to the original data), which again can be indexed with a single number to yield the actual value at this index. Using an instance s of my class, assigning to a NumPy array a works exactly as requested:

            \n
            a[:] = s\n
            \n

            I expected this to be somewhat inefficient, but it is really, really, really, extremely slow. Assigning a 500,000 x 100 sparse array took several minutes!\nThe good news, though, is that no full-sized temporary array is created during the assignment. The memory usage stays about constant during the assignment (while one of the CPUs maxes out).

            \n

            So this is basically one solution to the original question.

            \n

            To make the assignment more efficient and still use no temporary copy of the dense array data, NumPy would have to internally do something similar to

            \n
            s.toarray(out=a)\n
            \n

            As far as I know, there is currently no way to get NumPy to do that.

            \n

            However, there is a way to do something very similar, by providing an __array__() method that returns a NumPy array. Incidentally, SciPy sparse matrices already have such a method, just with a different name: toarray(). So I just renamed it:

            \n
            scipy.sparse.dok_matrix.__array__ = scipy.sparse.dok_matrix.toarray\na[:] = s\n
            \n

            This works like a charm (also with the other sparse matrix classes) and is totally fast!

            \n

            According to my limited understanding of the situation, this should create a temporary NumPy array with the same size as a which holds all the values from s (and many zeros) and which is then assigned to a.\nBut strangely, even when I use a very large a that occupies nearly all my available RAM, the assignment still happens very quickly and no additional RAM is used.

            \n

            So I guess this is another, much better solution to my original question.

            \n

            Which leaves another question: why does this work without a temporary array?

            \n soup wrap:

            As mentioned in a comment to my question, the sequence interface won't work for sparse matrices, because they don't lose a dimension when indexed with a single number. To try it anyway, I created a very limited quick-and-dirty sparse array class in pure Python, which, when indexed with a single number, returns a "row" class (which holds a view to the original data), which again can be indexed with a single number to yield the actual value at this index. Using an instance s of my class, assigning to a NumPy array a works exactly as requested:

            a[:] = s
            

            I expected this to be somewhat inefficient, but it is really, really, really, extremely slow. Assigning a 500,000 x 100 sparse array took several minutes! The good news, though, is that no full-sized temporary array is created during the assignment. The memory usage stays about constant during the assignment (while one of the CPUs maxes out).

            So this is basically one solution to the original question.

            To make the assignment more efficient and still use no temporary copy of the dense array data, NumPy would have to internally do something similar to

            s.toarray(out=a)
            

            As far as I know, there is currently no way to get NumPy to do that.

            However, there is a way to do something very similar, by providing an __array__() method that returns a NumPy array. Incidentally, SciPy sparse matrices already have such a method, just with a different name: toarray(). So I just renamed it:

            scipy.sparse.dok_matrix.__array__ = scipy.sparse.dok_matrix.toarray
            a[:] = s
            

            This works like a charm (also with the other sparse matrix classes) and is totally fast!

            According to my limited understanding of the situation, this should create a temporary NumPy array with the same size as a which holds all the values from s (and many zeros) and which is then assigned to a. But strangely, even when I use a very large a that occupies nearly all my available RAM, the assignment still happens very quickly and no additional RAM is used.

            So I guess this is another, much better solution to my original question.

            Which leaves another question: why does this work without a temporary array?

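The mechanism behind the renaming trick is the __array__ protocol: NumPy converts any object exposing an __array__() method whenever it needs a dense array. A tiny self-contained illustration (the Diagonal class is hypothetical, standing in for a sparse matrix):

```python
import numpy as np

class Diagonal:
    """Tiny 'sparse' stand-in that stores only a diagonal."""
    def __init__(self, diag):
        self.diag = np.asarray(diag, dtype=float)

    def __array__(self, dtype=None, copy=None):
        # Materialise the dense array on demand, like toarray().
        dense = np.diag(self.diag)
        return dense.astype(dtype) if dtype is not None else dense

s = Diagonal([1.0, 2.0, 3.0])
a = np.empty((3, 3))
a[:] = s  # NumPy calls s.__array__() behind the scenes
print(a[1, 1])  # -> 2.0
```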
            qid & accept id: (26969526, 26974518) query: Creating a correlation plot with matplotlib soup:

            To centralise the ticks you need to add 0.5 to each value:

            \n
            plt.yticks([0.5,1.5,2.5], ["first", "second", "third"])\nplt.xticks([0.5,1.5,2.5], ["first", "second", "third"], rotation='vertical')\n
            \n

            Also, you might want to add the following so that the overall figure size is adjusted to take account of the rotated x labels:

            \n
            plt.tight_layout()\n
            \n soup wrap:

            To centralise the ticks you need to add 0.5 to each value:

            plt.yticks([0.5,1.5,2.5], ["first", "second", "third"])
            plt.xticks([0.5,1.5,2.5], ["first", "second", "third"], rotation='vertical')
            

            Also, you might want to add the following so that the overall figure size is adjusted to take account of the rotated x labels:

            plt.tight_layout()
            
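Put together, a minimal self-contained version (the random 3x3 data is hypothetical; the Agg backend is used so it runs headless):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

# pcolor draws cells between integer edges, so ticks at n + 0.5
# sit at each cell's centre.
data = np.random.rand(3, 3)
plt.pcolor(data)
plt.yticks([0.5, 1.5, 2.5], ["first", "second", "third"])
plt.xticks([0.5, 1.5, 2.5], ["first", "second", "third"], rotation='vertical')
plt.tight_layout()  # make room for the rotated x labels
```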
            qid & accept id: (26983187, 26984764) query: Convert a 3D array to 2D array based on dictionary soup:

            A dict is a map from keys to values. A NumPy array can also act as a map from\nkeys to values. For example,

            \n
            In [11]: dct = {3:40, 2:30, 1:20, 0:10}\n\nIn [9]: arr = np.array([10,20,30,40])\n\nIn [12]: arr[3]\nOut[12]: 40\n\nIn [13]: dct[3]\nOut[13]: 40\n
            \n

            The dict is more flexible -- its keys can be any hashable object. The array\nmust be indexed by integers. But the array may be more appropriate in a NumPy\nsetting since the array can itself be indexed by an integer array:

            \n
            In [8]: index = np.array([3,2,1,0])\n\nIn [10]: arr[index]\nOut[10]: array([40, 30, 20, 10])\n
            \n

            whereas the equivalent using a dict requires a loop:

            \n
            In [17]: [dct[i] for i in index]\nOut[17]: [40, 30, 20, 10]\n
            \n

            Integer indexing is much faster than dict lookups in a loop:

            \n
            In [19]: %timeit arr[index]\n1000000 loops, best of 3: 201 ns per loop\n\nIn [20]: %timeit [dct[i] for i in index]\n1000000 loops, best of 3: 1.63 µs per loop\n
            \n

            This rough equivalence between dicts and NumPy arrays is the one insight which\nmotivates the method below. The rest of the code is there simply to overcome\nobstacles such as not having integer keys (you'll see this is solved by using\nnp.unique's return_inverse=True to obtain unique labels which are integers.)

            \n
            \n

            Suppose you have this setup:

            \n
            import numpy as np\n\ncolor = np.array([\n    [  0,   0,   0],\n    [128,   0, 128],\n    [  0, 128, 128],\n    [  0,   0, 128],\n    [  0, 128,   0],\n    [128, 128,   0],\n    [128, 128, 128],\n    [128,   0,   0],], dtype='uint8').reshape(-1,2,3)\n\ncolor2ind = {(128, 128, 128): 6, \n             (0, 128, 128): 2, \n             (128, 0, 128): 1, \n             (128, 0, 0): 7, \n             (128, 128, 0): 5, \n             (0, 0, 128): 3, \n             (0, 128, 0): 4, \n             (0, 0, 0): 0}\n
            \n

            Then:

            \n
            def rgb2int(arr):\n    """\n    Convert (N,...M,3)-array of dtype uint8 to a (N,...,M)-array of dtype int32\n    """\n    return arr[...,0]*(256**2)+arr[...,1]*256+arr[...,2]\n\ndef rgb2vals(color, color2ind):\n    int_colors = rgb2int(color)\n    int_keys = rgb2int(np.array(color2ind.keys(), dtype='uint8'))\n    int_array = np.r_[int_colors.ravel(), int_keys]\n    uniq, index = np.unique(int_array, return_inverse=True)\n    color_labels = index[:int_colors.size]\n    key_labels = index[-len(color2ind):]\n\n    colormap = np.empty_like(int_keys, dtype='uint32')\n    colormap[key_labels] = color2ind.values()\n    out = colormap[color_labels].reshape(color.shape[:2])\n    return out\n\nprint(rgb2vals(color, color2ind))\n
            \n

            yields

            \n
            [[0 1]\n [2 3]\n [4 5]\n [6 7]]\n
            \n

            (The numbers are in order; color was picked so the answer is easy to check.)

            \n
            \n

            Here is a benchmark showing rgb2vals, which uses NumPy indexing, is much faster\nthan using a double for-loop:

            \n
            def using_loops(color, color2ind):\n    M, N = color.shape[:2]\n    out = np.zeros((M, N))\n    for i in range(M):\n        for j in range(N):\n            out[i][j] = color2ind[tuple(color[i,j,:])]\n    return out\n
            \n
            \n
            In [295]: color = np.tile(color, (100,100,1))\n\nIn [296]: (rgb2vals(color, color2ind) == using_loops(color, color2ind)).all()\nOut[296]: True\n\nIn [297]: %timeit rgb2vals(color, color2ind)\n100 loops, best of 3: 6.74 ms per loop\n\nIn [298]: %timeit using_loops(color, color2ind)\n1 loops, best of 3: 751 ms per loop\n
            \n
            \n

            The first step is to reduce color to a 2-dimensional array by converting every (r,g,b) triplet to a single int:

            \n
            In [270]: int_colors = rgb2int(color)\nIn [270]: int_colors\nOut[270]: \narray([[      0, 8388736],\n       [  32896,     128],\n       [  32768, 8421376],\n       [8421504, 8388608]], dtype=uint32)\n
            \n

            Now we do the same for the (r,g,b) triplet keys in the color2ind dict:

            \n
            In [271]: int_keys = rgb2int(np.array(color2ind.keys(), dtype='uint8'))\nIn [271]: int_keys\nOut[271]: \narray([8388608, 8421504, 8388736, 8421376,     128,       0,   32768,\n         32896], dtype=uint32)\n
            \n

            Concatenate these two arrays and then use np.unique to find the inverse index:

            \n
            In [283]: int_array = np.r_[int_colors.ravel(), int_keys]\n\nIn [284]: uniq, index = np.unique(int_array, return_inverse=True)\n\nIn [285]: index\nOut[285]: array([0, 5, 3, 1, 2, 6, 7, 4, 4, 7, 5, 6, 1, 0, 2, 3])\n\nIn [286]: uniq\nOut[286]: \narray([      0,     128,   32768,   32896, 8388608, 8388736, 8421376,\n       8421504], dtype=uint32)\n
            \n

            uniq holds the unique values in int_colors and int_keys.\nindex holds the index values such that uniq[index] = int_array:

            \n
            In [265]: (uniq[index] == int_array).all()\nOut[265]: True\n
            \n

            Once we have index we are golden. The values in index are like labels, each label is associated to a particular color. The first color.size items in index are labels for the colors in color, the last len(color2ind) items in index are the labels for the keys in color2ind.

            \n
            color_labels = index[:int_colors.size]\nkey_labels = index[-len(color2ind):]\n
            \n

            Now all we need is to make an array, colormap with the values in color2ind.values(), such that the key labels map to the values:

            \n
            colormap[key_labels] = color2ind.values()\n
            \n

            By placing the values in color2ind at the index positions equal to the\nassociated key labels, we create a colormap array which can in effect act\nlike a dict. colormap[color_labels] maps the color labels to color2ind values, which is exactly what we want:

            \n
            out = colormap[color_labels].reshape(color.shape[:2])\n\nIn [267]: out\nOut[267]: \narray([[7, 6],\n       [1, 5],\n       [3, 0],\n       [4, 2]], dtype=uint32)\n
            \n soup wrap:

            A dict is a map from keys to values. A NumPy array can also act as a map from keys to values. For example,

            In [11]: dct = {3:40, 2:30, 1:20, 0:10}
            
            In [9]: arr = np.array([10,20,30,40])
            
            In [12]: arr[3]
            Out[12]: 40
            
            In [13]: dct[3]
            Out[13]: 40
            

            The dict is more flexible -- its keys can be any hashable object. The array must be indexed by integers. But the array may be more appropriate in a NumPy setting since the array can itself be indexed by an integer array:

            In [8]: index = np.array([3,2,1,0])
            
            In [10]: arr[index]
            Out[10]: array([40, 30, 20, 10])
            

            whereas the equivalent using a dict requires a loop:

            In [17]: [dct[i] for i in index]
            Out[17]: [40, 30, 20, 10]
            

            Integer indexing is much faster than dict lookups in a loop:

            In [19]: %timeit arr[index]
            1000000 loops, best of 3: 201 ns per loop
            
            In [20]: %timeit [dct[i] for i in index]
            1000000 loops, best of 3: 1.63 µs per loop
            

            This rough equivalence between dicts and NumPy arrays is the one insight which motivates the method below. The rest of the code is there simply to overcome obstacles such as not having integer keys (you'll see this is solved by using np.unique's return_inverse=True to obtain unique labels which are integers.)

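The return_inverse=True behaviour that the method relies on, shown in isolation (toy values):

```python
import numpy as np

# return_inverse maps each original value to an integer label (its
# position in the sorted unique array), so uniq[inv] rebuilds the input.
vals = np.array([30, 10, 30, 20])
uniq, inv = np.unique(vals, return_inverse=True)
print(uniq)       # [10 20 30]
print(inv)        # [2 0 2 1]
print(uniq[inv])  # [30 10 30 20]
```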

            Suppose you have this setup:

            import numpy as np
            
            color = np.array([
                [  0,   0,   0],
                [128,   0, 128],
                [  0, 128, 128],
                [  0,   0, 128],
                [  0, 128,   0],
                [128, 128,   0],
                [128, 128, 128],
                [128,   0,   0],], dtype='uint8').reshape(-1,2,3)
            
            color2ind = {(128, 128, 128): 6, 
                         (0, 128, 128): 2, 
                         (128, 0, 128): 1, 
                         (128, 0, 0): 7, 
                         (128, 128, 0): 5, 
                         (0, 0, 128): 3, 
                         (0, 128, 0): 4, 
                         (0, 0, 0): 0}
            

            Then:

            def rgb2int(arr):
                """
                Convert (N,...M,3)-array of dtype uint8 to a (N,...,M)-array of dtype int32
                """
                return arr[...,0]*(256**2)+arr[...,1]*256+arr[...,2]
            
            def rgb2vals(color, color2ind):
                int_colors = rgb2int(color)
                int_keys = rgb2int(np.array(color2ind.keys(), dtype='uint8'))
                int_array = np.r_[int_colors.ravel(), int_keys]
                uniq, index = np.unique(int_array, return_inverse=True)
                color_labels = index[:int_colors.size]
                key_labels = index[-len(color2ind):]
            
                colormap = np.empty_like(int_keys, dtype='uint32')
                colormap[key_labels] = color2ind.values()
                out = colormap[color_labels].reshape(color.shape[:2])
                return out
            
            print(rgb2vals(color, color2ind))
            

            yields

            [[0 1]
             [2 3]
             [4 5]
             [6 7]]
            

            (The numbers are in order; color was picked so the answer is easy to check.)


            Here is a benchmark showing rgb2vals, which uses NumPy indexing, is much faster than using a double for-loop:

            def using_loops(color, color2ind):
                M, N = color.shape[:2]
                out = np.zeros((M, N))
                for i in range(M):
                    for j in range(N):
                        out[i][j] = color2ind[tuple(color[i,j,:])]
                return out
            

            In [295]: color = np.tile(color, (100,100,1))
            
            In [296]: (rgb2vals(color, color2ind) == using_loops(color, color2ind)).all()
            Out[296]: True
            
            In [297]: %timeit rgb2vals(color, color2ind)
            100 loops, best of 3: 6.74 ms per loop
            
            In [298]: %timeit using_loops(color, color2ind)
            1 loops, best of 3: 751 ms per loop
            

            The first step is to reduce color to a 2-dimensional array by converting every (r,g,b) triplet to a single int:

            In [270]: int_colors = rgb2int(color)
            In [270]: int_colors
            Out[270]: 
            array([[      0, 8388736],
                   [  32896,     128],
                   [  32768, 8421376],
                   [8421504, 8388608]], dtype=uint32)
            

            Now we do the same for the (r,g,b) triplet keys in the color2ind dict:

            In [271]: int_keys = rgb2int(np.array(color2ind.keys(), dtype='uint8'))
            In [271]: int_keys
            Out[271]: 
            array([8388608, 8421504, 8388736, 8421376,     128,       0,   32768,
                     32896], dtype=uint32)
            

            Concatenate these two arrays and then use np.unique to find the inverse index:

            In [283]: int_array = np.r_[int_colors.ravel(), int_keys]
            
            In [284]: uniq, index = np.unique(int_array, return_inverse=True)
            
            In [285]: index
            Out[285]: array([0, 5, 3, 1, 2, 6, 7, 4, 4, 7, 5, 6, 1, 0, 2, 3])
            
            In [286]: uniq
            Out[286]: 
            array([      0,     128,   32768,   32896, 8388608, 8388736, 8421376,
                   8421504], dtype=uint32)
            

            uniq holds the unique values in int_colors and int_keys. index holds the index values such that uniq[index] = int_array:

            In [265]: (uniq[index] == int_array).all()
            Out[265]: True
            

            Once we have index we are golden. The values in index are like labels, each label is associated to a particular color. The first color.size items in index are labels for the colors in color, the last len(color2ind) items in index are the labels for the keys in color2ind.

            color_labels = index[:int_colors.size]
            key_labels = index[-len(color2ind):]
            

            Now all we need is to make an array, colormap, with the values in color2ind.values(), such that the key labels map to the values:

            colormap[key_labels] = color2ind.values()
            

            By placing the values in color2ind at the index positions equal to the associated key labels, we create a colormap array which can in effect act like a dict. colormap[color_labels] maps the color labels to color2ind values, which is exactly what we want:

            out = colormap[color_labels].reshape(color.shape[:2])
            
            In [267]: out
            Out[267]: 
            array([[7, 6],
                   [1, 5],
                   [3, 0],
                   [4, 2]], dtype=uint32)
            
            qid & accept id: (27007260, 27007737) query: python 3: Adding .csv column sums in to dictionaries with header keys soup:

            Definitely not my prettiest piece of code. But here it is. Basically, just store all the information in a list of lists, then iterate over it from there.

            \n
            def sumColumns1(columnfile):\n    import csv\n    from functools import reduce  # reduce moved to functools in Python 3\n    with open(columnfile) as csvfile:\n        r = csv.reader(csvfile)\n        names = next(r)\n        Int = lambda x: 0 if x=='' else int(x)\n        sums  = reduce(lambda x,y: [ Int(a)+Int(b) for a,b in zip(x,y) ], r)\n        return dict(zip(names,sums))\n
            \n

            In an expanded form (or one that doesn't have reduce - before someone complains):

            \n
            def sumColumns1(columnfile):\n    import csv\n    with open(columnfile) as csvfile:\n        r = csv.reader(csvfile)\n        names = next(r)\n        sums = [ 0 for _ in names ]\n        for line in r:\n            for i in range(len(sums)):\n                sums[i] += int(0 if line[i]=='' else line[i])\n        return dict(zip(names,sums))\n
            \n

            Gives me the correct output. Hopefully someone comes up with something more pythonic.

            \n soup wrap:

            Definitely not my prettiest piece of code. But here it is. Basically, just store all the information in a list of lists, then iterate over it from there.

            def sumColumns1(columnfile):
                import csv
                from functools import reduce  # reduce moved to functools in Python 3
                with open(columnfile) as csvfile:
                    r = csv.reader(csvfile)
                    names = next(r)
                    Int = lambda x: 0 if x=='' else int(x)
                    sums  = reduce(lambda x,y: [ Int(a)+Int(b) for a,b in zip(x,y) ], r)
                    return dict(zip(names,sums))
            

            In an expanded form (or one that doesn't have reduce - before someone complains):

            def sumColumns1(columnfile):
                import csv
                with open(columnfile) as csvfile:
                    r = csv.reader(csvfile)
                    names = next(r)
                    sums = [ 0 for _ in names ]
                    for line in r:
                        for i in range(len(sums)):
                            sums[i] += int(0 if line[i]=='' else line[i])
                    return dict(zip(names,sums))
            

            Gives me the correct output. Hopefully someone comes up with something more pythonic.

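The expanded version can be exercised without a file by passing any iterable of lines, e.g. an io.StringIO (the sum_columns name and the demo data here are hypothetical):

```python
import csv
import io

def sum_columns(csvfile):
    # Same logic as the expanded form above: blank cells count as 0.
    r = csv.reader(csvfile)
    names = next(r)
    sums = [0] * len(names)
    for line in r:
        for i, cell in enumerate(line):
            sums[i] += int(cell) if cell else 0
    return dict(zip(names, sums))

demo = io.StringIO("a,b,c\n1,2,\n3,,4\n")
print(sum_columns(demo))  # -> {'a': 4, 'b': 2, 'c': 4}
```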
            qid & accept id: (27024392, 27024512) query: Getting strings in between two keywords from a file in python soup:

            Try escaping your outermost parentheses pair.

            \n
            navigated_pages = re.findall(r'EVENT\(X(.*?)\) ',data,re.DOTALL|re.MULTILINE)\n
            \n

            This appears to make it match properly, at least for my little sample input:

            \n
            >>> s = "EVENT(X_HELLO) ... EVENT(X_HOW_ARE_YOU_DOING_TODAY)... EVENT(this one shouldn't appear because it doesn't start with X)"\n>>> re.findall(r"EVENT\(X(.*?)\)", s)\n['_HELLO', '_HOW_ARE_YOU_DOING_TODAY']\n
            \n

            If you want the starting X too, you should nudge the inner parentheses to the left by one. Don't worry, I'm pretty sure the *? will still have the proper precedence.

            \n
            >>> re.findall(r"EVENT\((X.*?)\)", s)\n['X_HELLO', 'X_HOW_ARE_YOU_DOING_TODAY']\n
            \n soup wrap:

            Try escaping your outermost parentheses pair.

            navigated_pages = re.findall(r'EVENT\(X(.*?)\) ',data,re.DOTALL|re.MULTILINE)
            

            This appears to make it match properly, at least for my little sample input:

            >>> s = "EVENT(X_HELLO) ... EVENT(X_HOW_ARE_YOU_DOING_TODAY)... EVENT(this one shouldn't appear because it doesn't start with X)"
            >>> re.findall(r"EVENT\(X(.*?)\)", s)
            ['_HELLO', '_HOW_ARE_YOU_DOING_TODAY']
            

            If you want the starting X too, you should nudge the inner parentheses to the left by one. Don't worry, I'm pretty sure the *? will still have the proper precedence.

            >>> re.findall(r"EVENT\((X.*?)\)", s)
            ['X_HELLO', 'X_HOW_ARE_YOU_DOING_TODAY']
            
            qid & accept id: (27027500, 27028189) query: How do I plot a histogram using Python so that x-values are frequencies of a spectra? soup:

            Naive answer for making a histogram of X, the DFT of a time domain signal x

            \n
            import matplotlib.pyplot as plt\nimport numpy as np\n\n...\nw = np.linspace(0,N*dw-dw,N)   \nplt.bar(w, abs(X), align='center', width=dw)\nplt.show()\n
            \n

            For a nice looking plot, you have to take into account that X is associated with frequencies 0*dw, 1*dw, ..., (N-1)*dw and that, in a nice looking plot, you usually want to use a range -N*dw/2, +N*dw/2 for your abscissas.

            \n

            Complete answer

            \n
            import matplotlib.pyplot as plt\nimport numpy as np\nnp.random.seed(57)\n\nN = 64 ; dw = 0.2\nw = np.linspace(0,N*dw-dw,N)\nX = 200 + (np.arange(N)-N/2)**2*np.random.random(N)\n\nplt.bar(w, abs(X), align='center', width=dw)\nplt.xticks([i*8*dw for i in range(N/8)]+[N*dw-dw/2])\nplt.xlim(-dw/2,N*dw-dw/2)\nplt.show()\n
            \n

            And this is the result so far

            \n

            (resulting bar plot image omitted)

            \n

            as you can see, this type of plot kind of stresses the periodicity of the DFT, but it is customary to plot the DFT centered around the zero frequency, and this can be done like this

            \n
            w2=np.concatenate((w-N*dw,w))\nX2=np.concatenate((X,X))\n\nplt.bar(w2, abs(X2), align='center', width=dw)\nplt.xticks([i*8*dw for i in range(-N/16,1+N/16)])\nplt.xlim(-dw*N/2,dw*N/2)\nplt.show()\n
            \n

            and this is the result (image omitted)

            \n

            Post Scriptum

            \n

            The procedures I described fit the OP's needs, but I should say that the X data was synthesized on the spot without much thought and bears no resemblance to a real-life DFT. In fact, if I saw plots like the ones above, I would comment on the insufficient sampling rate in the time domain.

            \n soup wrap:

            Naive answer for making a histogram of X, the DFT of a time domain signal x

            import matplotlib.pyplot as plt
            import numpy as np
            
            ...
            w = np.linspace(0,N*dw-dw,N)   
            plt.bar(w, abs(X), align='center', width=dw)
            plt.show()
            

            For a nice-looking plot, you have to take into account that X is associated with the frequencies 0*dw, 1*dw, ..., (N-1)*dw, and that you usually want your abscissas to span the range -N*dw/2 to +N*dw/2.

            Complete answer

            import matplotlib.pyplot as plt
            import numpy as np
            np.random.seed(57)
            
            N = 64 ; dw = 0.2
            w = np.linspace(0,N*dw-dw,N)
            X = 200 + (np.arange(N)-N/2)**2*np.random.random(N)
            
            plt.bar(w, abs(X), align='center', width=dw)
            plt.xticks([i*8*dw for i in range(N/8)]+[N*dw-dw/2])
            plt.xlim(-dw/2,N*dw-dw/2)
            plt.show()
            

            And this is the result so far

            (resulting bar plot image omitted)

            as you can see, this type of plot kind of stresses the periodicity of the DFT, but it is customary to plot the DFT centered around the zero frequency, and this can be done like this

            w2=np.concatenate((w-N*dw,w))
            X2=np.concatenate((X,X))
            
            plt.bar(w2, abs(X2), align='center', width=dw)
            plt.xticks([i*8*dw for i in range(-N/16,1+N/16)])
            plt.xlim(-dw*N/2,dw*N/2)
            plt.show()
            

            and this is the result (image omitted)

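For reference, the zero-centred ordering built by hand above is what np.fft.fftshift automates; a small sketch with hypothetical N and dw:

```python
import numpy as np

N, dw = 8, 0.2
# fftfreq yields 0, dw, ..., then the negative frequencies;
# fftshift reorders them so zero frequency sits in the middle.
w = np.fft.fftfreq(N, d=1.0 / (N * dw))
print(np.fft.fftshift(w))  # zero frequency now in the middle
```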
            Post Scriptum

            The procedures I described fit the OP's needs, but I should say that the X data was synthesized on the spot without much thought and bears no resemblance to a real-life DFT. In fact, if I saw plots like the ones above, I would comment on the insufficient sampling rate in the time domain.

            qid & accept id: (27031053, 27031101) query: How do you create a list of values in Python within a certain range? soup:

            you can use zfill

            \n

            demo:

            \n
            >>> [ str(x).zfill(5) for x in range(10) ]\n['00000', '00001', '00002', '00003', '00004', '00005', '00006', '00007', '00008', '00009']\n
            \n

            demo with IT:

            \n
            >>>[ 'IT'+str(x).zfill(5) for x in range(10) ]\n['IT00000', 'IT00001', 'IT00002', 'IT00003', 'IT00004', 'IT00005', 'IT00006', 'IT00007', 'IT00008', 'IT00009']\n
            \n

            you can also use format:

            \n
            >>> [ '{}{:05d}'.format('IT',x) for x in range(10) ]\n['IT00000', 'IT00001', 'IT00002', 'IT00003', 'IT00004', 'IT00005', 'IT00006', 'IT00007', 'IT00008', 'IT00009']\n
            \n soup wrap:

            you can use zfill

            demo:

            >>> [ str(x).zfill(5) for x in range(10) ]
            ['00000', '00001', '00002', '00003', '00004', '00005', '00006', '00007', '00008', '00009']
            

            demo with IT:

            >>>[ 'IT'+str(x).zfill(5) for x in range(10) ]
            ['IT00000', 'IT00001', 'IT00002', 'IT00003', 'IT00004', 'IT00005', 'IT00006', 'IT00007', 'IT00008', 'IT00009']
            

            you can also use format:

            >>> [ '{}{:05d}'.format('IT',x) for x in range(10) ]
            ['IT00000', 'IT00001', 'IT00002', 'IT00003', 'IT00004', 'IT00005', 'IT00006', 'IT00007', 'IT00008', 'IT00009']
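            On Python 3.6+ an f-string gives the same zero-padded result (a small addition, not part of the original answer):

```python
# {x:05d} pads the integer with zeros to width 5, same as zfill(5)/'{:05d}'.format
labels = [f"IT{x:05d}" for x in range(10)]
```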
            
            qid & accept id: (27032670, 27032748) query: Best way to convert value in nested list to string soup:
            value = [ ",".join(map(str,i)) for i in value ]\n
            \n

            map will convert each float to str, and then join will join them with commas

            \n

            if you didn't understand how map works, here is the equivalent generator expression:

            \n
            value = [ ",".join(str(x) for x in i) for i in value ]\n
            \n soup wrap:
            value = [ ",".join(map(str,i)) for i in value ]
            

            map will convert each float to str, and then join will join them with commas

            if you didn't understand how map works, here is the equivalent generator expression:

            value = [ ",".join(str(x) for x in i) for i in value ]
            
            qid & accept id: (27054358, 27054435) query: Remove duplicated string(s) in strings in a list soup:
            >>> a = "15-105;ZH0311;TZZGJJ; ZH0311; ZH0311;DOC"\n>>> a = map(str.strip,a.split(';'))\n>>> a\n['15-105', 'ZH0311', 'TZZGJJ', 'ZH0311', 'ZH0311', 'DOC']\n>>> a = sorted(set(a),key=lambda x:a.index(x))\n>>> a\n['15-105', 'ZH0311', 'TZZGJJ', 'DOC']\n>>> ";".join(a)\n'15-105;ZH0311;TZZGJJ;DOC'\n
            \n

            I used split to break the string apart, then strip to remove extra spaces. I used set to remove the duplicates, but a set doesn't preserve order, so I sort the items back into their original order.

            \n
            >>> def remove_duplication(my_list):\n...     my_newlist = []\n...     for x in my_list:\n...         x = map(str.strip,x.split(';'))\n...         my_newlist.append(";".join(sorted(set(x),key=lambda y:x.index(y))))\n...     return my_newlist\n... \n>>> remove_duplication(a_list)\n['15~105;~ PO185-400CT;NGG;DOC', '15~105;-1;NGG;DOC', '15~105;NGG;-10;DOC', '15~55;J205~J208;POI;DOC', '15-105;ZH0305~;WER /;TZZGJJ;DOC', '15-105;ZH0311;TZZGJJ;DOC', '15-115;PL026~ PL028;Dry;PTT']\n
            \n

            if your string is delimited by space:

            \n
            >>> a="# -- coding: utf-8 --" \n>>> a= map(str.strip,a.split())\n>>> a\n['#', '--', 'coding:', 'utf-8', '--']\n>>> a = " ".join(sorted(set(a),key=lambda x:a.index(x)))\n>>> a\n'# -- coding: utf-8'\n
            \n

            split breaks the string on some delimiter; it can be a space, punctuation, a character, or anything else.

            \n

            Go through all this documentation and you will understand: Built-in types, \nBuilt-in functions

            \n soup wrap:
            >>> a = "15-105;ZH0311;TZZGJJ; ZH0311; ZH0311;DOC"
            >>> a = map(str.strip,a.split(';'))
            >>> a
            ['15-105', 'ZH0311', 'TZZGJJ', 'ZH0311', 'ZH0311', 'DOC']
            >>> a = sorted(set(a),key=lambda x:a.index(x))
            >>> a
            ['15-105', 'ZH0311', 'TZZGJJ', 'DOC']
            >>> ";".join(a)
            '15-105;ZH0311;TZZGJJ;DOC'
            

            I used split to break the string apart, then strip to remove extra spaces. I used set to remove the duplicates, but a set doesn't preserve order, so I sort the items back into their original order.

            >>> def remove_duplication(my_list):
            ...     my_newlist = []
            ...     for x in my_list:
            ...         x = map(str.strip,x.split(';'))
            ...         my_newlist.append(";".join(sorted(set(x),key=lambda y:x.index(y))))
            ...     return my_newlist
            ... 
            >>> remove_duplication(a_list)
            ['15~105;~ PO185-400CT;NGG;DOC', '15~105;-1;NGG;DOC', '15~105;NGG;-10;DOC', '15~55;J205~J208;POI;DOC', '15-105;ZH0305~;WER /;TZZGJJ;DOC', '15-105;ZH0311;TZZGJJ;DOC', '15-115;PL026~ PL028;Dry;PTT']
            

            if your string is delimited by space:

            >>> a="# -- coding: utf-8 --" 
            >>> a= map(str.strip,a.split())
            >>> a
            ['#', '--', 'coding:', 'utf-8', '--']
            >>> a = " ".join(sorted(set(a),key=lambda x:a.index(x)))
            >>> a
            '# -- coding: utf-8'
            

            split breaks the string on some delimiter; it can be a space, punctuation, a character, or anything else.

            Go through all this documentation and you will understand: Built-in types, Built-in functions
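            On Python 3.7+ a plain dict already preserves insertion order, so the sorted/index trick can be replaced by dict.fromkeys; a sketch (not from the original answer):

```python
def remove_duplicates(s, sep=";"):
    # strip whitespace around each piece, then let dict.fromkeys drop
    # duplicates while keeping first-seen order (guaranteed on Python 3.7+)
    parts = [p.strip() for p in s.split(sep)]
    return sep.join(dict.fromkeys(parts))

result = remove_duplicates("15-105;ZH0311;TZZGJJ; ZH0311; ZH0311;DOC")
```

            This is also faster than calling list.index inside the sort key, which rescans the list for every element.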

            qid & accept id: (27064243, 27064513) query: Python: Read Content of Hidden HTML Table soup:

            The local institutes are in rows with just one table cell, but you are skipping those.

            \n

            Perhaps you need to extract the data from all cells and only skip rows without cells here:

            \n
            for row in soup('table')[5].findAll('tr'):\n    tds = row('td')\n    if not tds:\n        continue\n    print u' '.join([cell.string for cell in tds if cell.string])\n
            \n

            This produces

            \n
            United States, California\nVa Long Beach Healthcare System\nLong Beach, California, United States, 90822  \nUnited States, Georgia\nGastrointestinal Specialists Of Georgia Pc\nMarietta, Georgia, United States, 30060  \n# .... \nLocal Institution\nTaipei, Taiwan, 100  \nLocal Institution\nTaoyuan, Taiwan, 333  \nUnited Kingdom\nLocal Institution\nLondon, Greater London, United Kingdom, SE5 9RS  \n
            \n soup wrap:

            The local institutes are in rows with just one table cell, but you are skipping those.

            Perhaps you need to extract the data from all cells and only skip rows without cells here:

            for row in soup('table')[5].findAll('tr'):
                tds = row('td')
                if not tds:
                    continue
                print u' '.join([cell.string for cell in tds if cell.string])
            

            This produces

            United States, California
            Va Long Beach Healthcare System
            Long Beach, California, United States, 90822  
            United States, Georgia
            Gastrointestinal Specialists Of Georgia Pc
            Marietta, Georgia, United States, 30060  
            # .... 
            Local Institution
            Taipei, Taiwan, 100  
            Local Institution
            Taoyuan, Taiwan, 333  
            United Kingdom
            Local Institution
            London, Greater London, United Kingdom, SE5 9RS  
            
            qid & accept id: (27103165, 27103189) query: Python: How to remove whitespace from number in a string soup:

            You could use the re.sub function, like below:

            \n
            re.sub(r'(?<=\d)\s(?=\d)', r'', string)\n
            \n

            DEMO

            \n

            OR

            \n

            To remove one or more whitespace characters between the digits:

            \n
            re.sub(r'(?<=\d)\s+(?=\d)', r'', string)\n
            \n

            Example:

            \n
            >>> import re\n>>> s = "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris 850 152 nisi ut aliquip ex ea commodo consequat. Duis aute irure 360 458 000 dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."\n>>> re.sub(r'(?<=\d)\s(?=\d)', r'', s)\n'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris 850152 nisi ut aliquip ex ea commodo consequat. Duis aute irure 360458000 dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.'\n
            \n

            Regular Expression:

            \n
            (?<=                     look behind to see if there is:\n  \d                       digits (0-9)\n)                        end of look-behind\n\s+                      whitespace (\n, \r, \t, \f, and " ") (1 or\n                         more times)\n(?=                      look ahead to see if there is:\n  \d                       digits (0-9)\n)                        end of look-ahead\n
            \n soup wrap:

            You could use the re.sub function, like below:

            re.sub(r'(?<=\d)\s(?=\d)', r'', string)
            

            DEMO

            OR

            To remove one or more whitespace characters between the digits:

            re.sub(r'(?<=\d)\s+(?=\d)', r'', string)
            

            Example:

            >>> import re
            >>> s = "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris 850 152 nisi ut aliquip ex ea commodo consequat. Duis aute irure 360 458 000 dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
            >>> re.sub(r'(?<=\d)\s(?=\d)', r'', s)
            'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris 850152 nisi ut aliquip ex ea commodo consequat. Duis aute irure 360458000 dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.'
            

            Regular Expression:

            (?<=                     look behind to see if there is:
              \d                       digits (0-9)
            )                        end of look-behind
            \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                                     more times)
            (?=                      look ahead to see if there is:
              \d                       digits (0-9)
            )                        end of look-ahead
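            A shorter worked example of the one-or-more variant (the input string is invented for illustration):

```python
import re

# spaces are removed only when a digit sits on both sides of them
s = "call 360 458 000 ext 12"
collapsed = re.sub(r'(?<=\d)\s+(?=\d)', '', s)
```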
            
            qid & accept id: (27112449, 27112594) query: how to represent a number value as a string in python? soup:

            You can use the chr() function:

            \n
            >>> chr(60)\n'<'\n>>> chr(97)\n'a'\n>>> chr(67)\n'C'\n
            \n

            To convert back, use the ord() function:

            \n
            >>> ord('C')\n67\n
            \n soup wrap:

            You can use the chr() function:

            >>> chr(60)
            '<'
            >>> chr(97)
            'a'
            >>> chr(67)
            'C'
            

            To convert back, use the ord() function:

            >>> ord('C')
            67
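            A round trip between code points and characters, for completeness:

```python
codes = [72, 105]
text = "".join(chr(c) for c in codes)   # code points -> characters
back = [ord(ch) for ch in text]         # characters -> code points again
```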
            
            qid & accept id: (27122584, 27122673) query: Sending email with html in Django 1.7 soup:

            render_to_string: loads a template, renders it, and returns the resulting string.\nhtml_message: if provided, the email becomes a multipart message with this HTML version alongside the plain-text body.

            \n

            mail/html-message.html

            \n
            Hi {{ first_name }}.\n\n    This is your {{ email }}\n\nThank you\n
            \n

            views.py

            \n
            def mail_function(request):\n    subject = 'Test Mail'\n    from_email = 'info@domain.com'\n    to = 'to@domain.com'\n    c = Context({'email': email,\n                 'first_name': first_name})\n    html_content = render_to_string('mail/html-message.html', c)\n    txtmes = render_to_string('mail/text-message.html', c)\n    send_mail(subject,\n              txtmes,\n              from_email,\n              [to],\n              fail_silently=False,\n              html_message=html_content)\n
            \n soup wrap:

            render_to_string: loads a template, renders it, and returns the resulting string. html_message: if provided, the email becomes a multipart message with this HTML version alongside the plain-text body.

            mail/html-message.html

            Hi {{ first_name }}.
            
                This is your {{ email }}
            
            Thank you
            

            views.py

            from django.core.mail import send_mail
            from django.template import Context
            from django.template.loader import render_to_string
            
            def mail_function(request):
                subject = 'Test Mail'
                from_email = 'info@domain.com'  # 'from' is a reserved word in Python, so use another name
                to = 'to@domain.com'
                c = Context({'email': email,
                             'first_name': first_name})
                html_content = render_to_string('mail/html-message.html', c)
                txtmes = render_to_string('mail/text-message.html', c)
                send_mail(subject,
                          txtmes,
                          from_email,
                          [to],
                          fail_silently=False,
                          html_message=html_content)
            
            qid & accept id: (27127176, 27128028) query: Subtracting an integer value from a text file and displaying the result in Python2.7 soup:

            you can try it like this using the datetime module

            \n

            if your file is like this:

            \n
            00:47:12: start interaction\n00:47:18: End interaction\n00:47:20: Start interaction\n00:47:23: End interaction\n00:47:25: Start interaction\n00:47:28: End interaction\n00:47:29: Start interaction\n00:47:31: End interaction\n
            \n

            code here:

            \n
            >>> import datetime\n>>> f = open('file.txt')\n>>> for x in f:\n...     start = x.split()[0][:-1]\n...     end = f.next().split()[0][:-1]\n...     print str(datetime.datetime.strptime(end,"%H:%M:%S")- datetime.datetime.strptime(start,"%H:%M:%S")).split(':')[-1]\n... \n06\n03 \n03\n02\n
            \n

            to handle empty lines:

            \n
            >>> f = open('file.txt').readlines()\n>>> my_file = [ x for x in f if x!='\n' ]\n>>> for x in range(0,len(my_file)-1,2):\n...     start = my_file[x].split()[0][:-1]\n...     end = my_file[x+1].split()[0][:-1]\n...     print str(datetime.datetime.strptime(end,"%H:%M:%S")- datetime.datetime.strptime(start,"%H:%M:%S")).split(':')[-1]\n... \n06\n03\n03\n02\n
            \n soup wrap:

            you can try it like this using the datetime module

            if your file is like this:

            00:47:12: start interaction
            00:47:18: End interaction
            00:47:20: Start interaction
            00:47:23: End interaction
            00:47:25: Start interaction
            00:47:28: End interaction
            00:47:29: Start interaction
            00:47:31: End interaction
            

            code here:

            >>> import datetime
            >>> f = open('file.txt')
            >>> for x in f:
            ...     start = x.split()[0][:-1]
            ...     end = f.next().split()[0][:-1]
            ...     print str(datetime.datetime.strptime(end,"%H:%M:%S")- datetime.datetime.strptime(start,"%H:%M:%S")).split(':')[-1]
            ... 
            06
            03 
            03
            02
            

            to handle empty lines:

            >>> f = open('file.txt').readlines()
            >>> my_file = [ x for x in f if x!='\n' ]
            >>> for x in range(0,len(my_file)-1,2):
            ...     start = my_file[x].split()[0][:-1]
            ...     end = my_file[x+1].split()[0][:-1]
            ...     print str(datetime.datetime.strptime(end,"%H:%M:%S")- datetime.datetime.strptime(start,"%H:%M:%S")).split(':')[-1]
            ... 
            06
            03
            03
            02
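            The core of the computation — parsing the two timestamps and subtracting them — can be isolated in a small Python 3 helper (the function name is illustrative, not from the original answer):

```python
from datetime import datetime

def interaction_seconds(start_line, end_line):
    # each line looks like "00:47:12: Start interaction";
    # take the first token and drop its trailing colon
    fmt = "%H:%M:%S"
    start = datetime.strptime(start_line.split()[0].rstrip(":"), fmt)
    end = datetime.strptime(end_line.split()[0].rstrip(":"), fmt)
    return (end - start).seconds

secs = interaction_seconds("00:47:12: Start interaction",
                           "00:47:18: End interaction")
```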
            
            qid & accept id: (27139744, 27139826) query: Python: Effective reading from a file using csv module soup:

            Use the top row to figure out what the column headings are. Initialize a dictionary of totals based on the headings.

            \n
            import csv\n\nwith open("file.csv") as f:\n  reader = csv.reader(f)\n\n  titles = next(reader)\n  while titles[-1] == '':\n    titles.pop()\n  num_titles = len(titles)      \n  totals = { title: 0 for title in titles }\n\n  for row in reader:\n    for i in range(num_titles):\n      totals[titles[i]] += int(row[i])\n\nprint(totals)\n
            \n

            Let me add that you don't have to close the file after the with block. The whole point of with is that it takes care of closing the file.

            \n

            Also, let me mention that the data you posted appears to have four columns:

            \n
            John,Jeff,Judy,\n21,19,32,\n178,182,169,\n85,74,57,\n
            \n

            That's why I did this:

            \n
              while titles[-1] == '':\n    titles.pop()\n
            \n soup wrap:

            Use the top row to figure out what the column headings are. Initialize a dictionary of totals based on the headings.

            import csv
            
            with open("file.csv") as f:
              reader = csv.reader(f)
            
              titles = next(reader)
              while titles[-1] == '':
                titles.pop()
              num_titles = len(titles)      
              totals = { title: 0 for title in titles }
            
              for row in reader:
                for i in range(num_titles):
                  totals[titles[i]] += int(row[i])
            
            print(totals)
            

            Let me add that you don't have to close the file after the with block. The whole point of with is that it takes care of closing the file.

            Also, let me mention that the data you posted appears to have four columns:

            John,Jeff,Judy,
            21,19,32,
            178,182,169,
            85,74,57,
            

            That's why I did this:

              while titles[-1] == '':
                titles.pop()
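            If the header row is clean (no trailing empty column), csv.DictReader does the column/heading bookkeeping for you; a sketch using an in-memory file so it is self-contained:

```python
import csv
import io

# stand-in for the file contents from the question
data = "John,Jeff,Judy\n21,19,32\n178,182,169\n"

totals = {}
for row in csv.DictReader(io.StringIO(data)):
    # DictReader maps each row to {heading: value} using the first row
    for name, value in row.items():
        totals[name] = totals.get(name, 0) + int(value)
```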
            
            qid & accept id: (27150664, 27150778) query: Pythonic way to parse preflib Orders with Ties files soup:

            You can use ast.literal_eval with some str.replace calls:

            \n
            >>> from ast import literal_eval\n>>> s = '1,2,{3,4,5},6'\n>>> [x if isinstance(x, tuple) else (x,) for x \n                         in literal_eval(s.replace('{', '(').replace('}', ')'))]\n[(1,), (2,), (3, 4, 5), (6,)]\n
            \n

            As @Martijn Pieters suggested you can replace the two str.replace calls with a single str.translate call:

            \n
            >>> from string import maketrans\n>>> table = maketrans('{}', '()')\n>>> [x if isinstance(x, tuple) else (x,) for x in literal_eval(s.translate(table))]\n[(1,), (2,), (3, 4, 5), (6,)]\n
            \n

            In Python 3 you won't need any str.replace or str.translate calls at all; the same code fails in Python 2.7, though, and here is the related bug:

            \n
            >>> [tuple(x) if isinstance(x, set) else (x,) for x in literal_eval(s)]\n[(1,), (2,), (3, 4, 5), (6,)]\n
            \n soup wrap:

            You can use ast.literal_eval with some str.replace calls:

            >>> from ast import literal_eval
            >>> s = '1,2,{3,4,5},6'
            >>> [x if isinstance(x, tuple) else (x,) for x 
                                     in literal_eval(s.replace('{', '(').replace('}', ')'))]
            [(1,), (2,), (3, 4, 5), (6,)]
            

            As @Martijn Pieters suggested you can replace the two str.replace calls with a single str.translate call:

            >>> from string import maketrans
            >>> table = maketrans('{}', '()')
            >>> [x if isinstance(x, tuple) else (x,) for x in literal_eval(s.translate(table))]
            [(1,), (2,), (3, 4, 5), (6,)]
            

            In Python 3 you won't need any str.replace or str.translate calls at all; the same code fails in Python 2.7, though, and here is the related bug:

            >>> [tuple(x) if isinstance(x, set) else (x,) for x in literal_eval(s)]
            [(1,), (2,), (3, 4, 5), (6,)]
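            For reference, the Python 3 version runs as-is, since literal_eval accepts set literals there (a set is unordered, so sorting inside the tuple keeps the result deterministic):

```python
from ast import literal_eval

s = '1,2,{3,4,5},6'
# bare integers become 1-tuples; the set literal becomes a sorted tuple
parsed = [tuple(sorted(x)) if isinstance(x, set) else (x,)
          for x in literal_eval(s)]
```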
            
            qid & accept id: (27159189, 27159258) query: Find empty or NaN entry in Pandas Dataframe soup:

            np.where(pd.isnull(df)) returns the row and column indices where the value is NaN:

            \n
            In [152]: import numpy as np\nIn [153]: import pandas as pd\nIn [154]: np.where(pd.isnull(df))\nOut[154]: (array([2, 5, 6, 6, 7, 7]), array([7, 7, 6, 7, 6, 7]))\n\nIn [155]: df.iloc[2,7]\nOut[155]: nan\n\nIn [160]: [df.iloc[i,j] for i,j in zip(*np.where(pd.isnull(df)))]\nOut[160]: [nan, nan, nan, nan, nan, nan]\n
            \n

            Finding values which are empty strings could be done with applymap:

            \n
            In [182]: np.where(df.applymap(lambda x: x == ''))\nOut[182]: (array([5]), array([7]))\n
            \n

            Note that using applymap requires calling a Python function once for each cell of the DataFrame. That could be slow for a large DataFrame, so it would be better if you could arrange for all the blank cells to contain NaN instead so you could use pd.isnull.

            \n soup wrap:

            np.where(pd.isnull(df)) returns the row and column indices where the value is NaN:

            In [152]: import numpy as np
            In [153]: import pandas as pd
            In [154]: np.where(pd.isnull(df))
            Out[154]: (array([2, 5, 6, 6, 7, 7]), array([7, 7, 6, 7, 6, 7]))
            
            In [155]: df.iloc[2,7]
            Out[155]: nan
            
            In [160]: [df.iloc[i,j] for i,j in zip(*np.where(pd.isnull(df)))]
            Out[160]: [nan, nan, nan, nan, nan, nan]
            

            Finding values which are empty strings could be done with applymap:

            In [182]: np.where(df.applymap(lambda x: x == ''))
            Out[182]: (array([5]), array([7]))
            

            Note that using applymap requires calling a Python function once for each cell of the DataFrame. That could be slow for a large DataFrame, so it would be better if you could arrange for all the blank cells to contain NaN instead so you could use pd.isnull.
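            The same row/column lookup works on a bare NumPy array with np.isnan, which is essentially what pd.isnull reduces to for float data; a minimal sketch:

```python
import numpy as np

arr = np.array([[1.0, np.nan],
                [np.nan, 4.0]])
# np.where returns parallel arrays of row and column indices of the NaNs
rows, cols = np.where(np.isnan(arr))
positions = list(zip(rows.tolist(), cols.tolist()))
```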

            qid & accept id: (27178829, 27179087) query: Convert a python list into function soup:

            Will scipy's 1d interpolation functions work?

            \n
            import numpy as np\nfrom scipy.interpolate import interp1d\n\nx = y = np.arange(5)\nf = interp1d(x,y, kind="linear", fill_value=0., bounds_error=False)\n\nprint f(0)\nprint f(2)\nprint f(3)\nprint f(3.4)\n
            \n

            Which gives:

            \n
            1.0\n2.0\n3.0\n3.4\n
            \n soup wrap:

            Will scipy's 1d interpolation functions work?

            import numpy as np
            from scipy.interpolate import interp1d
            
            x = y = np.arange(5)
            f = interp1d(x,y, kind="linear", fill_value=0., bounds_error=False)
            
            print f(0)
            print f(2)
            print f(3)
            print f(3.4)
            

            Which gives:

            1.0
            2.0
            3.0
            3.4
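            If only linear interpolation is needed, NumPy's np.interp avoids the SciPy dependency entirely; note that it clamps at the edges rather than filling with 0 like the interp1d call above:

```python
import numpy as np

x = y = np.arange(5)
# piecewise-linear interpolation of a single point
value = float(np.interp(3.4, x, y))
```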
            
            qid & accept id: (27193884, 27193910) query: how to grab alternating child tags in python beautifulsoup soup:

            Find all headers, and grab the next sibling from there:

            \n
            for header in soup.select('div h3'):\n    next_div = header.find_next_sibling('div')\n
            \n

            element.find_next_sibling() returns an element or None if no such sibling can be found.

            \n

            Demo:

            \n
            >>> from bs4 import BeautifulSoup\n>>> soup = BeautifulSoup('''\\n... <div>\n... <h3>First header</h3>\n... <div>First div to go with a header</div>\n... <h3>Second header</h3>\n... <div>Second div to go with a header</div>\n... </div>\n... ''')\n>>> for header in soup.select('div h3'):\n...     next_div = header.find_next_sibling('div')\n...     print(header.text, next_div.text)\n... \nFirst header First div to go with a header\nSecond header Second div to go with a header\n
            \n soup wrap:

            Find all headers, and grab the next sibling from there:

            for header in soup.select('div h3'):
                next_div = header.find_next_sibling('div')
            

            element.find_next_sibling() returns an element or None if no such sibling can be found.

            Demo:

            >>> from bs4 import BeautifulSoup
            >>> soup = BeautifulSoup('''\
            ... <div>
            ... <h3>First header</h3>
            ... <div>First div to go with a header</div>
            ... <h3>Second header</h3>
            ... <div>Second div to go with a header</div>
            ... </div>
            ... ''')
            >>> for header in soup.select('div h3'):
            ...     next_div = header.find_next_sibling('div')
            ...     print(header.text, next_div.text)
            ... 
            First header First div to go with a header
            Second header Second div to go with a header
            qid & accept id: (27216944, 27216961) query: Regular expressions matching across multiple line in Sublime Text soup:
            \{([^}]+)\}\n
            \n

            You can try this. See the demo:

            \n

            http://regex101.com/r/hQ9xT1/32

            \n
            import re\np = re.compile(ur'{([^}]+)}')\ntest_str = u"{'AuthorSite': None,\n 'FirstText': None,\n 'Image': None,\n 'SrcDate': None,\n 'Title': None,\n 'Url': None}"\n\nre.findall(p, test_str)\n
            \n

            Your regex \{(.|\s)\} didn't work because you had not quantified it. Use \{(?:.|\s)+\} instead.

            \n soup wrap:
            \{([^}]+)\}
            

            You can try this. See the demo:

            http://regex101.com/r/hQ9xT1/32

            import re
            p = re.compile(ur'{([^}]+)}')
            test_str = u"{'AuthorSite': None,\n 'FirstText': None,\n 'Image': None,\n 'SrcDate': None,\n 'Title': None,\n 'Url': None}"
            
            re.findall(p, test_str)
            

            Your regex \{(.|\s)\} didn't work because you had not quantified it. Use \{(?:.|\s)+\} instead.
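            The reason (.|\s) is needed at all is that '.' does not match newlines by default; the re.DOTALL flag changes that, so this also works across lines:

```python
import re

s = "{'AuthorSite': None,\n 'Title': None}"
m = re.search(r'\{(.+)\}', s, re.DOTALL)  # DOTALL lets '.' match '\n' too
inner = m.group(1)
```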

            qid & accept id: (27216950, 27217180) query: summing nested dictionary entries soup:

            Maybe I misunderstood the expected final result, but you might not need counters... A simple sum could suffice if you know that you're only going to have two levels of nesting.

            \n

            Let's assume you loaded your json dictionary of dictionaries into a variable called data.

            \n

            Then you can do:

            \n
            results = {}\nfor key in data.keys():\n    # key is '20101021', '20101004'...\n    # data[key].keys() is '4x4, '4x2'... so let's make sure\n    # that the result dictionary contains all those '4x4', '4x2'\n    # being zero if nothing better can be calculated.\n    results[key] = dict.fromkeys(data[key].keys(), 0)\n\n    for sub_key in data[key].keys():\n        # sub_key is '4x4', '4x2'...\n        # Also, don't consider a 'valid value' someting that is not a\n        # "Central Spectrum" or a "Full Frame"\n        valid_values = [\n            int(v) for k, v in data[key][sub_key].items()\n            if k in ["Central Spectrum", "Full Frame"]\n        ]\n        # Now add the 'valid_values'\n        results[key][sub_key] = sum(valid_values)\nprint results\n
            \n

            Which outputs:

            \n
            {\n  u'20101021': {u'1x1': 9, u'4x4': 10, u'4x2': 10},\n  u'20101004': {u'1x1': 10, u'4x4': 10, u'4x2': 10}\n}\n
            \n

            In many cases I only used dict.keys() because maybe that clarifies the process (well, and once dict.items()). You also have dict.values() (and all three functions have their iterator equivalents), which might shorten your code. Also, see what dict.fromkeys does.

            \n

            EDIT (as per OP's comments to this answer)

            \n

            If you want data added (or "collected") over time, then you need to move your results[key] from the date string (as shown above in the answer) to the 1x1, 4x4...

            \n
            VALID_KEYS = ["Central Spectrum", "Full Frame"]\nresults = {}\nfor key_1 in data.keys():\n    # key_1 is '20101021', '20101004'...\n\n    for key_2 in data[key_1].keys():\n        # key_2 is '4x4', '4x2'...\n        if key_2 not in results:\n            results[key_2] = dict.fromkeys(VALID_KEYS, 0)\n        for key_3 in data[key_1][key_2].keys():\n            # key_3 is 'Central Spectrum', 'Full Frame', 'Custom'...\n            if key_3 in VALID_KEYS:\n                results[key_2][key_3] += data[key_1][key_2][key_3]\nprint results\n
            \n

            Which outputs:

            \n
            {\n    u'1x1': {'Central Spectrum': 10, 'Full Frame': 9},\n    u'4x4': {'Central Spectrum': 10, 'Full Frame': 10},\n    u'4x2': {'Central Spectrum': 10, 'Full Frame': 10}\n}\n
            \n soup wrap:

            Maybe I misunderstood the expected final result, but you might not need counters... A simple sum could suffice if you know that you're only going to have two levels of nesting.

            Let's assume you loaded your json dictionary of dictionaries into a variable called data.

            Then you can do:

            results = {}
            for key in data.keys():
                # key is '20101021', '20101004'...
                # data[key].keys() is '4x4, '4x2'... so let's make sure
                # that the result dictionary contains all those '4x4', '4x2'
                # being zero if nothing better can be calculated.
                results[key] = dict.fromkeys(data[key].keys(), 0)
            
                for sub_key in data[key].keys():
                    # sub_key is '4x4', '4x2'...
                    # Also, don't consider a 'valid value' something that is not a
                    # "Central Spectrum" or a "Full Frame"
                    valid_values = [
                        int(v) for k, v in data[key][sub_key].items()
                        if k in ["Central Spectrum", "Full Frame"]
                    ]
                    # Now add the 'valid_values'
                    results[key][sub_key] = sum(valid_values)
            print results
            

            Which outputs:

            {
              u'20101021': {u'1x1': 9, u'4x4': 10, u'4x2': 10},
              u'20101004': {u'1x1': 10, u'4x4': 10, u'4x2': 10}
            }
            

            In many cases I only used dict.keys() because maybe that clarifies the process (well, and once dict.items()). You also have dict.values() (and all three functions have their iterator equivalents), which might shorten your code. Also, see what dict.fromkeys does.

            EDIT (as per OP's comments to this answer)

            If you want data added (or "collected") over time, then you need to move your results[key] from the date string (as shown above in the answer) to the 1x1, 4x4...

            VALID_KEYS = ["Central Spectrum", "Full Frame"]
            results = {}
            for key_1 in data.keys():
                # key_1 is '20101021', '20101004'...
            
                for key_2 in data[key_1].keys():
                    # key_2 is '4x4', '4x2'...
                    if key_2 not in results:
                        results[key_2] = dict.fromkeys(VALID_KEYS, 0)
                    for key_3 in data[key_1][key_2].keys():
                        # key_3 is 'Central Spectrum', 'Full Frame', 'Custom'...
                        if key_3 in VALID_KEYS:
                            results[key_2][key_3] += data[key_1][key_2][key_3]
            print results
            

            Which outputs:

            {
                u'1x1': {'Central Spectrum': 10, 'Full Frame': 9},
                u'4x4': {'Central Spectrum': 10, 'Full Frame': 10},
                u'4x2': {'Central Spectrum': 10, 'Full Frame': 10}
            }
            
            qid & accept id: (27255080, 27255239) query: Python unittesting: Test whether two angles are almost equal soup:

            soup wrap:

            You can use the squared Euclidean distance between two points on the unit circle and the law of cosines to get the absolute difference between two angles:

            from math import sin, cos, acos
            from unittest import TestCase
            
            class AngleAssertions(TestCase):
                # assertAlmostEqual is a TestCase method, not a module-level
                # function, so the helper has to live on a TestCase subclass
                def assertAlmostEqualAngles(self, x, y, **kwargs):
                    c2 = (sin(x)-sin(y))**2 + (cos(x)-cos(y))**2
                    angle_diff = acos((2.0 - c2)/2.0) # a = b = 1
                    self.assertAlmostEqual(angle_diff, 0.0, **kwargs)
            

            This works with radians. If the angle is in degrees, you must do a conversion:

            from math import sin, cos, acos, radians, degrees
            from unittest import TestCase
            
            class AngleAssertions(TestCase):
                # assertAlmostEqual is a TestCase method, hence the subclass
                def assertAlmostEqualAngles(self, x, y, **kwargs):
                    x, y = radians(x), radians(y)
                    c2 = (sin(x)-sin(y))**2 + (cos(x)-cos(y))**2
                    angle_diff = degrees(acos((2.0 - c2)/2.0))
                    self.assertAlmostEqual(angle_diff, 0.0, **kwargs)
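
            Outside of unittest, the same law-of-cosines trick can be packaged as a plain function; this sketch (mine, not from the original answer) shows that it handles wraparound at 0/360 correctly:

```python
from math import sin, cos, acos, radians, degrees

def angle_diff_degrees(x, y):
    # squared chord length between the two points on the unit circle
    x, y = radians(x), radians(y)
    c2 = (sin(x) - sin(y))**2 + (cos(x) - cos(y))**2
    # law of cosines with both radii equal to 1
    return degrees(acos((2.0 - c2) / 2.0))
```

            For example, 359 degrees and 1 degree are only 2 degrees apart, not 358.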
            
            qid & accept id: (27257991, 27258191) query: Accept a single string instead of normal parameters soup:

            soup wrap:

            Python doesn't have method overloading, so your only option is to "play" with the arguments:

            You can do something that IMHO is very bad (downgraded very bad to meeeeh... so, so after reading @ivan-pozdeev's comments in this answer)

            class Time:
                def __init__(self, hours=0, minutes=0, seconds=0, time_now=''):
                    if hours == 'now':
                        # now() is assumed to come from the asker's own code; it
                        # must return an object with hour/min/sec attributes
                        tmp_t = now()
                        self.hour = tmp_t.hour
                        self.min = tmp_t.min
                        self.sec = tmp_t.sec
                    else:
                        t = abs(3600*hours + 60*minutes + seconds)
                        self.hour = t//3600
                        self.min = t//60%60
                        self.sec = t%60
            

            That... well, that works:

            >>> a = Time('now')
            >>> print vars(a)
            {'sec': 20, 'hour': 15, 'min': 18}
            >>>
            >>> a = Time(hours=19, minutes=4, seconds=5)
            >>> print vars(a)
            {'sec': 5, 'hour': 19, 'min': 4}
            

            But that leaves the code in a very weird state. It is very difficult to read. I would certainly try to come up with a different approach altogether...

            I also changed the time variable within the __init__ to t because that conflicted with the time name from import time
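
            One cleaner approach, sketched here as a suggestion rather than taken from the original answer, is an alternate constructor: a classmethod that reads the clock and delegates to the normal __init__ (using datetime.datetime.now() in place of the asker's now()):

```python
import datetime

class Time(object):
    def __init__(self, hours=0, minutes=0, seconds=0):
        t = abs(3600 * hours + 60 * minutes + seconds)
        self.hour = t // 3600
        self.min = t // 60 % 60
        self.sec = t % 60

    @classmethod
    def now(cls):
        # alternate constructor: build the instance from the wall clock
        current = datetime.datetime.now().time()
        return cls(current.hour, current.minute, current.second)
```

            Callers then write Time.now() instead of the stringly-typed Time('now'), and __init__ keeps a single responsibility.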

            qid & accept id: (27265939, 27266178) query: Comparing Python dictionaries and nested dictionaries soup:

            soup wrap:

            Comparing two dictionaries using recursion:

            d1= {'a':{'b':{'cs':10},'d':{'cs':20}}}
            d2= {'a':{'b':{'cs':30} ,'d':{'cs':20}},'newa':{'q':{'cs':50}}}
            
            def findDiff(d1, d2, path=""):
                for k in d1.keys():
                    if not d2.has_key(k):
                        print path, ":"
                        print k + " as key not in d2", "\n"
                    else:
                        if type(d1[k]) is dict:
                            # build the extended path in a new variable, so that
                            # sibling keys don't inherit each other's path
                            if path == "":
                                new_path = k
                            else:
                                new_path = path + "->" + k
                            findDiff(d1[k], d2[k], new_path)
                        else:
                            if d1[k] != d2[k]:
                                print path, ":"
                                print " - ", k," : ", d1[k]
                                print " + ", k," : ", d2[k]
            
            print "comparing d1 to d2:"
            print findDiff(d1,d2)
            print "comparing d2 to d1:"
            print findDiff(d2,d1)
            

            Output:

            comparing d1 to d2:
            a->b :
             -  cs  :  10
             +  cs  :  30
            None
            comparing d2 to d1:
            a->b :
             -  cs  :  30
             +  cs  :  10
            a :
            newa as key not in d2 
            
            None
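
            For reference, a Python 3 sketch of the same recursion that returns the differences instead of printing them (the message wording here is mine):

```python
def find_diff(d1, d2, path=""):
    # walk d1 recursively, collecting keys missing from d2
    # and leaf values that differ
    diffs = []
    for k in d1:
        new_path = k if not path else path + "->" + k
        if k not in d2:
            diffs.append(new_path + " only in first dict")
        elif isinstance(d1[k], dict):
            diffs.extend(find_diff(d1[k], d2[k], new_path))
        elif d1[k] != d2[k]:
            diffs.append("%s: %r != %r" % (new_path, d1[k], d2[k]))
    return diffs
```

            Returning a list makes the helper usable in tests and avoids the Python 2 print statements.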
            
            qid & accept id: (27270532, 27270676) query: better way to find pattern in string? soup:

            soup wrap:

            If you just want to match the first four dot-separated numbers in your string, then it's trivial:

            >>> re.search(r"\d+\.\d+\.\d+\.\d+", a).group()
            '100.80.54.162'
            

            If you want to do some additional checking (only allowing numbers between 0 and 255), you can:

            >>> re.search(r"""(?x)\b(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.
            ...                     (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.
            ...                     (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.
            ...                     (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\b""", b).group()
            '100.80.54.93'
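
            If strict validation matters more than a quick match, the standard-library ipaddress module (Python 3.3+) can take over the 0-255 checking from the long regex. This combination is my suggestion, not part of the original answer:

```python
import re
import ipaddress

def find_first_ip(text):
    # grab dotted-quad candidates, then let ipaddress do the 0-255 range check
    for candidate in re.findall(r"\d{1,3}(?:\.\d{1,3}){3}", text):
        try:
            return str(ipaddress.IPv4Address(candidate))
        except ipaddress.AddressValueError:
            continue
    return None
```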
            
            qid & accept id: (27277381, 27277615) query: Plot arrays same extension Matlotlib soup:

            soup wrap:

            This is where the glob standard module shines! It will generate lists of files matching simple format rules.

            In your case:

            import glob
            import numpy as np
            import matplotlib.pyplot as plt
            
            array_files = glob.glob('*.corr.npy')
            for fname in array_files:
                x = np.load(fname)
                plt.plot(x)
            

            glob.glob will operate in the current working directory, so you might want to use the absolute path instead:

            import os
            
            ROOT_DIR = '/some/path/to/array/files/'
            array_files = glob.glob(os.path.join(ROOT_DIR, '*.corr.npy'))
            

            I see you use num as an alias for numpy. I think np is the de-facto standard of numpy aliasing, so you could consider using that instead.

            qid & accept id: (27302220, 27304104) query: pandas pivot_table percentile / quantile soup:

            soup wrap:

            Dummy data:

            In [135]: df = pd.DataFrame([['a',2,3],
                                         ['a',5,6],
                                         ['a',7,8], 
                                         ['b',9,10], 
                                         ['b',11,12], 
                                         ['b',13,14]], columns=list('abc'))
            

            np.percentile seems to work just fine?

            In [140]: df.pivot_table(columns='a', aggfunc=lambda x: np.percentile(x, 50))
            Out[140]: 
            a  a   b
            b  5  11
            c  6  12
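
            np.percentile(x, 50) is simply the median. As a sanity check, here is the same computation in plain Python (a sketch matching NumPy's default linear-interpolation method):

```python
def percentile(values, q):
    # linear-interpolation percentile (numpy's default method)
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    frac = pos - lo
    if lo + 1 < len(xs):
        return xs[lo] + (xs[lo + 1] - xs[lo]) * frac
    return xs[lo]
```

            For the group with b-values [2, 5, 7] this gives 5, agreeing with the pivot table above.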
            
            qid & accept id: (27308840, 27308877) query: Converting C style for loop to python soup:
            soup wrap:
            for i in range(m, low - 1, -1):
            

            Keep in mind range is exclusive of the stop parameter.

            range(...)
                range(stop) -> list of integers
                range(start, stop[, step]) -> list of integers
            

            The difference between this code and the C code is that in Python 2, a list is being constructed in memory by range so for very huge ranges this could be a problem. Replacing range with xrange would not build a list in memory and make the code practically the same. In Python 3 this issue no longer exists.
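
            To see that range(m, low - 1, -1) visits the same indices as a C loop of the form for (i = m; i >= low; i--) (the assumed shape of the original loop), a tiny helper:

```python
def countdown_indices(m, low):
    # stop is exclusive, so low - 1 makes low itself the last value visited
    return list(range(m, low - 1, -1))
```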

            qid & accept id: (27327513, 27327984) query: Create PDF from a list of images soup:

            soup wrap:

            Install FPDF for Python:

            pip install fpdf
            

            Now you can use the same logic:

            from fpdf import FPDF
            pdf = FPDF()
            # imagelist is the list with all image filenames
            for image in imagelist:
                pdf.add_page()
                # x, y (position) and w, h (size) are values you supply
                pdf.image(image,x,y,w,h)
            pdf.output("yourfile.pdf", "F")
            

            You can find more info at the tutorial page or the official documentation.

            qid & accept id: (27331006, 27338395) query: How to create LinkExtractor rule which based on href in Scrapy soup:

            soup wrap:

            Test for http://example.com/category/ at the start of the string and the page parameter with one or more digits in the value:

            Rule(LinkExtractor(allow=('^http://example.com/category/\?.*?(?=page=\d+)', )), callback='parse_item'),
            

            Demo (using your example urls):

            >>> import re
            >>> pattern = re.compile(r'^http://example.com/category/\?.*?(?=page=\d+)')
            >>> should_match = [
            ...     'http://example.com/category/?sort=a-z&page=1',
            ...     'http://example.com/category/?page=1&sort=a-z&cache=1',
            ...     'http://example.com/category/?page=1&sort=a-z#'
            ... ]
            >>> for url in should_match:
            ...     print "Matches" if pattern.search(url) else "Doesn't match"
            ... 
            Matches
            Matches
            Matches
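
            As an alternative to the lookahead regex, the query string can be parsed directly with the standard library; the helper below is a sketch of that idea, not part of the original answer:

```python
from urllib.parse import urlparse, parse_qs

def has_page_param(url):
    # True when the URL sits under /category/ and carries a numeric page=
    parts = urlparse(url)
    if not parts.path.startswith('/category/'):
        return False
    page_values = parse_qs(parts.query).get('page', [])
    return bool(page_values) and page_values[0].isdigit()
```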
            
            qid & accept id: (27356890, 27362695) query: Proper way to destroy a file chooser dialog in pygtk for python soup:

            soup wrap:

            There might be a nicer way, but I usually do it like this:

            from gi.repository import Gtk, Gdk, GLib
            
            def run_dialog(_None):
                dialog = Gtk.FileChooserDialog("Please choose a folder", None,
                Gtk.FileChooserAction.SELECT_FOLDER,
                    (Gtk.STOCK_CANCEL, Gtk.ResponseType.CANCEL,
                    "Select", Gtk.ResponseType.OK))
            
                response = dialog.run()
                if response == Gtk.ResponseType.OK:
                    print("Select clicked")
                    print("Folder selected: " + dialog.get_filename())
                elif response == Gtk.ResponseType.CANCEL:
                    print("Cancel clicked")
            
                dialog.destroy()
                Gtk.main_quit()
            
            
            Gdk.threads_add_idle(GLib.PRIORITY_DEFAULT, run_dialog, None)
            Gtk.main()
            

            This will call the run_dialog function as soon as the mainloop starts, which will display the dialog and then quit.

            UPDATE: If you want to enclose that code in a function that returns the selected folder, you'll need to save the path to a non-local variable:

            def run_folder_chooser_dialog():
                result= []
            
                def run_dialog(_None):
                    dialog = Gtk.FileChooserDialog("Please choose a folder", None,
                    Gtk.FileChooserAction.SELECT_FOLDER,
                        (Gtk.STOCK_CANCEL, Gtk.ResponseType.CANCEL,
                        "Select", Gtk.ResponseType.OK))
            
                    response = dialog.run()
                    if response == Gtk.ResponseType.OK:
                        result.append(dialog.get_filename())
                    else:
                        result.append(None)
            
                    dialog.destroy()
                    Gtk.main_quit()
            
            
                Gdk.threads_add_idle(GLib.PRIORITY_DEFAULT, run_dialog, None)
                Gtk.main()
                return result[0]
            

            In Python 3, you can use nonlocal result and result = dialog.get_filename() instead of the ugly list reference.
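            The nonlocal mechanic itself, stripped of all the Gtk machinery, looks like this (a minimal sketch; the "chosen/path" value stands in for dialog.get_filename()):

```python
def run_and_capture():
    result = None

    def inner():
        nonlocal result  # rebind the enclosing function's variable
        result = "chosen/path"  # stand-in for dialog.get_filename()

    inner()
    return result
```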

            qid & accept id: (27407485, 27407796) query: Python program: foreign language word-frequency dictionary soup:

            soup wrap:

            A simpler version of your code that does pretty much what you want :)

            import string
            import collections
            
            def cleanedup(fh):
                for line in fh:
                    word = ''
                    for character in line:
                        if character in string.ascii_letters:
                            word += character
                        elif word:
                            yield word
                            word = ''
            
            with open ('DQ.txt') as doc:
                wordlist = collections.Counter(cleanedup(doc))
                print wordlist.most_common(5)
            

            Alternative solutions with regular expressions:

            import re
            import collections
            
            def cleanedup(fh):
                for line in fh:
                    for word in re.findall('[a-z]+', line.lower()):
                        yield word
            
            with open ('DQ.txt') as doc:
                wordlist = collections.Counter(cleanedup(doc))
                print wordlist.most_common(5)
            

            Or:

            import re
            import collections
            
            def cleanedup(fh):
                for line in fh:
                    for word in re.split('[^a-z]+', line.lower()):
                        yield word
            
            with open ('DQ.txt') as doc:
                wordlist = collections.Counter(cleanedup(doc))
                print wordlist.most_common(5)
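
            Counter.most_common in isolation, with a tiny made-up word list:

```python
import collections

# counts repeated words and returns the top entries, highest count first
words = ['de', 'la', 'de', 'que', 'de', 'la']
counts = collections.Counter(words)
top_two = counts.most_common(2)
```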
            
            qid & accept id: (27419345, 27419418) query: How to treat a hex as string? soup:

            soup wrap:

            Your problem is that you're using str:

            >>> str(0x61cc1000)
            '1640763392'  # int value of the hex number as a string
            

            That's because 0x61cc1000 is first evaluated as an int, and then str is applied to the resulting int.

            You want to do:

            "{0:x}".format(0x61cc1000)
            

            Or

            '{:#x}'.format(0x61cc1000)
            

            As already stated in other answer, you can simply:

            >>> hex(0x61cc1000)
            '0x61cc1000'
            

            See 6.1.3.1. Format Specification Mini-Language for details.
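            A few more format-spec variations on the same value, for comparison:

```python
n = 0x61cc1000

as_hex = "{0:x}".format(n)    # bare lowercase hex digits
prefixed = "{:#x}".format(n)  # '#' adds the 0x prefix
upper = "{:X}".format(n)      # uppercase digits
builtin = hex(n)              # the builtin helper
```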

            qid & accept id: (27444949, 27445652) query: Build a Pandas pd.tseries.offsets from timedelta soup:

            soup wrap:

            to_offset returns a pd.DateOffset. So you can directly build this object:

            >>> td = datetime.timedelta(hours=1)
            >>> pd.DateOffset(seconds=td.total_seconds())
            
            
            >>> to_offset(pd.DateOffset(seconds=td.total_seconds()))
            
            

            For a slightly nicer string representation:

            >>> pd.DateOffset(days=td.days, 
                              hours=td.seconds // 3600, 
                              minutes=(td.seconds // 60) % 60)
            
            
            qid & accept id: (27458073, 27458160) query: Uploading Django projects set up within virtual environment on Github soup:

            soup wrap:

            I suggest that you also install virtualenvwrapper (here). virtualenvwrapper keeps all files except your project at another location so your project directory contains only your files and you can safely use git add --all.

            After it's installed, do:

            $ mkdir my-project; cd my-project
            $ mkvirtualenv my-env-name
            $ pip install django 
            $ pip freeze > requirements.txt
            $ git init; git add --all; git commit -m "Initial Commit"
            ... push to github ...
            

            Now go to the other machine, and install virtualenv and virtualenvwrapper:

            $ git clone  my-project; cd my-project 
            $ mkvirtualenv my-env-name
            $ pip install -r requirements.txt
            ... continue your work, commit and push push and win at life :D
            
            qid & accept id: (27491734, 27492179) query: How do I get rid of dotted line on x axis of Pandas/Matplotlib bar plot? soup:

            soup wrap:

            pandas adds a dashed horizontal line on the axis of bar plots. There is a line in pandas/tools/plotting.py, in BarPlot._post_plot_logic (line 1842 in my version):

            ax.axhline(0, color='k', linestyle='--')
            

            This doesn't seem to be explicitly documented, and there's apparently no way to stop it from doing this. Worse, the plot doesn't keep any reference to the line, so there's no clear way to safely remove it. If the barplot is "plain", then this will work:

            ax.get_lines()[0].set_visible(False)
            

            This only works because in the plain barplot, this is the only Line artist in the plot. If you do anything else that adds other lines to the plot, it could get tricky to determine which one is the axis line you want to remove.

            qid & accept id: (27491988, 27494062) query: "Canonical" offset from UTC using pytz? soup:

            If we take "canonical" to mean the utcoffset of dates that are not in DST, then the problem is reduced to finding dates (for each timezone) which are not DST.


            We could try the current date first. If it is not DST, then we are in luck. If it is, then we could step through the list of utc transition dates (which are stored in tzone._utc_transition_times) until we find one that is not DST:

            import pytz
            import datetime as DT
            utcnow = DT.datetime.utcnow()
            
            canonical = dict()
            for name in pytz.all_timezones:
                tzone = pytz.timezone(name)
                try:
                    dstoffset = tzone.dst(utcnow, is_dst=False)
                except TypeError:
                    # pytz.utc.dst does not have a is_dst keyword argument
                    dstoffset = tzone.dst(utcnow)
                if dstoffset == DT.timedelta(0):
                    # utcnow happens to be in a non-DST period
                    canonical[name] = tzone.localize(utcnow, is_dst=False).strftime('%z')
                else:
                    # step through the transition times until we find a non-DST datetime
                    for transition in tzone._utc_transition_times[::-1]:
                        dstoffset = tzone.dst(transition, is_dst=False)
                        if dstoffset == DT.timedelta(0):
                            canonical[name] = (tzone.localize(transition, is_dst=False)
                                               .strftime('%z'))
                            break
            
            for name, utcoffset in canonical.iteritems():
                print('{} --> {}'.format(name, utcoffset))
            
            # All timezones have been accounted for
            assert len(canonical) == len(pytz.all_timezones)

            yields

            \n
            ...\nMexico/BajaNorte --> -0800\nAfrica/Kigali --> +0200\nBrazil/West --> -0400\nAmerica/Grand_Turk --> -0400\nMexico/BajaSur --> -0700\nCanada/Central --> -0600\nAfrica/Lagos --> +0100\nGMT-0 --> +0000\nEurope/Sofia --> +0200\nSingapore --> +0800\nAfrica/Tripoli --> +0200\nAmerica/Anchorage --> -0900\nPacific/Nauru --> +1200\n
            \n

            Note that the code above accesses the private attribute tzone._utc_transition_times. This is an implementation detail in pytz. Since it is not part of the public API, it is not guaranteed to exist in future versions of pytz. Indeed, it does not even exist for all timezones in the current version of pytz -- in particular, it does not exist for timezones that have no DST transition times, such as 'Africa/Bujumbura' for example. (That's why I bother to check if utcnow happens to be in a non-DST time period first.)

            \n

            If you'd like a method which does not rely on private attributes, we could instead simply march utcnow back one day until we find a day which is in a non-DST time period. The code would be a bit slower than the one above, but since you really only have to run this code once to glean the desired information, it really should not matter.

            \n

            Here is what the code would look like without using _utc_transition_times:

            \n
            import pytz\nimport datetime as DT\nutcnow = DT.datetime.utcnow()\n\ncanonical = dict()\nfor name in pytz.all_timezones:\n    tzone = pytz.timezone(name)\n    try:\n        dstoffset = tzone.dst(utcnow, is_dst=False)\n    except TypeError:\n        # pytz.utc.dst does not have a is_dst keyword argument\n        dstoffset = tzone.dst(utcnow)\n    if dstoffset == DT.timedelta(0):\n        # utcnow happens to be in a non-DST period\n        canonical[name] = tzone.localize(utcnow, is_dst=False).strftime('%z') \n    else:\n        # step through the transition times until we find a non-DST datetime\n        date = utcnow\n        while True:\n            date = date - DT.timedelta(days=1)\n            dstoffset = tzone.dst(date, is_dst=False) \n            if dstoffset == DT.timedelta(0):\n                canonical[name] = (tzone.localize(date, is_dst=False)\n                                   .strftime('%z'))\n                break\n\nfor name, utcoffset in canonical.iteritems():\n    print('{} --> {}'.format(name, utcoffset)) \n\n# All timezones have been accounted for\nassert len(canonical) == len(pytz.all_timezones)\n
            \n soup wrap:

            If we take "canonical" to mean the utcoffset of dates that are not in DST, then the problem is reduced to finding dates (for each timezone) which are not DST.

            We could try the current date first. If it is not DST, then we are in luck. If it is, then we could step through the list of utc transition dates (which are stored in tzone._utc_transition_times) until we find one that is not DST:

            import pytz
            import datetime as DT
            utcnow = DT.datetime.utcnow()
            
            canonical = dict()
            for name in pytz.all_timezones:
                tzone = pytz.timezone(name)
                try:
                    dstoffset = tzone.dst(utcnow, is_dst=False)
                except TypeError:
                    # pytz.utc.dst does not have a is_dst keyword argument
                    dstoffset = tzone.dst(utcnow)
                if dstoffset == DT.timedelta(0):
                    # utcnow happens to be in a non-DST period
                    canonical[name] = tzone.localize(utcnow, is_dst=False).strftime('%z') 
                else:
                    # step through the transition times until we find a non-DST datetime
                    for transition in tzone._utc_transition_times[::-1]:
                        dstoffset = tzone.dst(transition, is_dst=False) 
                        if dstoffset == DT.timedelta(0):
                            canonical[name] = (tzone.localize(transition, is_dst=False)
                                               .strftime('%z'))
                            break
            
            for name, utcoffset in canonical.iteritems():
                print('{} --> {}'.format(name, utcoffset)) 
            
            # All timezones have been accounted for
            assert len(canonical) == len(pytz.all_timezones)
            

            yields

            ...
            Mexico/BajaNorte --> -0800
            Africa/Kigali --> +0200
            Brazil/West --> -0400
            America/Grand_Turk --> -0400
            Mexico/BajaSur --> -0700
            Canada/Central --> -0600
            Africa/Lagos --> +0100
            GMT-0 --> +0000
            Europe/Sofia --> +0200
            Singapore --> +0800
            Africa/Tripoli --> +0200
            America/Anchorage --> -0900
            Pacific/Nauru --> +1200
            

            Note that the code above accesses the private attribute tzone._utc_transition_times. This is an implementation detail in pytz. Since it is not part of the public API, it is not guaranteed to exist in future versions of pytz. Indeed, it does not even exist for all timezones in the current version of pytz -- in particular, it does not exist for timezones that have no DST transition times, such as 'Africa/Bujumbura' for example. (That's why I bother to check if utcnow happens to be in a non-DST time period first.)

            If you'd like a method which does not rely on private attributes, we could instead simply march utcnow back one day until we find a day which is in a non-DST time period. The code would be a bit slower than the one above, but since you really only have to run this code once to glean the desired information, it really should not matter.

            Here is what the code would look like without using _utc_transition_times:

            import pytz
            import datetime as DT
            utcnow = DT.datetime.utcnow()
            
            canonical = dict()
            for name in pytz.all_timezones:
                tzone = pytz.timezone(name)
                try:
                    dstoffset = tzone.dst(utcnow, is_dst=False)
                except TypeError:
                    # pytz.utc.dst does not have a is_dst keyword argument
                    dstoffset = tzone.dst(utcnow)
                if dstoffset == DT.timedelta(0):
                    # utcnow happens to be in a non-DST period
                    canonical[name] = tzone.localize(utcnow, is_dst=False).strftime('%z') 
                else:
                    # step through the transition times until we find a non-DST datetime
                    date = utcnow
                    while True:
                        date = date - DT.timedelta(days=1)
                        dstoffset = tzone.dst(date, is_dst=False) 
                        if dstoffset == DT.timedelta(0):
                            canonical[name] = (tzone.localize(date, is_dst=False)
                                               .strftime('%z'))
                            break
            
            for name, utcoffset in canonical.iteritems():
                print('{} --> {}'.format(name, utcoffset)) 
            
            # All timezones have been accounted for
            assert len(canonical) == len(pytz.all_timezones)
            
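The same march-back idea works on Python 3.9+ with the standard-library zoneinfo module, avoiding pytz's private attributes entirely. This is a sketch, not the code above, and it assumes the system tz database is available:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

def canonical_offset(name):
    # March back one day at a time until we land outside DST,
    # then report the UTC offset of that non-DST instant.
    dt = datetime.now(timezone.utc).astimezone(ZoneInfo(name))
    while dt.dst():
        dt -= timedelta(days=1)
    return dt.strftime('%z')

print(canonical_offset('America/Chicago'))  # -0600
print(canonical_offset('UTC'))              # +0000
```

Because `dst()` returns a falsy `timedelta(0)` outside DST, the loop terminates as soon as a standard-time day is reached.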
            qid & accept id: (27551521, 27551678) query: Auto validate a function parameter using a method soup:

            soup wrap:

            You can use a url processor.

            @app.url_value_preprocessor
            def _is_valid_token(endpoint, values):
                if 'token' not in values:
                    return
            
                if values['token'] != TOKEN:
                    abort(400)
            

            This runs for all routes, but only does the validation if the route actually has a 'token' value. There are of course many other checks you could do beforehand to limit validation, such as basing it on specific endpoint names, but this is the most general function.


            You can also just decorate the specific functions you want to validate. This would be more general than the Flask solution.

            from functools import wraps
            
            def _is_valid_token(f):
                @wraps(f)
                def decorated(token, *args, **kwargs):
                    if token != TOKEN:
                        abort(400)
            
                    return f(token, *args, **kwargs)
            
                return decorated
            
            @app.route(...)
            @_is_valid_token
            def create_new_game(token, ...):
                ...
            
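Outside of Flask, the decorator by itself is plain Python and easy to exercise. In this sketch, TOKEN and the abort stand-in are made up for illustration:

```python
from functools import wraps

TOKEN = 'sekrit'  # hypothetical shared secret

def abort(code):
    # Stand-in for flask.abort so the sketch runs outside Flask.
    raise PermissionError(code)

def _is_valid_token(f):
    @wraps(f)
    def decorated(token, *args, **kwargs):
        if token != TOKEN:
            abort(400)
        return f(token, *args, **kwargs)
    return decorated

@_is_valid_token
def create_new_game(token, name):
    return 'created ' + name

print(create_new_game('sekrit', 'chess'))  # created chess
```

A call with the wrong token never reaches the wrapped function; it fails inside the decorator.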
            qid & accept id: (27551921, 27552377) query: how to extend ambiguous dna sequence soup:

            soup wrap:

            Perhaps a slightly shorter and faster way, since in all likelihood this function is going to be used on very large data:

            from Bio.Data import IUPACData
            from itertools import product
            
            def extend_ambiguous_dna(seq):
                """Return a list of all possible sequences given an ambiguous DNA input."""
                d = IUPACData.ambiguous_dna_values
                return list(map("".join, product(*map(d.get, seq))))
            

            Using map allows your loops to be executed in C rather than in Python. This should prove much faster than using plain loops or even list comprehensions.

            Field testing

            With a simple dict as d instead of the one returned by ambiguous_dna_values

            from itertools import product
            import time
            
            d = { "N": ["A", "G", "T", "C"], "R": ["C", "A", "T", "G"] }
            seq = "RNRN"
            
            # using list comprehensions
            lst_start = time.time()
            [ "".join(i) for i in product(*[ d[j] for j in seq ]) ]
            lst_end = time.time()
            
            # using map
            map_start = time.time()
            [ list(map("".join, product(*map(d.get, seq)))) ]
            map_end = time.time()
            
            lst_delay = (lst_end - lst_start) * 1000
            map_delay = (map_end - map_start) * 1000
            
            print("List delay: {} ms".format(round(lst_delay, 2)))
            print("Map delay: {} ms".format(round(map_delay, 2)))
            

            Outputs:

            # len(seq) = 2:
            List delay: 0.02 ms
            Map delay: 0.01 ms
            
            # len(seq) = 3:
            List delay: 0.04 ms
            Map delay: 0.02 ms
            
            # len(seq) = 4
            List delay: 0.08 ms
            Map delay: 0.06 ms
            
            # len(seq) = 5
            List delay: 0.43 ms
            Map delay: 0.17 ms
            
            # len(seq) = 10
            List delay: 126.68 ms
            Map delay: 77.15 ms
            
            # len(seq) = 12
            List delay: 1887.53 ms
            Map delay: 1320.49 ms
            

            Clearly map is faster, but only by a factor of 2 or 3. It could certainly be optimised further.
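Single time.time() measurements are noisy for small inputs; the stdlib timeit module averages many runs. A sketch that first checks the two variants agree, then times them:

```python
from itertools import product
import timeit

d = {"N": ["A", "G", "T", "C"], "R": ["C", "A", "T", "G"]}
seq = "RNRN"

# Both variants must expand to the same 4**4 = 256 sequences.
with_comp = ["".join(i) for i in product(*[d[j] for j in seq])]
with_map = list(map("".join, product(*map(d.get, seq))))
assert with_comp == with_map

# Average over 2000 runs each instead of timing a single call.
lst = timeit.timeit(lambda: ["".join(i) for i in product(*[d[j] for j in seq])],
                    number=2000)
mp = timeit.timeit(lambda: list(map("".join, product(*map(d.get, seq)))),
                   number=2000)
print("list comp: {:.4f}s  map: {:.4f}s".format(lst, mp))
```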

            qid & accept id: (27573800, 27573919) query: combine list of dictionaries with same key soup:
            soup wrap:
            orig_data = [{'Range': '192.168.1.1-192.168.1.254', 'Org_ID': 'TX', 'name': 'TX-Dallas'}, {'Range': '192.168.2.1-192.168.2.254', 'Org_ID': 'TX', 'name': 'TX-Dallas'}, {'Range': '192.168.3.1-192.168.3.254', 'Org_ID': 'TX', 'name': 'TX-Dallas'}, {'Range': '10.0.0.1-10.0.0.254', 'Org_ID': 'TX', 'name': 'TX-Dallas'}, {'Range': '192.168.9.1-192.168.1.254', 'Org_ID': 'CA', 'name': 'CA-San Diego'}, {'Range': '10.0.5.1-10.0.5.254', 'Org_ID': 'CA', 'name': 'CA-San Diego'}, {'Range': '172.16.0.1-172.16.0.254', 'Org_ID': 'TX', 'name': 'TX-Houston'}, {'Range': '172.16.3.1-172.16.3.254', 'Org_ID': 'TX', 'name': 'TX-Houston'}]
            
            import collections
            
            cont = collections.defaultdict(lambda: collections.defaultdict(list))
            for d in orig_data:
                cont[d['Org_ID']][d['name']].append(d['Range'])
            
            answer = []
            for orgid in cont:
                for name,rangelist in cont[orgid].items():
                    answer.append({'Org_ID':orgid, 'name':name, 'Range':rangelist})
            

            Output:

            In [226]: answer
            Out[226]: 
            [{'name': 'TX-Houston',
              'Org_ID': 'TX',
              'Range': ['172.16.0.1-172.16.0.254', '172.16.3.1-172.16.3.254']},
             {'name': 'TX-Dallas',
              'Org_ID': 'TX',
              'Range': ['192.168.1.1-192.168.1.254',
               '192.168.2.1-192.168.2.254',
               '192.168.3.1-192.168.3.254',
               '10.0.0.1-10.0.0.254']},
             {'name': 'CA-San Diego',
              'Org_ID': 'CA',
              'Range': ['192.168.9.1-192.168.1.254', '10.0.5.1-10.0.5.254']}]
            
            qid & accept id: (27576462, 27579524) query: How to load_files and process a .txt file with scikit-learn? soup:

            soup wrap:

            I'm not familiar with scikit-learn, which may have something better, but if the files are in the format shown, you could do what you describe with something relatively simple, as illustrated by the following function:

            import ast
            import glob
            import os
            
            def my_load_files(folder, pattern):
                pathname = os.path.join(folder, pattern)
                for filename in glob.glob(pathname):
                    with open(filename) as file:
                        yield ast.literal_eval(file.read())
            
            text_folder = 'C:/Users/username/Desktop/Samples'
            print [[' '.join(x) for x in sample]
                                    for sample in my_load_files(text_folder, 'File_*')]
            

            Note: Since there's a label at the end of each file (and your training_data), you might want to use the following instead, which would leave it out of what is passed to the feature_hasher_vect.transform() method:

            print [[' '.join(x) for x in sample[:-1]]
                                    for sample in my_load_files(text_folder, 'File_*')]
            
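Since the sample files themselves aren't shown, here is a self-contained sketch that assumes each file holds a Python list literal of token lists ending with a label, written to a temporary folder for the demonstration:

```python
import ast
import glob
import os
import tempfile

def my_load_files(folder, pattern):
    # Yield the parsed Python literal stored in each matching file.
    for filename in glob.glob(os.path.join(folder, pattern)):
        with open(filename) as f:
            yield ast.literal_eval(f.read())

# Hypothetical sample file: two token lists followed by a label entry.
folder = tempfile.mkdtemp()
with open(os.path.join(folder, 'File_1.txt'), 'w') as f:
    f.write("[['some', 'tokens'], ['more'], ['label']]")

samples = list(my_load_files(folder, 'File_*'))
# sample[:-1] drops the trailing label before joining tokens.
joined = [[' '.join(x) for x in sample[:-1]] for sample in samples]
print(joined)  # [['some tokens', 'more']]
```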
            qid & accept id: (27585173, 27585204) query: Using multiple (similar) generator expressions soup:

            soup wrap:

            str.translate might be appropriate; something along the lines of

            replacements = [
                ('abc', 'x'),
                ('def', 'y'),
                ('ghi', 'z'),
            ]
            
            trans = str.maketrans({ k: v for l, v in replacements for k in l })
            

            and

            new_row = [item.translate(trans) for item in row]
            
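On a hypothetical row, the two pieces combine like this (every character of 'abc' maps to 'x', and so on):

```python
replacements = [
    ('abc', 'x'),
    ('def', 'y'),
    ('ghi', 'z'),
]

# Expand each group into per-character entries: {'a': 'x', 'b': 'x', ...}
trans = str.maketrans({k: v for group, v in replacements for k in group})

row = ['bad', 'cafe', 'high']
new_row = [item.translate(trans) for item in row]
print(new_row)  # ['xxy', 'xxyy', 'zzzz']
```

Characters not in the table (none here) pass through unchanged.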
            qid & accept id: (27591621, 27619941) query: NLTK convert tokenized sentence to synset format soup:

            soup wrap:

            You can use a simple conversion function:

            from nltk.corpus import wordnet as wn
            
            def penn_to_wn(tag):
                if tag.startswith('J'):
                    return wn.ADJ
                elif tag.startswith('N'):
                    return wn.NOUN
                elif tag.startswith('R'):
                    return wn.ADV
                elif tag.startswith('V'):
                    return wn.VERB
                return None
            

            After tagging a sentence, you can map each word in it to a synset using this function. Here's an example:

            from nltk.stem import WordNetLemmatizer
            from nltk import pos_tag, word_tokenize
            
            sentence = "I am going to buy some gifts"
            tagged = pos_tag(word_tokenize(sentence))
            
            synsets = []
            lemmatzr = WordNetLemmatizer()
            
            for token in tagged:
                wn_tag = penn_to_wn(token[1])
                if not wn_tag:
                    continue
            
                lemma = lemmatzr.lemmatize(token[0], pos=wn_tag)
                synsets.append(wn.synsets(lemma, pos=wn_tag)[0])
            
            print synsets
            

            Result: [Synset('be.v.01'), Synset('travel.v.01'), Synset('buy.v.01'), Synset('gift.n.01')]
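For a quick sanity check without a tagger: in NLTK, wn.ADJ, wn.NOUN, wn.ADV and wn.VERB are just the one-letter strings 'a', 'n', 'r' and 'v', so a standalone copy of the mapping behaves like this:

```python
# Standalone sketch: these are the values of the NLTK wordnet constants.
ADJ, NOUN, ADV, VERB = 'a', 'n', 'r', 'v'

def penn_to_wn(tag):
    # Penn Treebank tags group by first letter: JJ*, NN*, RB*, VB*.
    if tag.startswith('J'):
        return ADJ
    elif tag.startswith('N'):
        return NOUN
    elif tag.startswith('R'):
        return ADV
    elif tag.startswith('V'):
        return VERB
    return None

print(penn_to_wn('VBG'))  # 'v'
print(penn_to_wn('NNS'))  # 'n'
print(penn_to_wn('DT'))   # None
```

Tags with no wordnet counterpart (determiners, prepositions, ...) fall through to None, which is why the loop above skips them.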

            qid & accept id: (27605834, 27605853) query: Test if two lists of lists are equal soup:
            soup wrap:
            l1 = [['a',1], ['b',2], ['c',3]]
            l2 = [['b',2], ['c',3], ['a',1]]
            print sorted(l1) == sorted(l2)
            

            Result:

            True
            
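sorted() works here because the inner lists are mutually orderable; if they might not be (mixed types in Python 3), or you want an expected-linear-time check, comparing multisets of hashable tuples is an alternative sketch:

```python
from collections import Counter

l1 = [['a', 1], ['b', 2], ['c', 3]]
l2 = [['b', 2], ['c', 3], ['a', 1]]

# Convert the unhashable inner lists to tuples and compare multisets,
# so duplicates are counted correctly.
same = Counter(map(tuple, l1)) == Counter(map(tuple, l2))
print(same)  # True
```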
            qid & accept id: (27615872, 27615948) query: Make one list from two list applying constraint soup:

            soup wrap:

            The way you are doing it is good because it is very readable... but if a one-liner is what you are after, then I will oblige:

            >>> A = [2,3,1,4,5,2,4]
            >>> B = [4,2,3,6,2,5,1]
            >>> [i for sublist in [[a, b] if a < b else [b, a] for a, b in zip(A, B)] for i in sublist]
            [2, 4, 2, 3, 1, 3, 4, 6, 2, 5, 2, 5, 1, 4]
            

            Few notes:

            1. When you add a conditional to a list comp, put the if/else right after the first expression in the list comp. ['a' if i in (2, 4, 16) else 'b' for i in [1, 2, 3, 16, 24]]

            2. The best way to construct (mentally) nested list comprehensions is to think of it how you would write it in a normal loop.


            C = [[a, b] if a < b else [b, a] for a, b in zip(A, B)]
            for sublist in C:
                for i in sublist:
                    yield i
            

            Then you just flatten the nested loops and move the yield i to the front, dropping the yield.

            for sublist in C for i in sublist yield i
            |-> yield i for sublist in C for i in sublist
                |-> i for sublist in C for i in sublist
            

            Now you can just replace C with the list comp above and get the one-liner I posted.
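For what it's worth, itertools.chain.from_iterable expresses the same flattening without the nested comprehension:

```python
from itertools import chain

A = [2, 3, 1, 4, 5, 2, 4]
B = [4, 2, 3, 6, 2, 5, 1]

# Build the ordered pairs, then flatten them in one pass.
flat = list(chain.from_iterable(
    [a, b] if a < b else [b, a] for a, b in zip(A, B)))
print(flat)  # [2, 4, 2, 3, 1, 3, 4, 6, 2, 5, 2, 5, 1, 4]
```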

            qid & accept id: (27665039, 27665097) query: Python: Scrape Data from Web after Inputing Info soup:

            soup wrap:

            The main issue was that you needed to make a GET request, not a POST.

            Plus, @Paul Lo is right about the date ranges. For the sake of example, I'm querying from 2010 to 2015.

            Also, you have to pass query parameters as strings. The literal 00 evaluated to the int 0, which requests converted to the string "0"; as a result, 0 rather than 00 was sent as the month parameter.

            Here is a fixed version with a modified part that gets the amounts:

            from lxml import html
            import requests
            
            def historic_quotes(symbol, stMonth, stDate, stYear, enMonth, enDate, enYear):
                url = 'https://finance.yahoo.com/q/hp?s=%s+Historical+Prices' % symbol
            
                params = {
                    'a': stMonth,
                    'b': stDate,
                    'c': stYear,
                    'd': enMonth,
                    'e': enDate,
                    'f': enYear,
                    'submit': 'submit',
                }
                response = requests.get(url, params=params)
            
                tree = html.document_fromstring(response.content)
                for amount in tree.xpath('//table[@class="yfnc_datamodoutline1"]//tr[td[@class="yfnc_tabledata1"]]//td[5]/text()'):
                    print amount
            
            historic_quotes('baba', '00', '11', '2010', '00', '11', '2015')
            

            Prints:

            105.95
            105.95
            105.52
            108.77
            110.65
            109.25
            109.02
            105.77
            104.70
            105.11
            104.97
            103.88
            107.48
            105.07
            107.90
            ...
            90.57
            
            qid & accept id: (27670683, 27682388) query: Parallelize DictVectorizer Creation soup:

            soup wrap:

            I don't think there is an efficient way to combine the output of several DictVectorizers. You could probably hack something together by making a first pass that only fits each vectorizer to build its vocabulary, then combining the vocabularies into one big one with all your features, and finally transforming with the whole feature set passed to each DictVectorizer and stacking the result matrices. This is unnecessarily complicated and won't guarantee you a speed increase.

            Parallelization is the ideal use case for a FeatureHasher. It can also accept dictionaries over (feature_name, value). For example:

            from sklearn.feature_extraction import FeatureHasher
            import scipy
            
            vect = FeatureHasher(n_features=4, non_negative=True)
            
            # thread 1 
            l1 = [{'foo': 1, 'bar': 2}]
            X1 = vect.fit_transform(l1) 
            # thread 2
            l2 = [{'foo': 3, 'baz': 1}]
            X2 = vect.fit_transform(l2)
            

            At the end combine the results:

            >>> scipy.sparse.vstack([X1, X2]).toarray()
            array([[ 1.,  2.,  0.,  0.],
                   [ 3.,  0.,  1.,  0.]])
            

            Just make sure that you use a large enough number of features (like 2**18) so that there are no collisions.
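            The reason hashing parallelizes cleanly is that, unlike DictVectorizer, the hasher keeps no fitted vocabulary. Here is a minimal pure-Python sketch of the hashing trick that makes this visible (it uses crc32 as a stand-in hash; the real FeatureHasher uses a signed MurmurHash):

```python
import zlib

def hash_vectorize(d, n_features=16):
    # Toy hashing trick: each feature name is mapped to a column by a
    # stateless hash, so no fitted vocabulary is ever needed.
    vec = [0.0] * n_features
    for name, value in d.items():
        idx = zlib.crc32(name.encode()) % n_features
        vec[idx] += value
    return vec

# Two "workers" can vectorize independently; 'foo' lands in the same
# column in both rows, so the outputs stack directly.
X1 = hash_vectorize({'foo': 1, 'bar': 2})
X2 = hash_vectorize({'foo': 3, 'baz': 1})
```

            Since each call is independent, the rows can be produced in any process and simply stacked at the end, which is exactly what makes the scheme parallel-friendly.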

            qid & accept id: (27674145, 27674179) query: Python: Create Dictionary From List with [0] = Key and [1:]= Values soup:

            You can use a dictionary comprehension:

            \n
            data = [['cups', 'cusp', 'cpus', 'cpsu', 'csup', 'cspu',],\n        ['pups', 'pusp','upsp', 'upps', 'upsp', 'uspp']]\n\nresult = {each[0]:each[1:] for each in data}           \nprint result\n
            \n

            Yields:

            \n
            {'pups': ['pusp', 'upsp', 'upps', 'upsp', 'uspp'], \n'cups': ['cusp', 'cpus', 'cpsu', 'csup', 'cspu']}\n
            \n soup wrap:

            You can use a dictionary comprehension:

            data = [['cups', 'cusp', 'cpus', 'cpsu', 'csup', 'cspu',],
                    ['pups', 'pusp','upsp', 'upps', 'upsp', 'uspp']]
            
            result = {each[0]:each[1:] for each in data}           
            print result
            

            Yields:

            {'pups': ['pusp', 'upsp', 'upps', 'upsp', 'uspp'], 
            'cups': ['cusp', 'cpus', 'cpsu', 'csup', 'cspu']}
            
            qid & accept id: (27676866, 28032702) query: creating namedtuple instances with kwargs soup:

            This works, and couldn't be any more compact:

            \n
            >>> tup4 = My_tuple(**dict(zip(vars, vals)))\n>>> tup4\nMy_tuple(var1='val1', var2='val2')\n
            \n

            To answer your side question – "is there any difference between them?" – no, they're all the same:

            \n
            >>> tup1 == tup2 == tup3 == tup4\nTrue\n
            \n soup wrap:

            This works, and couldn't be any more compact:

            >>> tup4 = My_tuple(**dict(zip(vars, vals)))
            >>> tup4
            My_tuple(var1='val1', var2='val2')
            

            To answer your side question – "is there any difference between them?" – no, they're all the same:

            >>> tup1 == tup2 == tup3 == tup4
            True
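            For a self-contained version (the names My_tuple, vars and vals come from the question and are reconstructed here as an assumption):

```python
from collections import namedtuple

# Reconstructed definitions matching the question's names (assumed).
My_tuple = namedtuple('My_tuple', ['var1', 'var2'])
vars = ['var1', 'var2']   # field names
vals = ['val1', 'val2']   # corresponding values

# zip pairs names with values; ** unpacks the dict as keyword arguments.
tup4 = My_tuple(**dict(zip(vars, vals)))
print(tup4)  # My_tuple(var1='val1', var2='val2')
```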
            
            qid & accept id: (27724543, 27724768) query: creating a wxpython scrolled window (frame) by an event soup:
            def newFrame(self, event):\n    self.new_window = wx.Frame(self, title='frame2', size=(500, 500), pos=(800,0))\n\n    self.scroll = wx.ScrolledWindow(self.new_window, -1)\n    self.scroll.SetScrollbars(1, 1, 1600, 1400)\n    self.new_window.Layout()\n    self.new_window.Fit()\n    self.new_window.Show()\n
            \n

            you need to layout the new window ... since you clearly want it to fill the 500,500 area you will need to use sizers

            \n
            def newFrame(self, event):\n    self.new_window = wx.Frame(self, title='frame2', size=(500, 500), pos=(800,0))\n    sz = wx.BoxSizer()\n    sz.SetMinSize((500,500)) #force minimum size\n    self.scroll = wx.ScrolledWindow(self.new_window, -1)\n    sz.Add(self.scroll,1,wx.EXPAND)\n    self.scroll.SetScrollbars(1, 1, 1600, 1400)\n    self.new_window.SetSizer(sz)\n    self.new_window.Layout()\n    self.new_window.Fit()\n    self.new_window.Show()\n
            \n

            or just force the size of the contained scrollwindow (which is what you normally do for scrolled windows)

            \n
            def newFrame(self, event):\n    self.new_window = wx.Frame(self, title='frame2', pos=(800,0))\n\n    self.scroll = wx.ScrolledWindow(self.new_window, -1,size=(500,500))\n    self.scroll.SetScrollbars(1, 1, 1600, 1400)\n    self.new_window.Layout()\n    self.new_window.Fit()\n    self.new_window.Show()\n
            \n soup wrap:
            def newFrame(self, event):
                self.new_window = wx.Frame(self, title='frame2', size=(500, 500), pos=(800,0))
            
                self.scroll = wx.ScrolledWindow(self.new_window, -1)
                self.scroll.SetScrollbars(1, 1, 1600, 1400)
                self.new_window.Layout()
                self.new_window.Fit()
                self.new_window.Show()
            

            You need to lay out the new window ... since you clearly want it to fill the (500, 500) area, you will need to use sizers:

            def newFrame(self, event):
                self.new_window = wx.Frame(self, title='frame2', size=(500, 500), pos=(800,0))
                sz = wx.BoxSizer()
                sz.SetMinSize((500,500)) #force minimum size
                self.scroll = wx.ScrolledWindow(self.new_window, -1)
                sz.Add(self.scroll,1,wx.EXPAND)
                self.scroll.SetScrollbars(1, 1, 1600, 1400)
                self.new_window.SetSizer(sz)
                self.new_window.Layout()
                self.new_window.Fit()
                self.new_window.Show()
            

            or just force the size of the contained scrollwindow (which is what you normally do for scrolled windows)

            def newFrame(self, event):
                self.new_window = wx.Frame(self, title='frame2', pos=(800,0))
            
                self.scroll = wx.ScrolledWindow(self.new_window, -1,size=(500,500))
                self.scroll.SetScrollbars(1, 1, 1600, 1400)
                self.new_window.Layout()
                self.new_window.Fit()
                self.new_window.Show()
            
            qid & accept id: (27733482, 27734399) query: pandas: Rolling correlation with fixed patch for pattern-matching soup:

            Clearly, the copious use of reset_index is a signal that we are fighting with Panda's indexing and automatic alignment. Oh, how much easier things would be if we could just forget about the index!\nIndeed, that is what NumPy is for. (Generally speaking, use Pandas when you need alignment or grouping by index, use NumPy when doing computation on N-dimensional arrays.)

            \n

            Using NumPy will make the computation much faster because we will be able to remove the for-loop and handle all the computations done in the for-loop as one computation done on a NumPy array of rolling windows.

            \n

            We can look inside pandas/core/frame.py's DataFrame.corrwith to see how the computation is done. Then translate it into corresponding code done on NumPy arrays, making adjustments as necessary for the fact that we want to do the computations on a whole array full of rolling windows instead of just one window at a time, while keeping patch constant. (Note: the Pandas corrwith method handles NaNs. To keep the code a bit simpler I've assumed there are no NaNs in the inputs.)

            \n
            import numpy as np\nimport pandas as pd\nfrom pandas import Series\nfrom pandas import DataFrame\nimport numpy.lib.stride_tricks as stride\nnp.random.seed(1)\n\nn = 10\nrng = pd.date_range('1/1/2000 00:00:00', periods=n, freq='5min')\ndf = DataFrame(np.random.rand(n, 1), columns=['a'], index=rng)\n\nm = 4\nrng = pd.date_range('1/1/2000 00:10:00', periods=m, freq='5min')\npatch = DataFrame(np.arange(m), columns=['a'], index=rng)\n\ndef orig(df, patch):\n    patch.reset_index(inplace=True, drop=True)\n\n    df['corr'] = np.nan\n\n    for i in range(df.shape[0]):\n        window = df[i : i+patch.shape[0]]\n        if window.shape[0] != patch.shape[0] :\n            break\n        else:\n            window.reset_index(inplace=True, drop=True)\n            corr = window.corrwith(patch)\n\n            df['corr'][i] = corr.a\n\n    return df\n\ndef using_numpy(df, patch):\n    left = df['a'].values\n    itemsize = left.itemsize\n    left = stride.as_strided(left, shape=(n-m+1, m), strides = (itemsize, itemsize))\n\n    right = patch['a'].values\n\n    ldem = left - left.mean(axis=1)[:, None]\n    rdem = right - right.mean()\n\n    num = (ldem * rdem).sum(axis=1)\n    dom = (m - 1) * np.sqrt(left.var(axis=1, ddof=1) * right.var(ddof=1))\n    correl = num/dom\n\n    df.ix[:len(correl), 'corr'] = correl\n    return df\n\nexpected = orig(df.copy(), patch.copy())\nresult = using_numpy(df.copy(), patch.copy())\n\nprint(expected)\nprint(result)\n
            \n

            This confirms that the values generated by orig and using_numpy are \nthe same:

            \n
            assert np.allclose(expected['corr'].dropna(), result['corr'].dropna())\n
            \n
            \n

            Technical note:

            \n

            To create the array full of rolling windows in a memory-friendly manner, I used a striding trick I learned here.

            \n
            \n

            Here is a benchmark, using n, m = 1000, 4 (lots of rows and a tiny patch to generate lots of windows):

            \n
            In [77]: %timeit orig(df.copy(), patch.copy())\n1 loops, best of 3: 3.56 s per loop\n\nIn [78]: %timeit using_numpy(df.copy(), patch.copy())\n1000 loops, best of 3: 1.35 ms per loop\n
            \n

            -- a 2600x speedup.

            \n soup wrap:

            Clearly, the copious use of reset_index is a signal that we are fighting with Pandas' indexing and automatic alignment. Oh, how much easier things would be if we could just forget about the index! Indeed, that is what NumPy is for. (Generally speaking, use Pandas when you need alignment or grouping by index, use NumPy when doing computation on N-dimensional arrays.)

            Using NumPy will make the computation much faster because we will be able to remove the for-loop and handle all the computations done in the for-loop as one computation done on a NumPy array of rolling windows.

            We can look inside pandas/core/frame.py's DataFrame.corrwith to see how the computation is done. Then translate it into corresponding code done on NumPy arrays, making adjustments as necessary for the fact that we want to do the computations on a whole array full of rolling windows instead of just one window at a time, while keeping patch constant. (Note: the Pandas corrwith method handles NaNs. To keep the code a bit simpler I've assumed there are no NaNs in the inputs.)

            import numpy as np
            import pandas as pd
            from pandas import Series
            from pandas import DataFrame
            import numpy.lib.stride_tricks as stride
            np.random.seed(1)
            
            n = 10
            rng = pd.date_range('1/1/2000 00:00:00', periods=n, freq='5min')
            df = DataFrame(np.random.rand(n, 1), columns=['a'], index=rng)
            
            m = 4
            rng = pd.date_range('1/1/2000 00:10:00', periods=m, freq='5min')
            patch = DataFrame(np.arange(m), columns=['a'], index=rng)
            
            def orig(df, patch):
                patch.reset_index(inplace=True, drop=True)
            
                df['corr'] = np.nan
            
                for i in range(df.shape[0]):
                    window = df[i : i+patch.shape[0]]
                    if window.shape[0] != patch.shape[0] :
                        break
                    else:
                        window.reset_index(inplace=True, drop=True)
                        corr = window.corrwith(patch)
            
                        df['corr'][i] = corr.a
            
                return df
            
            def using_numpy(df, patch):
                left = df['a'].values
                itemsize = left.itemsize
                left = stride.as_strided(left, shape=(n-m+1, m), strides = (itemsize, itemsize))
            
                right = patch['a'].values
            
                ldem = left - left.mean(axis=1)[:, None]
                rdem = right - right.mean()
            
                num = (ldem * rdem).sum(axis=1)
                dom = (m - 1) * np.sqrt(left.var(axis=1, ddof=1) * right.var(ddof=1))
                correl = num/dom
            
                df.ix[:len(correl), 'corr'] = correl
                return df
            
            expected = orig(df.copy(), patch.copy())
            result = using_numpy(df.copy(), patch.copy())
            
            print(expected)
            print(result)
            

            This confirms that the values generated by orig and using_numpy are the same:

            assert np.allclose(expected['corr'].dropna(), result['corr'].dropna())
            

            Technical note:

            To create the array full of rolling windows in a memory-friendly manner, I used a striding trick I learned here.


            Here is a benchmark, using n, m = 1000, 4 (lots of rows and a tiny patch to generate lots of windows):

            In [77]: %timeit orig(df.copy(), patch.copy())
            1 loops, best of 3: 3.56 s per loop
            
            In [78]: %timeit using_numpy(df.copy(), patch.copy())
            1000 loops, best of 3: 1.35 ms per loop
            

            -- a 2600x speedup.
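            The Pearson formula that using_numpy vectorizes (demeaned cross-product divided by (m - 1) times the product of sample standard deviations) can be sanity-checked with a tiny pure-Python rolling correlation; this sketch uses only the statistics module:

```python
from statistics import mean, stdev

def pearson(xs, ys):
    # Sample Pearson correlation: demeaned cross-product over
    # (n - 1) * std(x) * std(y), mirroring num/dom in using_numpy.
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    dom = (len(xs) - 1) * stdev(xs) * stdev(ys)
    return num / dom

def rolling_corr(series, patch):
    # Correlate each length-m window of series against the fixed patch.
    m = len(patch)
    return [pearson(series[i:i + m], patch) for i in range(len(series) - m + 1)]

series = [0.1, 0.5, 0.4, 0.9, 0.2, 0.7]
patch = [0, 1, 2, 3]
print(rolling_corr(series, patch))
```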

            qid & accept id: (27739381, 27739435) query: Regular expression to find a word after multiple spaces soup:

            You can use \s+ (match any whitespace) or ' +', but since a look-behind requires a fixed-width pattern, you need to put it outside the look-behind and use grouping. Also, you can just use re.search:

            \n
            >>> string = 'I love my           world of dreams'\n>>> print re.search (r'(?<=my)\s+([^ -.]*)', string).group(1)\nworld\n
            \n

            or

            \n
            >>> string = 'I love my           world of dreams'\n>>> print re.search (r'(?<=my) +([^ -.]*)', string).group(1)\nworld\n
            \n soup wrap:

            You can use \s+ (match any whitespace) or ' +', but since a look-behind requires a fixed-width pattern, you need to put it outside the look-behind and use grouping. Also, you can just use re.search:

            >>> string = 'I love my           world of dreams'
            >>> print re.search (r'(?<=my)\s+([^ -.]*)', string).group(1)
            world
            

            or

            >>> string = 'I love my           world of dreams'
            >>> print re.search (r'(?<=my) +([^ -.]*)', string).group(1)
            world
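            The fixed-width restriction is easy to demonstrate: putting \s+ inside the look-behind raises re.error at compile time, while moving it outside and capturing the word works (shown here in Python 3 syntax):

```python
import re

string = 'I love my           world of dreams'

# A variable-width pattern inside a look-behind is rejected at compile time.
try:
    re.compile(r'(?<=my\s+)\w+')
except re.error as e:
    print('rejected:', e)

# Workaround: keep the look-behind fixed-width ('my'), move the
# variable-width \s+ outside it, and capture the word in a group.
match = re.search(r'(?<=my)\s+(\w+)', string)
print(match.group(1))  # world
```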
            
            qid & accept id: (27743031, 27743139) query: Secure MySQL login data in a Python client program soup:

            You can prevent injections just by parameterising arguments, for example:

            \n
            "SELECT * FROM Users WHERE name=\"".name."\";"\n
            \n

            Will read

            \n
            SELECT * FROM Users WHERE name="AlecTeal";\n
            \n

            But can be "injected" with:

            \n
            name="\" or UserType=\"Admin"\n
            \n

            Then it will read

            \n
            SELECT * FROM Users WHERE name="" or UserType="Admin";\n
            \n

            That's bad, you can prevent that with stuff like:

            \n
            SELECT * FROM Users WHERE name=?\n
            \n

            and binding your variables, then the SQL server doesn't actually parse any data from the user, it sees the ? and just reads from the parameters.

            \n

            You cannot hide the SQL statements yourself

            \n

            You can obscure them sure, but they'll be in your .pyo or .py code somewhere!

            \n

            You have two options:

            \n

            MySQL supports users - and this is what users are for. Assume "only fairly trustworthy people will get my program", so you can have a user like Debbie_From_Accounts who can select from the Users table, and update/delete/insert/select from the Financial tables, say.

            \n

            OR!

            \n

            You can use some sort of API on the server and have like a set of PHP scripts that do the DB work and you just http get the pages.

            \n soup wrap:

            You can prevent injections just by parameterising arguments, for example:

            "SELECT * FROM Users WHERE name=\"".name."\";"
            

            Will read

            SELECT * FROM Users WHERE name="AlecTeal";
            

            But can be "injected" with:

            name="\" or UserType=\"Admin"
            

            Then it will read

            SELECT * FROM Users WHERE name="" or UserType="Admin";
            

            That's bad, you can prevent that with stuff like:

            SELECT * FROM Users WHERE name=?
            

            and binding your variables, then the SQL server doesn't actually parse any data from the user, it sees the ? and just reads from the parameters.
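            Here is a runnable sketch of the same idea in Python, using the stdlib sqlite3 driver (the table and names mirror the examples above; note the placeholder style varies by driver - sqlite3 uses ?, MySQLdb uses %s):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE Users (name TEXT, UserType TEXT)')
conn.execute("INSERT INTO Users VALUES ('AlecTeal', 'Admin')")

# Malicious input that would break a string-concatenated query:
name = '" OR UserType="Admin'

# With a bound parameter the driver sends the value separately,
# so it is compared literally and matches nothing.
rows = conn.execute('SELECT * FROM Users WHERE name=?', (name,)).fetchall()
print(rows)  # []

# A legitimate lookup works the same way.
rows = conn.execute('SELECT * FROM Users WHERE name=?', ('AlecTeal',)).fetchall()
print(rows)  # [('AlecTeal', 'Admin')]
```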

            You cannot hide the SQL statements yourself

            You can obscure them sure, but they'll be in your .pyo or .py code somewhere!

            You have two options:

            MySQL supports users - and this is what users are for. Assume "only fairly trustworthy people will get my program", so you can have a user like Debbie_From_Accounts who can select from the Users table, and update/delete/insert/select from the Financial tables, say.

            OR!

            You can use some sort of API on the server and have like a set of PHP scripts that do the DB work and you just http get the pages.

            qid & accept id: (27746297, 27746998) query: Detrend Flux Time Series with Non-Linear Trend soup:

            In a nutshell, you take the coefficients that polyfit returns and pass them to polyval to evaluate the polynomial at the observed "x" locations.

            \n

            As a stand-alone example, let's say we have something similar to the following:

            \n
            import numpy as np\nimport matplotlib.pyplot as plt\n\nnum = 1000\nx = np.linspace(0, 10, num)\ny = np.exp(x)\n\n# Add some non-stationary noise that's hard to see without de-trending\nnoise = 100 * np.exp(0.2 * x) * np.random.normal(0, 1, num)\ny += noise\n\nfig, ax = plt.subplots()\nax.plot(x, y, 'ro')\nplt.show()\n
            \n

            enter image description here

            \n

            Note that I haven't used a polynomial function here to create y. That's deliberate. Otherwise, we'd get an exact fit and wouldn't need to "play around" with the order of the polynomial.

            \n

            Now let's try detrending it with a 2nd order polynomial function (note the 2 in the line model = np.polyfit(x, y, 2)):

            \n
            import numpy as np\nimport matplotlib.pyplot as plt\n\nnum = 1000\nx = np.linspace(0, 10, num)\ny = np.exp(x)\n\n# Add some non-stationary noise that's hard to see without de-trending\nnoise = 100 * np.exp(0.2 * x) * np.random.normal(0, 1, num)\ny += noise\n\n# Detrend with a 2d order polynomial\nmodel = np.polyfit(x, y, 2)\npredicted = np.polyval(model, x)\n\nfig, axes = plt.subplots(nrows=2, sharex=True)\naxes[0].plot(x, y, 'ro')\naxes[0].plot(x, predicted, 'k-')\naxes[0].set(title='Original Data and 2nd Order Polynomial Trend')\n\naxes[1].plot(x, y - predicted, 'ro')\naxes[1].set(title='Detrended Residual')\n\nplt.show()\n
            \n

            enter image description here

            \n
            \n

            Notice that we didn't fit the data exactly. It's an exponential function and we're using a polynomial. However, as we increase the order of the polynomial, we'll fit the function more precisely (at the risk of starting to fit noise):

            \n

            enter image description here

            \n

            enter image description here

            \n

            enter image description here

            \n

            enter image description here

            \n soup wrap:

            In a nutshell, you take the coefficients that polyfit returns and pass them to polyval to evaluate the polynomial at the observed "x" locations.

            As a stand-alone example, let's say we have something similar to the following:

            import numpy as np
            import matplotlib.pyplot as plt
            
            num = 1000
            x = np.linspace(0, 10, num)
            y = np.exp(x)
            
            # Add some non-stationary noise that's hard to see without de-trending
            noise = 100 * np.exp(0.2 * x) * np.random.normal(0, 1, num)
            y += noise
            
            fig, ax = plt.subplots()
            ax.plot(x, y, 'ro')
            plt.show()
            

            [plot: the noisy exponential data]

            Note that I haven't used a polynomial function here to create y. That's deliberate. Otherwise, we'd get an exact fit and wouldn't need to "play around" with the order of the polynomial.

            Now let's try detrending it with a 2nd order polynomial function (note the 2 in the line model = np.polyfit(x, y, 2)):

            import numpy as np
            import matplotlib.pyplot as plt
            
            num = 1000
            x = np.linspace(0, 10, num)
            y = np.exp(x)
            
            # Add some non-stationary noise that's hard to see without de-trending
            noise = 100 * np.exp(0.2 * x) * np.random.normal(0, 1, num)
            y += noise
            
            # Detrend with a 2nd order polynomial
            model = np.polyfit(x, y, 2)
            predicted = np.polyval(model, x)
            
            fig, axes = plt.subplots(nrows=2, sharex=True)
            axes[0].plot(x, y, 'ro')
            axes[0].plot(x, predicted, 'k-')
            axes[0].set(title='Original Data and 2nd Order Polynomial Trend')
            
            axes[1].plot(x, y - predicted, 'ro')
            axes[1].set(title='Detrended Residual')
            
            plt.show()
            

            [plot: original data with the 2nd order trend line, and the detrended residual]


            Notice that we didn't fit the data exactly. It's an exponential function and we're using a polynomial. However, as we increase the order of the polynomial, we'll fit the function more precisely (at the risk of starting to fit noise):

            [plots: fits and detrended residuals for increasing polynomial orders]

            qid & accept id: (27747712, 27748346) query: Attach callback to Counter() value soup:

            As I mentioned in comments, you'll need to override __setitem__, either in a new subclass of Counter that you create, or in a new from-scratch class that you create.

            \n

            However, your comment indicates you really want to override g['a'] += 1, which is more problematic. Unless the object g['a'] (not g) defines its own __iadd__ method (which integers don't), then g['a'] += 1 is the same as g['a'] = g['a'] + 1. This means that g never gets to see the 1 that was added; it only gets to see the new resulting value after 1 is added. In other words, if g['a'] is 2 and you do g['a'] += 1, g never sees a 1 at all; it only sees 3 (which is what you get after adding 1 to its current value).

            \n

            If you want to use the "incremental" difference to do your handling, you'll have to backfigure it yourself. Here's a simple example where setting a new value for a string key also changes the values for all its individual characters, incrementing them by the difference between the old and new values for the original key:

            \n
            class MagicCounter(collections.Counter):\n    def __setitem__(self, key, val):\n        # see how much is being "added"\n        diff = val - self[key]\n        super(MagicCounter, self).__setitem__(key, val)\n        if len(key) > 1:\n            for item in key:\n                self[item] += diff\n
            \n

            Then:

            \n
            >>> c = MagicCounter()\n>>> c['a'] = 1\n>>> c['b'] = 1\n>>> c['c'] = 1\n>>> c\nMagicCounter({'a': 1, 'c': 1, 'b': 1})\n>>> c['abc'] += 1\n>>> c\nMagicCounter({'a': 2, 'c': 2, 'b': 2, 'abc': 1})\n
            \n soup wrap:

            As I mentioned in comments, you'll need to override __setitem__, either in a new subclass of Counter that you create, or in a new from-scratch class that you create.

            However, your comment indicates you really want to override g['a'] += 1, which is more problematic. Unless the object g['a'] (not g) defines its own __iadd__ method (which integers don't), then g['a'] += 1 is the same as g['a'] = g['a'] + 1. This means that g never gets to see the 1 that was added; it only gets to see the new resulting value after 1 is added. In other words, if g['a'] is 2 and you do g['a'] += 1, g never sees a 1 at all; it only sees 3 (which is what you get after adding 1 to its current value).
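            You can watch this desugaring happen with a small dict subclass that records what __setitem__ actually receives (a minimal sketch):

```python
class LoggingDict(dict):
    def __setitem__(self, key, val):
        self.seen = (key, val)          # record what arrives
        super().__setitem__(key, val)

g = LoggingDict(a=2)
g['a'] += 1   # desugars to g['a'] = g['a'] + 1
print(g.seen)  # ('a', 3) -- __setitem__ never sees the 1, only the sum 2 + 1
```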

            If you want to use the "incremental" difference to do your handling, you'll have to backfigure it yourself. Here's a simple example where setting a new value for a string key also changes the values for all its individual characters, incrementing them by the difference between the old and new values for the original key:

            class MagicCounter(collections.Counter):
                def __setitem__(self, key, val):
                    # see how much is being "added"
                    diff = val - self[key]
                    super(MagicCounter, self).__setitem__(key, val)
                    if len(key) > 1:
                        for item in key:
                            self[item] += diff
            

            Then:

            >>> c = MagicCounter()
            >>> c['a'] = 1
            >>> c['b'] = 1
            >>> c['c'] = 1
            >>> c
            MagicCounter({'a': 1, 'c': 1, 'b': 1})
            >>> c['abc'] += 1
            >>> c
            MagicCounter({'a': 2, 'c': 2, 'b': 2, 'abc': 1})
            
            qid & accept id: (27773141, 27773299) query: XML value Replacement in Python soup:

            You can check the "key" == 'Type A' / 'Type B' by using get method, like this:

            \n
            for node in tree.iterfind('.//logging/Adapter[@type="abcdef"]'):\n    for child in node:\n        # check if the key is 'Type A'\n        if child.get('key') == 'Type A':\n            child.set('value', 'false')\n        # ... if 'Type B' ...\n
            \n

            In fact, you can improve your code by using a better xpath accessing directly:

            \n
            for node in tree.iterfind('.//logging/Adapter[@type="abcdef"]/arg'):\n    # so you don't need another inner loop to access  elements\n    if node.get('key') == 'Type A':\n        node.set('value', 'false')\n    # ... if 'Type B' ...\n
            \n soup wrap:

            You can check whether "key" == 'Type A' / 'Type B' by using the get method, like this:

            for node in tree.iterfind('.//logging/Adapter[@type="abcdef"]'):
                for child in node:
                    # check if the key is 'Type A'
                    if child.get('key') == 'Type A':
                        child.set('value', 'false')
                    # ... if 'Type B' ...
            

            In fact, you can improve your code by using a better XPath that accesses the arg elements directly:

            for node in tree.iterfind('.//logging/Adapter[@type="abcdef"]/arg'):
                # so you don't need another inner loop to access the arg elements
                if node.get('key') == 'Type A':
                    node.set('value', 'false')
                # ... if 'Type B' ...
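            Since the answer assumes an already-parsed tree, here is a self-contained version using xml.etree.ElementTree with a small hypothetical document matching the implied structure:

```python
import xml.etree.ElementTree as ET

# Hypothetical document matching the structure the answer assumes.
xml = '''<config>
  <logging>
    <Adapter type="abcdef">
      <arg key="Type A" value="true"/>
      <arg key="Type B" value="true"/>
    </Adapter>
  </logging>
</config>'''

tree = ET.fromstring(xml)

# Direct XPath to the arg elements, as in the second snippet.
for node in tree.iterfind('.//logging/Adapter[@type="abcdef"]/arg'):
    if node.get('key') == 'Type A':
        node.set('value', 'false')

print(ET.tostring(tree, encoding='unicode'))
```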
            
            qid & accept id: (27779615, 27779962) query: python - how to convert a nested list to a list of all individual sub-lists soup:

            You could return a list as the result at the current nesting level and join together the nested results using extend.

            \n
            l = [['A', ['A', 'B', ['A', 'B', 'C'], ['A', 'B', 'D']], ['A', 'D', ['A', 'D', 'A']], ['A', 'C', ['A', 'C', 'B'], ['A', 'C', 'A']], ['A', 'A', ['A', 'A', 'D']]]]\n\ndef un_nest(l):\n    r = []\n    k = []\n    for item in l:\n        if type(item) is list:\n            r.extend(un_nest(item))\n        else:\n            k.append(item)\n    if k:\n        r.insert(0, k)\n    return r\n\nprint(un_nest(l))\n
            \n

            outputs:

            \n
            [['A'], ['A', 'B'], ['A', 'B', 'C'], ['A', 'B', 'D'], ['A', 'D'], ['A', 'D', 'A'], ['A', 'C'], ['A', 'C', 'B'], ['A', 'C', 'A'], ['A', 'A'], ['A', 'A', 'D']]\n
            \n soup wrap:

            You could return a list as the result at the current nesting level and join together the nested results using extend.

            l = [['A', ['A', 'B', ['A', 'B', 'C'], ['A', 'B', 'D']], ['A', 'D', ['A', 'D', 'A']], ['A', 'C', ['A', 'C', 'B'], ['A', 'C', 'A']], ['A', 'A', ['A', 'A', 'D']]]]
            
            def un_nest(l):
                r = []
                k = []
                for item in l:
                    if type(item) is list:
                        r.extend(un_nest(item))
                    else:
                        k.append(item)
                if k:
                    r.insert(0, k)
                return r
            
            print(un_nest(l))
            

            outputs:

            [['A'], ['A', 'B'], ['A', 'B', 'C'], ['A', 'B', 'D'], ['A', 'D'], ['A', 'D', 'A'], ['A', 'C'], ['A', 'C', 'B'], ['A', 'C', 'A'], ['A', 'A'], ['A', 'A', 'D']]
            
            qid & accept id: (27781555, 27781814) query: Is it possible to pass the evaluated result of one template tag as a parameter to another tag? soup:

            use simple_tag

            \n
            @register.simple_tag(takes_context=True)\ndef some_simple_tag(context, arg1, arg2):\n    # do some work\n    return 'string result for template to display'\n
            \n

            in your template:

            \n
            {% some_simple_tag something myobject.body %}\n
            \n soup wrap:

            Use simple_tag:

            @register.simple_tag(takes_context=True)
            def some_simple_tag(context, arg1, arg2):
                # do some work
                return 'string result for template to display'
            

            in your template:

            {% some_simple_tag something myobject.body %}
            
            qid & accept id: (27793543, 27801039) query: Python BeautifulSoup Mix Matching items in Table soup:

            The problem is that you are not getting to the appropriate table.

            \n

            Rely on the chart element, get the next table sibling and find all rows inside:

            \n
            soup wrap:

            The problem is that you are not getting to the appropriate table.

            Rely on the chart element, get the next table sibling and find all rows inside:

            from bs4 import BeautifulSoup
            import requests
            
            url = 'http://www.boxofficemojo.com/movies/?page=daily&view=chart&id=hungergames3.htm'
            
            response = requests.get(url)
            soup = BeautifulSoup(response.content)
            
            for tr in soup.find('div', id='chart_container').find_next_sibling('table').find_all('tr')[1:]:
                print [td.text for td in tr('td')]
            

            Prints:

            [u'Fri', u'Nov. 21, 2014', u'1', u'$55,139,942', u'-', u'-', u'4,151', u'$13,284', u'$55,139,942', u'1']
            [u'Sat', u'Nov. 22, 2014', u'1', u'$40,905,873', u'-25.8%', u'-', u'4,151', u'$9,854', u'$96,045,815', u'2']
            [u'Sun', u'Nov. 23, 2014', u'1', u'$25,851,819', u'-36.8%', u'-', u'4,151', u'$6,228', u'$121,897,634', u'3']
            [u'Mon', u'Nov. 24, 2014', u'1', u'$8,978,318', u'-65.3%', u'-', u'4,151', u'$2,163', u'$130,875,952', u'4']
            [u'Tue', u'Nov. 25, 2014', u'1', u'$12,131,853', u'+35.1%', u'-', u'4,151', u'$2,923', u'$143,007,805', u'5']
            [u'Wed', u'Nov. 26, 2014', u'1', u'$14,620,517', u'+20.5%', u'-', u'4,151', u'$3,522', u'$157,628,322', u'6']
            [u'Thu', u'Nov. 27, 2014', u'1', u'$11,079,983', u'-24.2%', u'-', u'4,151', u'$2,669', u'$168,708,305', u'7']
            [u'']
            [u'Fri', u'Nov. 28, 2014', u'1', u'$24,199,442', u'+118.4%', u'-56.1%', u'4,151', u'$5,830', u'$192,907,747', u'8']
            [u'Sat', u'Nov. 29, 2014', u'1', u'$21,992,225', u'-9.1%', u'-46.2%', u'4,151', u'$5,298', u'$214,899,972', u'9']
            [u'Sun', u'Nov. 30, 2014', u'1', u'$10,780,932', u'-51.0%', u'-58.3%', u'4,151', u'$2,597', u'$225,680,904', u'10']
            [u'Mon', u'Dec. 1, 2014', u'1', u'$2,635,435', u'-75.6%', u'-70.6%', u'4,151', u'$635', u'$228,316,339', u'11']
            [u'Tue', u'Dec. 2, 2014', u'1', u'$3,160,145', u'+19.9%', u'-74.0%', u'4,151', u'$761', u'$231,476,484', u'12']
            [u'Wed', u'Dec. 3, 2014', u'1', u'$2,332,453', u'-26.2%', u'-84.0%', u'4,151', u'$562', u'$233,808,937', u'13']
            [u'Thu', u'Dec. 4, 2014', u'1', u'$2,317,894', u'-0.6%', u'-79.1%', u'4,151', u'$558', u'$236,126,831', u'14']
            ...
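As a self-contained sketch of the find_next_sibling() technique used above, run against a small hypothetical HTML fragment rather than the live Box Office Mojo page (which may change or disappear):

```python
# A minimal sketch: anchor on a div by id, then jump to the sibling table.
from bs4 import BeautifulSoup

html = """
<div id="chart_container">chart</div>
<table>
  <tr><th>Day</th><th>Gross</th></tr>
  <tr><td>Fri</td><td>$55,139,942</td></tr>
  <tr><td>Sat</td><td>$40,905,873</td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')

# Find the div, get the table element that follows it, skip the header row.
table = soup.find('div', id='chart_container').find_next_sibling('table')
rows = [[td.text for td in tr('td')] for tr in table.find_all('tr')[1:]]
print(rows)  # [['Fri', '$55,139,942'], ['Sat', '$40,905,873']]
```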
            
            qid & accept id: (27805919, 27805988) query: How to only read lines in a text file after a certain string using python? soup:

            soup wrap:

            Just start another loop when you reach the line you want to start from:

            for files in filepath:
                with open(files, 'r') as f:
                    for line in f:
                        if 'Abstract' in line:
                            for line in f:  # now you are at the lines you want
                                pass  # do work
            

            A file object is its own iterator, so once we reach the line containing Abstract we simply continue iterating from that line until the iterator is exhausted.

            A simple example:

            gen  =  (n for n in xrange(8))
            
            for x in gen:
                if x == 3:
                    print("starting second loop")
                    for x in gen:
                        print("In second loop",x)
                else:
                    print("In first loop", x)
            
            In first loop 0
            In first loop 1
            In first loop 2
            starting second loop
            In second loop 4
            In second loop 5
            In second loop 6
            In second loop 7
            

            You can also use itertools.dropwhile to consume the lines up to the point you want.

            from itertools import dropwhile
            
            for files in filepath:
                with open(files, 'r') as f:
                    dropped = dropwhile(lambda _line: "Abstract" not in _line, f)
                    next(dropped,"")
                    for line in dropped:
                        print(line)
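If you want to try the dropwhile approach without any files on disk, here is a minimal sketch using an in-memory file-like object (the sample text is made up):

```python
# dropwhile skips lines until the predicate fails, i.e. until a line
# containing 'Abstract' is seen; next() then consumes that line itself.
from io import StringIO
from itertools import dropwhile

f = StringIO("intro\nAbstract\nline one\nline two\n")

dropped = dropwhile(lambda line: "Abstract" not in line, f)
next(dropped, "")  # skip the 'Abstract' line itself
remaining = [line.strip() for line in dropped]
print(remaining)  # ['line one', 'line two']
```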
            
            qid & accept id: (27810523, 27810889) query: sqlalchemy - elegant way to deal with several optional filters? soup:

            soup wrap:

            Code perfectly equivalent to the one you've shown is:

            def get_query_results(*filters):
                res = models.Item.query
                for i, filt in enumerate(filters, 1):
                    if filt is not None:
                        d = {'filter{}'.format(i): filt}
                        res = res.filter_by(**d)  # filter_by accepts keyword arguments; filter() does not
                return res.all()
            

            I'm not quite sure why the keyword names need to be specifically filter1, filter2, etc., but this snippet avoids the repetitious pattern that you understandably want to avoid.

            Should the names not actually be filter1, filter2, etc, that's OK as long as the required names are known:

            NAMES = 'foo bar baz bat'.split()
            
            def get_query_results(*filters):
                res = models.Item.query
                for name, filt in zip(NAMES, filters):
                    if filt is not None:
                        d = {name: filt}
                        res = res.filter_by(**d)  # filter_by accepts keyword arguments; filter() does not
                return res.all()
            

            This variant would work in this case.
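The core trick is building keyword arguments in a dict and applying them with **; this stand-alone sketch mimics it with a toy function instead of a SQLAlchemy query (the NAMES and values here are made up):

```python
# Collect only the non-None filters into keyword arguments.
NAMES = ['name', 'color', 'size']

def collect_filters(*filters):
    applied = {}
    for name, filt in zip(NAMES, filters):
        if filt is not None:
            applied.update(**{name: filt})  # same ** pattern as the query code
    return applied

print(collect_filters('widget', None, 'large'))
# {'name': 'widget', 'size': 'large'}
```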

            qid & accept id: (27823447, 27823872) query: python - sorting a list of lists by a key that's substring of each element soup:

            soup wrap:

            You need to define a key function that extracts the comparison value (here, the parsed timestamp) from each element:

            import time
            def key(item):
                return time.strptime(item[0][-16:], "%d/%m/%y à %H:%M")
            

            Then sort it:

            print sorted(my_list,key=key)
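For a runnable sketch, here is the same key applied to a made-up list; the only assumption is that each element's first item ends with a "dd/mm/yy à HH:MM" timestamp, as in the question:

```python
import time

# Hypothetical data; the last 16 characters of each first item hold the timestamp.
my_list = [
    ['posté le 02/01/15 à 10:00', 'b'],
    ['posté le 01/01/15 à 09:30', 'a'],
]

def key(item):
    # struct_time objects compare chronologically, so they work as sort keys.
    return time.strptime(item[0][-16:], "%d/%m/%y à %H:%M")

result = sorted(my_list, key=key)
print([item[1] for item in result])  # ['a', 'b']
```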
            
            qid & accept id: (27827982, 27828679) query: how to dynamically read a specific cell value in a table using selenium and python soup:
            soup wrap:

            I do not want to hard-code the xpath of the column that has value "0"

            from selenium import webdriver
            import re
            
            driver = webdriver.PhantomJS()
            driver.set_window_size(1120, 550) #For bug
            driver.get("http://localhost:8000")
            
            pattern = r"""
                \s*         #Match whitespace, 0 or more times, followed by...
                (\d+)       #a digit, one or more times, captured, followed by
                \s*         #whitespace, 0 or more times, followed by...
                [|]         #vertical bar, followed by...
                \s*         #whitespace, 0 or more times, followed by...
                \d+         #a digit, one or more times
            """
            regex = re.compile(pattern, re.X)
            
            table = driver.find_element_by_id('ambassadors-for-assignment')
            trs = table.find_elements_by_tag_name('tr')
            
            for tr in trs:
                tds = tr.find_elements_by_tag_name('td')
            
                for td in tds:
                    match_obj = re.search(regex, td.text)
            
                    if match_obj and match_obj.group(1) == '0':
                        success_button = tr.find_element_by_css_selector('button.btn-success')
                        print success_button.get_attribute('type')
                        success_button.click()
            

            re.match(pattern, string, flags=0)
            If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding match object. Return None if the string does not match the pattern; note that this is different from a zero-length match.

            Note that even in MULTILINE mode, re.match() will only match at the beginning of the string and not at the beginning of each line.

            If you want to locate a match anywhere in string, use search() instead (see also search() vs. match()).

            https://docs.python.org/3/library/re.html#module-re

            ======

            Here it is with xpath, and I think it better matches what you are trying to do, i.e. given a column, look down the rows for the value 0:

            from selenium import webdriver
            import re
            
            driver = webdriver.PhantomJS()
            driver.set_window_size(1120, 550) #For bug
            driver.get("http://localhost:8000")
            
            pattern = r""" 
                \s*         #Match whitespace, 0 or more times, followed by...
                (\d+)       #a digit, one or more times, captured, followed by
                \s*         #whitespace, 0 or more times, followed by...
                [|]         #vertical bar, followed by...
                \s*         #whitespace, 0 or more times, followed by...
                \d+         #a digit, one or more times
            """
            regex = re.compile(pattern, re.X)
            
            trs = driver.find_elements_by_xpath('//table[@id="ambassadors-for-assignment"]/tbody/tr')
            target_columns = [3, 4]
            
            for target_column in target_columns:
                for tr in trs:
                    target_column_xpath = './td[{}]'.format(target_column)  #VARY COLUMN HERE ***
                    td = tr.find_element_by_xpath(target_column_xpath)
                    match_obj = re.match(regex, td.text)
            
                    if match_obj and match_obj.group(1) == '0':
                        button_xpath = './/button[contains(concat(" ", normalize-space(@class), " "), " btn-success ")]' 
                        success_button = tr.find_element_by_xpath(button_xpath)
                        #success_button.click()
            
                        print "column {}:".format(target_column)
                        print match_obj.group(0)
                        print success_button.get_attribute('class')
                        print
            

            The output will look like the following, depending on what text you are trying to match with the regex:

            column 3:
            0 | 5
            btn btn-success
            
            column 4:
            0 | 61
            btn btn-success
            

            But in my opinion, having to use the following in an xpath:

            '[contains(concat(" ", normalize-space(@class), " "), " btn-success ")]'
            

            to match a class, means that using xpath is NOT the way to do it. The python method:

            find_element_by_css_selector('button.btn-success')
            

            ...will do the same thing more succinctly and clearly.
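The verbose regex itself can be exercised without a browser; here is a short sketch against a couple of made-up cell strings:

```python
import re

# Same verbose pattern as above: a captured number, a vertical bar, a number.
pattern = r"""
    \s*         # optional whitespace, followed by...
    (\d+)       # one or more digits, captured, followed by...
    \s*
    [|]         # a literal vertical bar, followed by...
    \s*
    \d+         # one or more digits
"""
regex = re.compile(pattern, re.X)

m = regex.match(" 0 | 5 ")
print(m.group(1))                 # '0'
print(regex.match("12 movies"))   # None -- no bar-separated pair
```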

            qid & accept id: (27855146, 27855332) query: remove the unicode from the output of JSON using scrapy soup:

            soup wrap:

            There are hidden elements that you don't see in the browser. Scrapy sees them.

            You just need to search for the data inside the relevant part of the page (div with id="siteTable"):

            def parse(self, response):
                # make a selector and search the fields inside it
                sel = response.xpath('//div[@id="siteTable"]')
            
                item = ExItem()
                item["title"] = sel.xpath('.//p[contains(@class,"title")]/a/text()').extract()
                item["rank"] = sel.xpath('.//span[contains(@class,"rank")]/text()').extract()
                item["votes_dislike"] = sel.xpath('.//div[contains(@class,"score dislikes")]/text()').extract()
                item["votes_unvoted"] = sel.xpath('.//div[contains(@class,"score unvoted")]/text()').extract()
                item["votes_likes"] = sel.xpath('.//div[contains(@class,"score likes")]/text()').extract()
                item["video_reference"] = sel.xpath('.//a[contains(@class,"thumbnail may-blank")]/@href').extract()
                item["image"] = sel.xpath('.//a[contains(@class,"thumbnail may-blank")]/img/@src').extract()
                return item
            

            Tested, here is what I get for, for example, votes_likes:

             'votes_likes': [u'5340',
                             u'4041',
                             u'4080',
                             u'5055',
                             u'4385',
                             u'4784',
                             u'3842',
                             u'3734',
                             u'4081',
                             u'3731',
                             u'4580',
                             u'5279',
                             u'2540',
                             u'4345',
                             u'2068',
                             u'3715',
                             u'3249',
                             u'4232',
                             u'4025',
                             u'522',
                             u'2993',
                             u'2789',
                             u'3529',
                             u'3450',
                             u'3533'],
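The contains(@class, ...) XPath expressions can be tried outside Scrapy as well; a sketch using lxml (which Scrapy's selectors are built on) against a made-up fragment:

```python
from lxml import html

# Hypothetical stand-in for the reddit markup described above.
doc = html.fromstring("""
<div id="siteTable">
  <p class="title"><a>First post</a></p>
  <div class="score likes">5340</div>
  <p class="title stickied"><a>Second post</a></p>
  <div class="score likes">4041</div>
</div>
""")

# Scope to the relevant container first, then search relative to it.
sel = doc.xpath('//div[@id="siteTable"]')[0]
titles = sel.xpath('.//p[contains(@class,"title")]/a/text()')
likes = sel.xpath('.//div[contains(@class,"score likes")]/text()')
print(titles)  # ['First post', 'Second post']
print(likes)   # ['5340', '4041']
```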
            
            qid & accept id: (27857842, 27859536) query: Python, use "order by" inside a "group concat" with pandas DataFrame soup:

            soup wrap:

            There's no group concat function in python / pandas, so we'll have to use some groupby. It's a bit longer than SQL, but still relatively short (main part is 3 lines).

            Let's create the dataframe :

            import pandas as pd
            
            data = {'product_id': [23, 65, 66, 98, 998, 798],
                    'category': ['cat1', 'cat2', 'cat1', 'cat1', 'cat1', 'cat2'],
                    'number_of_purchase': [18,19,4,9,1,8]}
            
            df = pd.DataFrame(data)
            print df
            

            result :

              category  number_of_purchase  product_id
            0     cat1                  18          23
            1     cat2                  19          65
            2     cat1                   4          66
            3     cat1                   9          98
            4     cat1                   1         998
            5     cat2                   8         798
            

            First step: we sort the dataframe by number of purchases:

            df = df.sort(columns='number_of_purchase', ascending=False)
            df
            

            result :

              category  number_of_purchase  product_id
            1     cat2                  19          65
            0     cat1                  18          23
            3     cat1                   9          98
            5     cat2                   8         798
            2     cat1                   4          66
            4     cat1                   1         998
            

            Second step: we use a groupby operation. For each category, it creates a list of the top two product ids. The data is still integers.

            df = df.groupby('category').apply(lambda x: list(x.product_id)[:2])
            print df
            

            result :

            category
            cat1         [23, 98]
            cat2        [65, 798]
            dtype: object
            

            If you need the result as a string, we use a simple lambda operation:

            df.apply(lambda x: '&'.join([str(elem) for elem in x]))
            

            result :

            category
            cat1         23&98
            cat2        65&798
            dtype: object
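On current pandas (where df.sort was replaced by sort_values) the three steps above can be chained into one expression; a sketch with the same data:

```python
import pandas as pd

df = pd.DataFrame({'product_id': [23, 65, 66, 98, 998, 798],
                   'category': ['cat1', 'cat2', 'cat1', 'cat1', 'cat1', 'cat2'],
                   'number_of_purchase': [18, 19, 4, 9, 1, 8]})

# Sort by purchases, group by category, keep the top two ids, join as a string.
top2 = (df.sort_values('number_of_purchase', ascending=False)
          .groupby('category')['product_id']
          .apply(lambda s: '&'.join(str(p) for p in s.head(2))))
print(top2)
```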
            
            qid & accept id: (27884051, 27884282) query: How do I run python file without path? soup:

            soup wrap:

            I'm assuming you're running Python on Windows (the '\' backslash is my only clue). If so, I think you've got at least one reasonable option.

            Create a python_run.bat file similar to this:

            @ECHO OFF
            
            REM *** MODIFY THE NEXT LINE TO SPECIFY THE LOCATION OF YOUR SCRIPTS ***
            SET SCRIPT_DIR=C:\Path\To\Scripts
            
            REM *** MODIFY THE NEXT LINE TO SPECIFY THE LOCATION OF YOUR PYTHON.EXE ***
            SET PYTHON_BIN=C:\Python27\python.exe
            
            PUSHD %SCRIPT_DIR%
            %PYTHON_BIN% %*
            POPD
            

            Then make sure the folder where the python_run.bat is located is in your PATH environment variable. So if the script lives in C:\Path\To\Scripts\python_run.bat, you'd make sure your PATH environment variable had C:\Path\To\Scripts in it.

            Then you simply have to type the following to execute any script located in your SCRIPT_DIR.

            python_run my_cool_script.py --foo=bar
            

            And it will result in running the following command as if you were already inside your scripts folder:

            C:\Python27\python.exe my_cool_script.py --foo=bar
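If you'd rather stay in Python than batch, a rough cross-platform equivalent might look like this (SCRIPT_DIR is a placeholder; point it at your own folder):

```python
import os
import subprocess
import sys

SCRIPT_DIR = r'C:\Path\To\Scripts'  # placeholder: your scripts folder

def python_run(script, *args, script_dir=SCRIPT_DIR):
    """Run `script` from `script_dir` with the current interpreter.

    cwd=script_dir mirrors the PUSHD in the batch file, so the script
    behaves as if you had launched it from inside the scripts folder.
    """
    return subprocess.call([sys.executable, os.path.join(script_dir, script), *args],
                           cwd=script_dir)
```

Then python_run('my_cool_script.py', '--foo=bar') behaves like the batch wrapper.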
            
            qid & accept id: (27885989, 27886020) query: How to convert a list of datetime.datetime objects to date in Python? soup:

            soup wrap:

            You can use strftime:

            Return a string representing the date and time, controlled by an explicit format string:

            >>> l=('hostzi.com', [datetime.datetime(2009, 5, 12, 13, 4, 12)])
            >>> l[1][0].strftime('%Y/%m/%d')
            '2009/05/12'
            

            You can also do it directly in your main code:

            f = open (file,'r')
            with open (output,'wt') as m:
                for line in f:
                    line = line.strip('\n')
                    domain = line.split(';')
                    try:
                        w = pythonwhois.get_whois(domain)
                        c_date = (w['creation_date'])
                        print (domain,c_date[0].strftime('%Y/%m/%d'))
            
                    except:
                        pass
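A minimal, self-contained sketch of converting a whole list of datetime objects to date strings with strftime (the datetimes below are made up, standing in for the values returned by get_whois):

```python
import datetime

dts = [datetime.datetime(2009, 5, 12, 13, 4, 12),
       datetime.datetime(2011, 1, 2, 0, 0)]

# strftime formats each datetime according to the explicit format string.
dates = [dt.strftime('%Y/%m/%d') for dt in dts]
print(dates)  # ['2009/05/12', '2011/01/02']
```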
            
            qid & accept id: (27887545, 27887769) query: Python - how to ignore escape chars in regexp soup:

            soup wrap:

            I suggest you add a negative lookbehind assertion.

            (STR\()"(.+?)(?<!\\)("\))

            DEMO

            Example:

            >>> s1 = r'STR("")'
            >>> s2 = r'STR("test \") string")'
            >>> re.findall(r'STR\("(.+?)(?<!\\)"\)', s2)
            ['test \\") string']
            >>> re.findall(r'STR\("(.+?)(?<!\\)"\)', s1)
            []

            (?<!\\) is a negative lookbehind assertion: it asserts that the closing double quote is not preceded by a backslash character.

            OR

            STR\("((?:\\"|[^"])*)"\)
            


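            The alternative pattern can be exercised in a short, self-contained session (a sketch; the input strings are the same made-up examples as above):

```python
import re

# The alternative pattern: the group matches either an escaped quote (\")
# or any non-quote character, so an inner \" never terminates the match.
pattern = re.compile(r'STR\("((?:\\"|[^"])*)"\)')

print(pattern.findall(r'STR("test \") string")'))  # -> ['test \\") string']
print(pattern.findall(r'STR("")'))                 # -> ['']
```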
            qid & accept id: (27914360, 27916313) query: Python pandas idxmax for multiple indexes in a dataframe soup:

            soup wrap:

            Your example code doesn't work because idxmax is executed after the groupby operation, so it runs on the whole dataframe.

            I'm not sure how to use idxmax on multilevel indexes, so here's a simple workaround.

            Setting up the data:

            import pandas as pd
            d= {'Date': ['2007-04-26', '2007-04-27', '2007-04-27', '2007-04-27',
                         '2007-04-27', '2007-04-28', '2007-04-28'], 
                    'DeliveryNb': [706, 705, 708, 450, 283, 45, 89],
                    'DeliveryCount': [23, 10, 1089, 82, 34, 100, 11]}
            
            df = pd.DataFrame.from_dict(d, orient='columns').set_index('Date')
            print df
            

            Output:

                        DeliveryCount  DeliveryNb
            Date                                 
            2007-04-26             23         706
            2007-04-27             10         705
            2007-04-27           1089         708
            2007-04-27             82         450
            2007-04-27             34         283
            2007-04-28            100          45
            2007-04-28             11          89
            

            Creating a custom function:

            The trick is to use the reset_index() method, so that idxmax returns an integer position you can use with iloc:

            def func(df):
                idx = df.reset_index()['DeliveryCount'].idxmax()
                return df['DeliveryNb'].iloc[idx]
            

            Applying it:

            g = df.groupby(df.index)
            g.apply(func)
            

            Result:

            Date
            2007-04-26    706
            2007-04-27    708
            2007-04-28     45
            dtype: int64
            
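            For comparison, a current-pandas sketch (Python 3, unlike the Python 2 print above; same data) that avoids the custom function by running idxmax per group on a reset index:

```python
import pandas as pd

d = {'Date': ['2007-04-26', '2007-04-27', '2007-04-27', '2007-04-27',
              '2007-04-27', '2007-04-28', '2007-04-28'],
     'DeliveryNb': [706, 705, 708, 450, 283, 45, 89],
     'DeliveryCount': [23, 10, 1089, 82, 34, 100, 11]}
r = pd.DataFrame(d)

# idxmax on the grouped column returns the positional label of each
# per-date maximum; .loc then pulls the matching DeliveryNb values.
best = r.loc[r.groupby('Date')['DeliveryCount'].idxmax(), ['Date', 'DeliveryNb']]
print(best)
```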
            qid & accept id: (27925861, 27928436) query: python pandas filter dataframe by another series, multiple columns soup:

            soup wrap:

            The tricky part is merging the two series/dataframes that have indexes with different datetime resolutions. Once you combine them intelligently, you can just filter normally.

            # Make sure your series has a name
            # Make sure the index is pure dates, not date 00:00:00
            most_liquid_contracts.name = 'most'
            most_liquid_contracts.index = most_liquid_contracts.index.date
            
            data = df
            data['day'] = data.index.date
            combined = data.join(most_liquid_contracts, on='day', how='left')
            

            Now you can do something like

            combined[combined.delivery == combined.most]
            

            This will yield the rows in data (df) where data.delivery is equal to the value in most_liquid_contracts for that day.

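            A self-contained sketch of the same join; the frame and series here are made-up stand-ins for df and most_liquid_contracts:

```python
import pandas as pd

# Intraday rows with a datetime index (hypothetical data)
df = pd.DataFrame(
    {'delivery': ['F', 'G', 'G', 'H']},
    index=pd.to_datetime(['2015-01-05 09:00', '2015-01-05 10:00',
                          '2015-01-06 09:00', '2015-01-06 10:00']))

# One "most liquid" contract per day, indexed by pure dates
most = pd.Series(['F', 'G'], name='most',
                 index=pd.to_datetime(['2015-01-05', '2015-01-06']).date)

df['day'] = df.index.date
combined = df.join(most, on='day', how='left')
hits = combined[combined.delivery == combined.most]
print(hits)
```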
            qid & accept id: (27947419, 27955197) query: How can I assign scores to a list of datapoints and then output values > 2 standard deviations from the mean in python? soup:

            soup wrap:

            As soon as you talk of means and standard deviations for lots of data, you should start using any of the numerical libraries. Consider using numpy, or even pandas (for readability) here. I'll be using them in this example, together with the Counter object from the collections module. Read up on both to see how they work, but I'll explain a bit throughout the code as well.

            import numpy as np
            from collections import Counter    
            
            nucleotid_bases = ('C', 'A', 'T', 'G', '.')
            results = []
            checksum = []
            with open('datafile.txt') as f:
                for line in f:
                    fields = line.split()  # splits by consecutive whitespace, empty records will be purged
                    chrom, pos = [int(fields[x]) for x in (0,1)]
                    results.append([chrom,pos])  # start by building the current record
                    allele1, allele2 = [fields[i] for i in (3,4)]
                    checksum.append([allele1, allele2])  # you wanted to keep these, most likely for debugging purposes?
                    popA = fields[3:26]  # population size: 2*23
                    popB = fields[26:36]  # population size: 2*10
                    for population in (popA, popB):
                        summary = Counter(population) # traverses the line only once - much more efficient!
                        base_counts = [ sum(summary[k] for k in summary.keys() if base in k) for base in nucleotid_bases]
                        for index, base_name in enumerate(nucleotid_bases):
                            # Double the count when there is an exact match, e.g. "A/A" -> "A"
                            # An 'in' match can match an item anywhere in the string: 'A' in 'A/C' evaluates to True
                            base_counts[index] += summary[base_name]    
                        results[-1].extend(base_counts)  # append to the current record
            results = np.array(results, dtype=float)  # shape is now (x, 12) with x the number of lines read (np.float was removed from newer numpy)
            results[:, 2:7] /= 46
            results[:, 7:] /= 20
            

            At this point, the layout of the results is two columns filled with the chrom (results[:,0]) and pos (results[:,1]) labels from the text file, then 5 columns of population A, where the first of those 5 contains the relative frequency of the 'C' base, next column of the 'A' base and so on (see nucleotid_bases for the order). Then, the last 5 columns are similar, but they are for population B:

            chrom, pos, freqC_in_A, ..., freqG_in_A, freq_dot_in_A, freqC_in_B, ..., freqG_in_B, freq_dot_in_B
            

            If you want to ignore records (rows) in this table where either of the unknowns-frequencies (columns 6 and 11) are above a threshold, you would do:

            threshold = .1 # arbitrary: 10%
            to_consider = np.logical_and(results[:,6] < threshold, results[:,11] < threshold)
            table = results[to_consider][:, [0,1,2,3,4,5,7,8,9,10]]
            

            Now you can compute the table of frequency differences with:

            freq_diffs  = np.abs(table[:,2:6] - table[:,-4:])  # 4 columns, n rows
            
            mean_freq_diff = freq_diffs.mean(axis=0) # holds 4 numbers, these are the means over all the rows
            std_freq_diff = freq_diffs.std(axis=0) # similar: std over all the rows
            
            condition = freq_diffs > (mean_freq_diff + 2*std_freq_diff)
            

            Now you'll want to check whether the condition held for any element of the row: e.g. if the frequency difference for 'C' between popA and popB was .8 and the (mean+2*std) was .7, then it will return True. But it will also return True for the same row if this condition was fulfilled for any of the other nucleotides. To check whether the condition was True for any of the nucleotide frequency differences, do this:

            specials = np.any(condition, axis=1)  
            print(table[specials, :2])
            
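            The 2-sigma flagging step in isolation, on a tiny made-up column of frequency differences:

```python
import numpy as np

# Five ordinary rows and one outlier; with axis=0 statistics the outlier
# exceeds mean + 2*std for its column, so only its row is flagged.
freq_diffs = np.array([[0.10], [0.12], [0.11], [0.09], [0.10], [0.90]])
mean = freq_diffs.mean(axis=0)
std = freq_diffs.std(axis=0)
specials = np.any(freq_diffs > mean + 2 * std, axis=1)
print(specials)  # -> [False False False False False  True]
```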
            qid & accept id: (27960965, 28251602) query: Parsing an equation with custom functions in python soup:

            soup wrap:

            A minimal working example (the +, -, *, /, ** binary and unary operations and function calls are implemented); operator priority is made explicit with parentheses.

            It implements a little more than the functionality needed for the example given.

            from __future__ import print_function
            import ast
            
            def transform(eq,functions):
                class EqVisitor(ast.NodeVisitor):
                    def visit_BinOp(self,node):
                        #generate("=>BinOp")
                        generate("(")
                        self.visit(node.left)
                        self.visit(node.op)
                        #generate("ici",str(node.op),node._fields,node._attributes)
                        #generate(dir(node.op))
                        self.visit(node.right)
                        generate(")")
                        #ast.NodeVisitor.generic_visit(self,node)
                    def visit_USub(self,node):
                        generate("-")
                    def visit_UAdd(self,node):
                        generate("+")
            
                    def visit_Sub(self,node):
                        generate("-")
                    def visit_Add(self,node):
                        generate("+")
                    def visit_Pow(self,node):
                        generate("**")
                    def visit_Mult(self,node):
                        generate("*")
                    def visit_Div(self,node):
                        generate("/")
                    def visit_Name(self,node):
                        generate(node.id)
                    def visit_Call(self,node):
                        debug("function",node.func.id)
                        if node.func.id in functions:
                            debug("defined function")
                            func_visit(functions[node.func.id],node.args)
                            return
                        debug("not defined function",node.func.id)    
                        #generate(node._fields)
                        #generate("args")
                        generate(node.func.id)
                        generate("(")
                        sep = ""
                        for arg in node.args:
                            generate (sep)
                            self.visit(arg)
                            sep=","
                        generate(")")
                    def visit_Num(self,node):
                        generate(node.n)
                    def generic_visit(self, node):
            
            
                        debug ("\n",type(node).__name__)
                        debug (node._fields)
                        ast.NodeVisitor.generic_visit(self, node)  
            
                def func_visit(definition,concrete_args):
                    class FuncVisitor(EqVisitor):
                        def visit_arguments(self,node):
                            #generate("visit arguments")
                            #generate(node._fields)
                            self.arguments={}
                            for concrete_arg,formal_arg in zip(concrete_args,node.args):
                                #generate(formal_arg._fields)
                                self.arguments[formal_arg.id]=concrete_arg
                            debug(self.arguments)
                        def visit_Name(self,node):
                            debug("visit Name",node.id)
                            if node.id in self.arguments:
                                eqV.visit(self.arguments[node.id])
                            else:
                                generate(node.id)
            
            
                    funcV=FuncVisitor()
                    funcV.visit(ast.parse(definition))
            
                eqV=EqVisitor()
                result = []
                def generate(s):
                    # the following line may be useful for debugging
                    debug(str(s))
                    result.append(str(s))
                eqV.visit(ast.parse(eq,mode="eval"))
                return "".join(result)
            def debug(*args,**kwargs):
                #print(*args,**kwargs)
                pass
            

            usage:

            functions= {
                "f1":"def f1(x,y):return x+y**2",
                "f2":"def f2(x,y):return sin(x+y)",
            }
            eq="-(a+b)+f1(f2(+x,y),z)*4/365.12-h"
            print(transform(eq,functions))
            

            result

            ((-(a+b)+(((sin((+x+y))+(z**2))*4)/365.12))-h)
            

            WARNING: the code works with Python 2.7 and, as it depends on ast internals, is not guaranteed to work with other versions of Python. It does not work as-is under Python 3.

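            The core visit-and-emit idea does carry over to Python 3 with small changes; a pared-down sketch (binary operators only, no function substitution):

```python
import ast

# Walk the AST of an expression and rebuild it with explicit parentheses,
# mirroring the generate() calls of the full transformer above.
class Emit(ast.NodeVisitor):
    OPS = {ast.Add: '+', ast.Sub: '-', ast.Mult: '*', ast.Div: '/', ast.Pow: '**'}

    def __init__(self):
        self.out = []

    def visit_BinOp(self, node):
        self.out.append('(')
        self.visit(node.left)
        self.out.append(self.OPS[type(node.op)])
        self.visit(node.right)
        self.out.append(')')

    def visit_Name(self, node):
        self.out.append(node.id)

    def visit_Constant(self, node):  # Python 3.8+: numbers arrive as Constant
        self.out.append(str(node.value))

e = Emit()
e.visit(ast.parse('a+b*2', mode='eval'))
print(''.join(e.out))  # -> (a+(b*2))
```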
            qid & accept id: (27996151, 27996182) query: Can someone please explain to me the way to calculate the number of bills and coins of change in python? soup:

            soup wrap:

            % is not a module; it is called the modulus (or "remainder") operator.

            It is the counterpart of integer division:

            9 == 4 * 2 + 1
            
            9 // 4 == 2    # integer division
            9 % 4 == 1     # remainder
            

            so, for example:

            # paying $63.51
            x = 6351 // 1000      # == 6    maximum number of $10.00 bills
            y = 6351 % 1000       # == 351  $3.51 not payable in 10s.
            
            # you could instead do
            y = 6351 - (6351 // 1000) * 1000
            
            # this would give the same result,
            # but you've got to admit it's a lot
            # less readable.
            
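            Extending the idea to a full change-making loop (a sketch with amounts in cents; divmod combines // and % in one call):

```python
def make_change(cents, denominations=(1000, 500, 100, 25, 10, 5, 1)):
    """Greedy change-making: count of each denomination, largest first."""
    counts = {}
    for d in denominations:
        counts[d], cents = divmod(cents, d)  # quotient -> count, remainder carries on
    return counts

print(make_change(6351))
# -> {1000: 6, 500: 0, 100: 3, 25: 2, 10: 0, 5: 0, 1: 1}
```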
            qid & accept id: (28002993, 28003028) query: Remove namedtuple from list based on value soup:

            soup wrap:

            You can do it the slow way:

            def remove(self, id=None, value=None):
                for elem in self:
                    if (id is not None and elem.id == id or
                            value is not None and elem.value == value):
                        super(Orders, self).remove(elem)
                        break
            

            You could add an index to your class that maps ids and/or values to specific indices, but you'd need to keep that index up to date as you manipulate the contained list of orders. It'd look something like this:

            def __init__(self, *args):
                # ...
                self._ids = {}
            
            def append(self, id, value):
                if id in ids:
                    raise ValueError('This order already exists!')
                super(Orders, self).append(Order(id, value))
                self._ids[id] = len(self) - 1
            

            and, provided you also adjust all other methods that can alter the list and change ordering, etc., you can then find orders quickly by their id:

            def remove(self, id):
                if id not in self._ids:
                    raise ValueError('No such order exists!')
                del self[self._ids[id]]
            
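            For reference, the simple removal can also be tried standalone on a plain list of namedtuples (the names here are illustrative, not the asker's class):

```python
from collections import namedtuple

Order = namedtuple('Order', ['id', 'value'])
orders = [Order(1, 'a'), Order(2, 'b'), Order(3, 'c')]

# Rebuild the list without the matching order; O(n), like the loop above
orders = [o for o in orders if o.id != 2]
print(orders)  # -> [Order(id=1, value='a'), Order(id=3, value='c')]
```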
            qid & accept id: (28015044, 28015544) query: python - Nested list in dict to csv files soup:

            soup wrap:

            Starting from your csv_dict, you can do something like

            import csv
            import itertools
            
            csv_dict = {'label1': ['val1', 'val2', 'val3'],
                        'label2': ['otherval1', 'otherval2'],
                        'label3': ['yetanotherval1']}
            keys = csv_dict.keys()
            csvrows = itertools.izip_longest(*[csv_dict[k] for k in keys], fillvalue='dummy')
            
            with open('out.csv', 'w') as csvfile:
                csvwriter = csv.writer(csvfile, delimiter=';',
                                        quotechar='\\', quoting=csv.QUOTE_MINIMAL)
                csvwriter.writerow(keys)
                for row in csvrows:
                    csvwriter.writerow(row)
            

            Resulting out.csv:

            label1;label2;label3
            val1;otherval1;yetanotherval1
            val2;otherval2;dummy
            val3;dummy;dummy
            

            With the following remarks:

            • When you are zipping the dictionary's values, you should specify the order of the keys
            • What do you want to do if the columns differ in length (in your example, there are val1 to val3, but only two othervals)? Maybe something like padding all lists to fit the longest list?
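            Under Python 3 (where izip_longest became zip_longest, and the key order should be pinned explicitly), the same approach might look like this sketch, writing to an in-memory buffer so the result is easy to inspect:

```python
import csv
import io
import itertools

csv_dict = {'label1': ['val1', 'val2', 'val3'],
            'label2': ['otherval1', 'otherval2'],
            'label3': ['yetanotherval1']}
keys = sorted(csv_dict)  # pin a deterministic column order
rows = itertools.zip_longest(*(csv_dict[k] for k in keys), fillvalue='dummy')

buf = io.StringIO()
writer = csv.writer(buf, delimiter=';')
writer.writerow(keys)
writer.writerows(rows)
print(buf.getvalue())
```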
            qid & accept id: (28016212, 28016225) query: Advance a file object more than one line as a way of skipping blank lines and lines containing strings soup:

            soup wrap:

            To advance more than one line, call next in a loop:

            for _ in range(times_to_advance):
                next(file_object)
            

            As @MartijnPieters noted in the comments, this solution is not particularly efficient. Its main advantage is simplicity.

            If, however, your main concern is performance, you should use the code found in the consume() recipe of the itertools documentation:

            from itertools import islice
            next(islice(file_object, times_to_advance, times_to_advance), None)
            
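            A quick check of the recipe on an in-memory file (io.StringIO iterates line by line just like a real file object):

```python
import io
from itertools import islice

f = io.StringIO('line1\nline2\nline3\nline4\nline5\n')
n = 3
# islice(f, n, n) yields nothing but advances the underlying iterator by
# n items; next(..., None) triggers that without raising StopIteration.
next(islice(f, n, n), None)
nxt = f.readline()
print(nxt)  # -> line4
```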
            qid & accept id: (28020874, 28026091) query: Permutation of values on numpy array/matrix soup:

            soup wrap:

            You could do this by creating an (n, m, m, ..., m) array of indices for column 1, column 2, ..., column n using np.indices(), then reshaping the output into an (m ** n, n) array:

            import numpy as np
            
            def permgrid(m, n):
                inds = np.indices((m,) * n)
                return inds.reshape(n, -1).T
            

            For example:

            print(permgrid(2, 3))
            
            # [[0 0 0]
            #  [0 0 1]
            #  [0 1 0]
            #  [0 1 1]
            #  [1 0 0]
            #  [1 0 1]
            #  [1 1 0]
            #  [1 1 1]]
            
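            The same grid can be cross-checked against itertools.product, which enumerates the rows in the identical order:

```python
import itertools
import numpy as np

def permgrid(m, n):
    inds = np.indices((m,) * n)
    return inds.reshape(n, -1).T

# product(range(m), repeat=n) varies the last position fastest, matching
# the row order produced by np.indices + reshape.
grid = np.array(list(itertools.product(range(2), repeat=3)))
print(np.array_equal(grid, permgrid(2, 3)))  # -> True
```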
            qid & accept id: (28036818, 28036943) query: How to xor in python using hex soup:

soup wrap:

Depending on exactly what you mean by "a hex string", it should be easy. E.g.:

            >>> text=b'Hello World'
            >>> hexi=b'\12\34\45\EF\CD\AB'
            >>> xors=[ord(t)^ord(x) for t,x in zip(text,hexi)]
            >>> xors
            [66, 121, 73, 48, 42, 102, 11, 44, 54, 48, 37]
            

            Now you have to decide how you want to represent this list of small integers. array.array would be best, or, as a bytestring:

            >>> b''.join(chr(x) for x in xors)
            'ByI0*f\x0b,60%'
            

(this would show with a leading b in Python 3, where the distinction between strings of bytes and actual text is clearer and sharper; note that in Python 3 iterating a bytes object already yields ints, so the ord() calls above must be dropped there).
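In Python 3, iterating a bytes object yields ints, so a Python 3 version drops ord() entirely and rebuilds the result with bytes(); a minimal sketch (the key bytes here are an arbitrary example):

```python
# Iterating bytes in Python 3 yields ints directly, so no ord() is needed:
text = b'Hello World'
key = bytes([0x12, 0x34, 0x45, 0xEF, 0xCD, 0xAB])   # an explicit example key
xors = bytes(t ^ k for t, k in zip(text, key))      # zip stops at the shorter input
assert len(xors) == 6
assert xors[0] == (0x48 ^ 0x12)  # 'H' xored with the first key byte
```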

            qid & accept id: (28059257, 28113655) query: How to select QTableView index or row from inside of Model soup:

soup wrap:

            You can achieve the selection within the filterAcceptsRow() method of the proxy model, but doing so would require the following:

            1. That your proxy model (or source model) contain a reference to the QTableView instance.
            2. That your proxy model contain an attribute indicating whether it is active. This is because you want to only select the table rows when the button is clicked, but filterAcceptsRow() is called automatically by the proxy model. Therefore, you would want to avoid calling the view's selectRow() method until the button is clicked.

            To achieve #1, you could define a simple setter method in your proxy model class:

            def setView(self, view):
                self._view = view
            

            You would also need to of course invoke that setter within your MyWindow class's constructor:

            proxyModel.setView(self.tableview)
            

            Achieving #2 is a simple matter of creating this attribute in the proxy model class's constructor

            self.filterActive = False
            

Now that your classes are prepared, you can implement your desired behavior. In your filterAcceptsRow() re-implementation, you only want to select the rows if they contain '_B_' and the filter is active (that is, the button was clicked):

            def filterAcceptsRow(self, row, parent):
                if self.filterActive and '_B_' in self.sourceModel().data(self.sourceModel().index(row, 0), Qt.DisplayRole).toPyObject():
                    self._view.selectRow(row)
                return True
            

            Finally, you want to make sure that these conditions are met once the button is clicked, so in your clicked() method you need to set the proxyModel's filterActive attribute to True and you need to call the QSortFilterProxyModel class's invalidateFilter() method to indicate that the existing filter is invalid and therefore filterAcceptsRow() should be called again:

            def clicked(self, arg):
                proxyModel=self.tableview.model()
                self.tableview.clearSelection()
                proxyModel.filterActive = True
                proxyModel.invalidateFilter()
            

            So the new code, in full, is:

            from PyQt4.QtCore import *
            from PyQt4.QtGui import *
            import sys
            
            class Model(QAbstractTableModel):
                def __init__(self, parent=None, *args):
                    QAbstractTableModel.__init__(self, parent, *args)
                    self.items = ['Item_A_001','Item_A_002','Item_B_001','Item_B_002']
            
                def rowCount(self, parent=QModelIndex()):
                    return len(self.items)       
                def columnCount(self, parent=QModelIndex()):
                    return 1
            
                def data(self, index, role):
                    if not index.isValid(): return QVariant()
                    elif role != Qt.DisplayRole:
                        return QVariant()
            
                    row=index.row()
                    if row

            Having said all of that, the purpose of filterAcceptsRow() is so that you can implement your own custom filtering in a subclass of QSortFilterProxyModel. So, a more typical implementation (following your desired rule) would be:

            def filterAcceptsRow(self, row, parent):
                if not self.filterActive or '_B_' in self.sourceModel().data(self.sourceModel().index(row, 0), Qt.DisplayRole).toPyObject():
                    return True
                return False
            

            And even then, because the filtering could be done with regex, reimplementation of filterAcceptsRow() isn't even necessary. You could just call proxyModel.setFilterRegExp(QRegExp("_B_", Qt.CaseInsensitive, QRegExp.FixedString)) and proxyModel.setFilterKeyColumn(0) to achieve the same thing, filter-wise.

            Hope that helps!

            qid & accept id: (28067258, 28067888) query: python pandas Slicing datetime dates by number of rows soup:

soup wrap:

            You can do it like this:

            from pandas.tseries.offsets import BDay
            
            d = pd.Timestamp('1/5/2015')
            two_bdays_before = d - BDay(2)   # business days
            two_bdays_later = d + BDay(2)
            

            Then to access all days between two_bdays_before and two_bdays_later:

>>> df[two_bdays_before:two_bdays_later]
                               A         B         C         D
            2015-01-01  0.741045 -0.051576  0.228247 -0.429165
            2015-01-02 -0.312247 -0.391012 -0.256515 -0.849694
            2015-01-03 -0.581522 -1.472528  0.431249  0.673033
            2015-01-04 -1.408855  0.564948  1.019376  2.986657
            2015-01-05 -0.566606 -0.316533  1.201412 -1.390179
            2015-01-06 -0.052672  0.293277 -0.566395 -1.591686
            2015-01-07 -1.669806  1.699540  0.082697 -1.229178
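The offset arithmetic can be checked on its own; a small sketch (2015-01-05 falls on a Monday, so BDay skips the intervening weekend):

```python
import pandas as pd
from pandas.tseries.offsets import BDay

d = pd.Timestamp('2015-01-05')                     # a Monday
assert d - BDay(2) == pd.Timestamp('2015-01-01')   # Fri Jan 2, then Thu Jan 1
assert d + BDay(2) == pd.Timestamp('2015-01-07')   # Tue Jan 6, then Wed Jan 7
```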
            
            qid & accept id: (28072914, 28074267) query: Data structure for UDP Server parsing JSON objects in python soup:

soup wrap:

            1) You can implement a UDP server which processes incoming messages in an infinite loop much more simply with these lines of code:

             import socket
            
             def udp_server(udp_ip, udp_port, ...):
               sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
   sock.bind((udp_ip, udp_port))
               while True:
                 data, addr = sock.recvfrom(1024) # buffer size is 1024 bytes
                 ...process data...
            

            See https://wiki.python.org/moin/UdpCommunication for more details and discussion.

            The arguments to udp_server are the udp IP and port, and whatever other data structures the server needs to interact with.

And kicking this off into its own thread is very easily accomplished with:

             import threading
            
             t = threading.Thread(target = udp_server, args = (...))
             t.start()
            

            2) The NodeTable class is just a wrapper around a python dictionary, but it seems that you want to have multiple threads access it at the same time. In that case you should read this SO answer: (link).

            Depending on what other threads besides the server thread can do to the node dictionary you may or may not need a lock.

To summarize, here is how I would write the code:

             def main():
               nodes = {}     # use a simple dict for storing the nodes
               lock = RLock() # if you need this
               # pass nodes and lock to server thread and start it
               t = threading.Thread(target = udp_server, args = (udp_ip, udp_port, nodes, lock))
               t.start() 
               ...
            

            At this point the udp server is running and the main thread can access the node table via the variable nodes.

            Does the main thread need to be informed when new nodes have been added to the node table? Then perhaps a Queue is what you want. You would 1) create it in main() and 2) pass it to udp_server:

 def main():
               nodes = {}     # use a simple dict for storing the nodes
               lock = RLock() # if you need this
               q = Queue()    # create a Queue and pass it to the udp server
               # pass nodes and lock to server thread and start it
               t = threading.Thread(target = udp_server, args = (udp_ip, udp_port, nodes, lock, q))
               t.start() 
               # process entries from the Queue
               while True:
                 item = q.get()
                 ... process item...
            

            and in the udp server function ...process data... will put something onto the queue:

               while True:
                 data, addr = sock.recvfrom(1024) # buffer size is 1024 bytes
                 ...json decode, etc. ...
                 q.put(...) 
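Stripped of the socket details, the thread-plus-Queue pattern above can be exercised with a stand-in producer (a minimal sketch; the message strings are invented and stand in for decoded datagrams):

```python
import threading
from queue import Queue

def producer(q):
    # Stand-in for the UDP server loop: each string plays the role of a
    # decoded datagram that the server would put onto the queue.
    for msg in ['node1', 'node2', 'node3']:
        q.put(msg)
    q.put(None)  # sentinel so the consumer knows to stop

q = Queue()
t = threading.Thread(target=producer, args=(q,))
t.start()

received = []
while True:
    item = q.get()
    if item is None:
        break
    received.append(item)
t.join()
assert received == ['node1', 'node2', 'node3']
```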
            
            qid & accept id: (28076006, 28076068) query: How to store a name and score into a list? soup:

soup wrap:

            This will add them to the list:

            scores.append({'name': name, 'score': score})
            

            But I would change your first line as well to:

scores = [{'name': 'Sam', 'score': 10}]
            

            Alternatively, if you want to keep them as a string, just do this:

            scores.append("%s:%s" % (name, score))
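With one dict per entry, reading the data back out is straightforward; a small sketch (the second player's name is invented):

```python
scores = [{'name': 'Sam', 'score': 10}]
scores.append({'name': 'Alex', 'score': 15})  # a hypothetical second player

# Sort descending by score, or pick the best entry directly:
ranked = sorted(scores, key=lambda s: s['score'], reverse=True)
best = max(scores, key=lambda s: s['score'])
assert best['name'] == 'Alex'
assert ranked[0]['score'] == 15
```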
            
            qid & accept id: (28083576, 28083974) query: How to recursively sum and store all child values in a tree soup:

soup wrap:

Just check whether a node is a leaf and, if not, add the children's totals to its weight; here is an example:

            class Node:
                def __init__(self, name, weight, children):
                    self.children = children
                    self.weight = weight
                    self.weight_plus_children = weight
            
                def get_all_weight(self):
                    if self.children is None:
                      return self.weight_plus_children
                    else:
                      for child in self.children:
                print "child.get_all_weight()", child.get_weight_with_children()
                self.weight_plus_children += child.get_weight_with_children()
            
                    return self.weight_plus_children
            
    def get_weight_with_children(self):
        return self.weight_plus_children
            
            leaf1 = Node('C1', 58, None)
            leaf2 = Node('C2', 7, None)
            leaf3 = Node('C3', 10, None)
            leaf4 = Node('C4', 20, None)
            
            subroot = Node('B1', 50, [leaf1, leaf2])
            subroot1 = Node('B2', 50, [leaf3, leaf4])
            
            root = Node('A', 100, [subroot, subroot1])
            
            print subroot.get_all_weight()
            print
            print subroot1.get_all_weight()
            print
            print root.get_all_weight()
            

            OutPut:

            F:\so>python test-tree.py
            child.get_all_weight() 58
            child.get_all_weight() 7
            115
            
            child.get_all_weight() 10
            child.get_all_weight() 20
            80
            
            child.get_all_weight() 115
            child.get_all_weight() 80
            295
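Note that get_all_weight() accumulates into weight_plus_children, so calling it twice double-counts. A side-effect-free alternative is a plain recursive sum; a sketch over the same tree shape:

```python
class Node:
    def __init__(self, name, weight, children=None):
        self.name = name
        self.weight = weight
        self.children = children or []

def total_weight(node):
    # Pure function: own weight plus the recursive total of every child.
    # Safe to call any number of times, since nothing is mutated.
    return node.weight + sum(total_weight(c) for c in node.children)

root = Node('A', 100, [
    Node('B1', 50, [Node('C1', 58), Node('C2', 7)]),
    Node('B2', 50, [Node('C3', 10), Node('C4', 20)]),
])
assert total_weight(root) == 295
assert total_weight(root) == 295  # idempotent, unlike the mutating version
```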
            
            qid & accept id: (28104377, 28111650) query: how to convert UTF-8 code to symbol characters in python soup:

soup wrap:

Your code is broken: u.read() returns a bytes object. str(bytes_object) returns a string representation of the object (what the bytes literal would look like) -- you don't want it here:

            >>> str(b'\xe2\x86\x90')
            "b'\\xe2\\x86\\x90'"
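To see the contrast concretely: decoding is what turns those bytes into text, while str() merely wraps the repr (those three bytes are the UTF-8 encoding of the leftwards arrow):

```python
raw = b'\xe2\x86\x90'
text = raw.decode('utf-8')               # decode the bytes, don't str() them
assert text == '\u2190'                  # LEFTWARDS ARROW
assert str(raw) == "b'\\xe2\\x86\\x90'"  # str() only gives the literal's repr
```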
            

            Either save the bytes on disk as is:

            import urllib.request
            
            urllib.request.urlretrieve('http://stackoverflow.com', 'so.html')
            

            or open the file in binary mode: 'wb' and save it manually:

            import shutil
            from urllib.request import urlopen
            
            with urlopen('http://stackoverflow.com') as u, open('so.html', 'wb') as file:
                shutil.copyfileobj(u, file)
            

            or convert bytes to Unicode and save them to disk using any encoding you like.

            import io
            import shutil
            from urllib.request import urlopen
            
            with urlopen('http://stackoverflow.com') as u, \
                 open('so.html', 'w', encoding='utf-8', newline='') as file, \
                 io.TextIOWrapper(u, encoding=u.headers.get_content_charset('utf-8'), newline='') as t:
                shutil.copyfileobj(t, file)
            
            qid & accept id: (28176483, 28176512) query: Extract a value out of n soup:

soup wrap:

            You can do a simple list comprehension

            >>> n = 3
            >>> l = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
            >>> [i for i in l if i%n==0]
            [0, 3, 6, 9]
            

            If your list is always like that, then you can use strides

            >>> l[::3]
            [0, 3, 6, 9]
            

            Tip

            Use range to generate lists like that

            >>> range(10)
            [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
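A related tip: range itself takes a step argument, so in Python 3 the multiples can be generated directly rather than filtered (a small sketch):

```python
n = 3
# range(start, stop, step) yields the multiples of n directly:
multiples = list(range(0, 10, n))
assert multiples == [0, 3, 6, 9]

# Equivalent to filtering the full range with a comprehension:
assert multiples == [i for i in range(10) if i % n == 0]
```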
            
            qid & accept id: (28187921, 28188079) query: Scipy sparse matrices element wise multiplication soup:

soup wrap:

I suspect that your sparse matrices are becoming non-sparse when you perform the operation. Have you tried just:

            A.multiply(B)
            

I suspect that it will be better optimised than anything you can easily do yourself.

            If A is not already the correct type of sparse matrix you might need:

            A = A.tocsr()
            # May also need 
            # B = B.tocsr()
            A = A.multiply(B)
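A quick self-contained check that multiply() performs the element-wise product and returns a sparse result (a sketch assuming scipy is installed; the matrices are made up):

```python
import numpy as np
from scipy.sparse import csr_matrix

A = csr_matrix(np.array([[1, 0, 2], [0, 3, 0]]))
B = csr_matrix(np.array([[4, 5, 0], [0, 6, 7]]))

C = A.multiply(B)  # element-wise product, still a sparse matrix
# dense view: [[4, 0, 0], [0, 18, 0]]
assert (C.toarray() == np.array([[4, 0, 0], [0, 18, 0]])).all()
```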
            
            qid & accept id: (28198883, 28199417) query: Auto increament the invoice number in django backend for new invoice soup:

soup wrap:

Define a function to generate the invoice number.

            def increment_invoice_number():
                last_invoice = Invoice.objects.all().order_by('id').last()
                if not last_invoice:
                     return 'MAG0001'
                invoice_no = last_invoice.invoice_no
                invoice_int = int(invoice_no.split('MAG')[-1])
                new_invoice_int = invoice_int + 1
    new_invoice_no = 'MAG' + str(new_invoice_int).zfill(4)  # keep the zero padding
                return new_invoice_no
            

Now use this function as the default value in your model field.

            invoice_no = models.CharField(max_length=500, default=increment_invoice_number, null=True, blank=True)
            

            This is just an idea. Modify the function to match your preferred invoice number format.
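The string arithmetic can be exercised without Django at all; a pure-Python sketch of the same increment logic, with zfill() so the zero padding of 'MAG0001' is preserved (the helper name is made up):

```python
def next_invoice_no(last_invoice_no):
    # None (or empty) means no invoices exist yet.
    if not last_invoice_no:
        return 'MAG0001'
    invoice_int = int(last_invoice_no.split('MAG')[-1])
    # zfill keeps the fixed-width numeric part ('MAG0002', not 'MAG2').
    return 'MAG' + str(invoice_int + 1).zfill(4)

assert next_invoice_no(None) == 'MAG0001'
assert next_invoice_no('MAG0041') == 'MAG0042'
```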

            qid & accept id: (28199359, 28206516) query: Python: sharing a list between threads soup:
soup wrap:
            l = [0]
            

            You seem to be confused about how assignment works in Python. You seem to think that the above line modifies the object to which l was previously bound. It does not.

            The above line creates a new list and binds the local name l to it. Whatever object l may have been bound to previously is no longer related to the name l. Any subsequent use of l in this scope will refer to this newly created list.

            Consider this single-threaded code:

            a = b = [1]  # a and b are both bound to the same list
            print a,b    # [1] [1]
            b[0] = 2     # modifies the object to which a and b are bound
            print a,b    # [2] [2]
            b = [0]      # now b is bound to a new list
            print a,b    # [2] [0]
            

            Notice how b[0] = 2 and b = [0] differ. In the first one, the object to which b is bound is modified. In the second, b is bound to a whole new object.

            Similarly, l = [0] in your code binds l to a new object and you have lost and cannot regain any reference you had to the original object.
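If the goal is for other threads to keep seeing the change, mutate the shared list in place instead of rebinding the name; slice assignment does exactly that (a minimal sketch):

```python
a = b = [1, 2, 3]    # both names bound to the same list object
b[:] = [0]           # slice assignment replaces the contents in place
assert a == [0] and a is b   # a sees the change; still the same object

b = [9]              # plain assignment rebinds b to a brand-new list
assert a == [0]      # a is unaffected by the rebinding
```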

            qid & accept id: (28208430, 28208803) query: numpy random numpers in specified shape of any complexity soup:

soup wrap:

            If you are working with NumPy arrays, it is quite simple:

            a = np.array([[1,2,3],[4,5,6],[7,8,9]])
            np.random.random(a.shape)
            

            If you have lists, you could do:

            import random
            
            def shaperand(s):
                return [shaperand(e) if isinstance(e, list) else random.random() for e in s]
            
            qid & accept id: (28244029, 28329426) query: Using multiple programs simultaneously in Python soup:

soup wrap:

            Answering my own question. The short answer is that threading will do it, but it is also unnecessary: the subprocess module has enough for me to make it work. I just wasn't doing it right.

            RunSer2Command(lines2[21])
            time.sleep(1)   
            ls_output = subprocess.Popen(['tcpclient.exe','192.168.4.110','8000','10000','1400'],stdin=subprocess.PIPE,stdout=subprocess.PIPE,bufsize=3)
            time.sleep(2)
            RunSer2Command(lines2[22])
            RunSer2Command(lines2[23])
            time.sleep(1)
            ls_output.communicate(input = '3')
            ls_output.wait()
            RunSer2Command(lines2[24])
            

            For those who care, the threading route did get me to a certain point, so I'll add it here, but I never worked out the last condition because I found the easier route.

            def start_tcp_client(cond): 
                ls_output = subprocess.Popen(['tcpclient.exe','192.168.4.110','8000','1000','1400'],stdin=subprocess.PIPE,stdout=subprocess.PIPE,bufsize=3)
                with cond:
                    cond.wait()
                    ls_output.communicate(input = '3')
                    ls_output.communicate()
            
            def TCPSettings(cond):
                with cond:
                    RunSer2Command(lines2[22])
                    RunSer2Command(lines2[23])
                    cond.notify()
            
            condition = threading.Condition()
            condition1 = threading.Condition()
            Client_thread = threading.Thread(name='Client_thread', target=start_tcp_client, args=(condition,))
            TCP_thread = threading.Thread(name='TCP_thread', target=TCPSettings, args=(condition,))
            RunSer2Command(lines2[21])
            time.sleep(2)
            Client_thread.start()
            time.sleep(2)
            TCP_thread.start()
            time.sleep(1)
            Client_thread.join()
            time.sleep(10)
            RunSer2Command(lines2[24])
            

            I still haven't managed to work out all the kinks. There appear to be some timing issues. I'll update this once I get it working perfectly.

            qid & accept id: (28246655, 28255088) query: Pandas Datframe1 search for match in range of Dataframe2 soup:

            The following assumes that the columns to compare have the same names.

            \n
            def temp(row):\n    index = df2[((row-df2).abs() < .3).all(axis=1)].index\n    return df2.loc[index[0], :] if len(index) else [None]*df2.shape[1]\n
            \n

            Eg.

            \n
            df1 = pd.DataFrame([[1,2],[3,4], [5,6]], columns=["d1", "d2"])\ndf2 = pd.DataFrame([[1.1,1.9],[3.2,4.3]], columns=["d1", "d2"])\ndf1.apply(temp, axis=1)\n
            \n

            produces

            \n
                d1   d2\n0  1.1  1.9\n1  3.2  4.3\n2  NaN  NaN\n
            \n soup wrap:

            The following assumes that the columns to compare have the same names.

            def temp(row):
                index = df2[((row-df2).abs() < .3).all(axis=1)].index
                return df2.loc[index[0], :] if len(index) else [None]*df2.shape[1]
            

            Eg.

            df1 = pd.DataFrame([[1,2],[3,4], [5,6]], columns=["d1", "d2"])
            df2 = pd.DataFrame([[1.1,1.9],[3.2,4.3]], columns=["d1", "d2"])
            df1.apply(temp, axis=1)
            

            produces

                d1   d2
            0  1.1  1.9
            1  3.2  4.3
            2  NaN  NaN
            
            qid & accept id: (28255693, 28255823) query: Randomize a generator soup:

            You cannot skip ahead in a generator. There are ways to iterate and create valid random sample, but you'd have to put an upper limit on how many elements you'd iterate. It then would not represent a valid random selection from all possible values the generator could produce.

            \n

            If you are producing combinations of 3 elements from a large list, then just pick samples of 3:

            \n
            def random_combinations_sample(lst, element_count, sample_size):\n    result = set()\n    while len(result) < sample_size:\n        indices = random.sample(xrange(len(lst)), element_count)\n        sample = tuple(lst[i] for i in sorted(indices))\n        result.add(sample)\n    return list(result)\n
            \n

            There is no need to produce all possible combinations if you only need a random set of combinations. Like itertools.combinations(), elements are picked in the order they appear in the input list.

            \n

            Instead of:

            \n
            random.sample(itertools.combinations(a_large_set, 3), 10)\n
            \n

            you'd use

            \n
            random_combinations_sample(a_large_set, 3, 10)\n
            \n soup wrap:

            You cannot skip ahead in a generator. There are ways to iterate and create a valid random sample, but you'd have to put an upper limit on how many elements you iterate, and the result would then not represent a valid random selection from all possible values the generator could produce.

            If you are producing combinations of 3 elements from a large list, then just pick samples of 3:

            def random_combinations_sample(lst, element_count, sample_size):
                result = set()
                while len(result) < sample_size:
                    indices = random.sample(xrange(len(lst)), element_count)
                    sample = tuple(lst[i] for i in sorted(indices))
                    result.add(sample)
                return list(result)
            

            There is no need to produce all possible combinations if you only need a random set of combinations. Like itertools.combinations(), elements are picked in the order they appear in the input list.
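            Note that xrange makes the function above Python 2 only; here is the same idea as a Python 3 sketch, with a quick usage check:

```python
import random

def random_combinations_sample(lst, element_count, sample_size):
    # Python 3 version of the function above (range instead of xrange)
    result = set()
    while len(result) < sample_size:
        # pick distinct indices, then sort them so elements keep input order,
        # exactly as itertools.combinations() would
        indices = random.sample(range(len(lst)), element_count)
        sample = tuple(lst[i] for i in sorted(indices))
        result.add(sample)
    return list(result)

combos = random_combinations_sample(list(range(100)), 3, 10)
```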

            Instead of:

            random.sample(itertools.combinations(a_large_set, 3), 10)
            

            you'd use

            random_combinations_sample(a_large_set, 3, 10)
            
            qid & accept id: (28257632, 28257988) query: In Python how do I parse the 11th and 12th bit of 3 bytes? soup:

            You could try:

            \n
            (int.from_bytes(bytes_input, 'big') >> bit_position) & 0b11\n
            \n

            It doesn't appear to be any quicker though, just terser.

            \n

            However, int.from_bytes(bytes_input, 'big') is the most time consuming part of that code snippet by a factor 2 to 1. If you can convert your data from bytes to int once, at the beginning of the program, then you will see quicker bit masking operations.

            \n
            In [52]: %timeit n = int.from_bytes(bytes_input, 'big')\n1000000 loops, best of 3: 237 ns per loop\n\nIn [53]: %timeit n >> bit_position & 0b11\n10000000 loops, best of 3: 107 ns per loop\n
            \n soup wrap:

            You could try:

            (int.from_bytes(bytes_input, 'big') >> bit_position) & 0b11
            

            It doesn't appear to be any quicker though, just terser.

            However, int.from_bytes(bytes_input, 'big') is the most time-consuming part of that snippet, by roughly a factor of two. If you can convert your data from bytes to int once, at the beginning of the program, the bit-masking operations themselves are quick.

            In [52]: %timeit n = int.from_bytes(bytes_input, 'big')
            1000000 loops, best of 3: 237 ns per loop
            
            In [53]: %timeit n >> bit_position & 0b11
            10000000 loops, best of 3: 107 ns per loop
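            As a concrete check of the masking expression, here is a sketch with a hand-built 3-byte input in which only bits 11 and 12 (counting from bit 0 as the least significant bit) are set:

```python
# 3 bytes = 24 bits; set exactly the two bits at positions 11 and 12
n_value = 0b11 << 11                      # 6144 == 0x1800
bytes_input = n_value.to_bytes(3, 'big')  # b'\x00\x18\x00'

bit_position = 11
pair = (int.from_bytes(bytes_input, 'big') >> bit_position) & 0b11
```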
            
            qid & accept id: (28267563, 28268033) query: MySQL select all components of a product soup:

            MySQL doesn't support "hierarchical" queries. It is possible to emulate that functionality, to a finite number of levels, using multiple queries. The results from the queries can be combined with a UNION ALL operation.

            \n

            Getting the rows returned in a particular sequence can be problematic, depending on the actual criteria you need the rows returned.

            \n

            First level:

            \n
            SELECT t1.alt_bilesen\n  FROM urunler_seviyeler t1 \n WHERE t1.parcano = 'E40'\n
            \n

            Second level:

            \n
            SELECT t2.alt_bilesen\n  FROM urunler_seviyeler t1\n  JOIN urunler_seviyeler t2\n    ON t2.parcano = t1.alt_bilesen\n WHERE t1.parcano = 'E40'\n
            \n

            Third level:

            \n
            SELECT t3.alt_bilesen\n  FROM urunler_seviyeler t1\n  JOIN urunler_seviyeler t2 ON t2.parcano = t1.alt_bilesen\n  JOIN urunler_seviyeler t3 ON t3.parcano = t2.alt_bilesen\n WHERE t1.parcano = 'E40'\n
            \n

            Fourth level:

            \n
            SELECT t4.alt_bilesen\n  FROM urunler_seviyeler t1\n  JOIN urunler_seviyeler t2 ON t2.parcano = t1.alt_bilesen\n  JOIN urunler_seviyeler t3 ON t3.parcano = t2.alt_bilesen\n  JOIN urunler_seviyeler t4 ON t4.parcano = t3.alt_bilesen\n WHERE t1.parcano = 'E40'\n
            \n

            It is possible to combine the queries with UNION ALL set operators

            \n
            ( SELECT t1.alt_bilesen\n    FROM urunler_seviyeler t1 \n   WHERE t1.parcano = 'E40'\n)\nUNION ALL\n( SELECT t2.alt_bilesen\n    FROM urunler_seviyeler t1\n    JOIN urunler_seviyeler t2\n      ON t2.parcano = t1.alt_bilesen\n   WHERE t1.parcano = 'E40'\n)\nUNION ALL\n( SELECT t3.alt_bilesen\n   FROM urunler_seviyeler t1\n   JOIN urunler_seviyeler t2 ON t2.parcano = t1.alt_bilesen\n   JOIN urunler_seviyeler t3 ON t3.parcano = t2.alt_bilesen\n  WHERE t1.parcano = 'E40'\n) \nUNION ALL\n( SELECT t4.alt_bilesen\n    FROM urunler_seviyeler t1\n    JOIN urunler_seviyeler t2 ON t2.parcano = t1.alt_bilesen\n    JOIN urunler_seviyeler t3 ON t3.parcano = t2.alt_bilesen\n    JOIN urunler_seviyeler t4 ON t4.parcano = t3.alt_bilesen\n   WHERE t1.parcano = 'E40'\n)\nORDER BY 1\n
            \n
            \n

            It's possible to include an additional "level" column, by including a literal value in each query. To get some ordering by something other than returned column, you'd need to include some additional expressions in each query...

            \n soup wrap:

            MySQL doesn't support "hierarchical" queries. It is possible to emulate that functionality, to a finite number of levels, using multiple queries. The results from the queries can be combined with a UNION ALL operation.

            Getting the rows returned in a particular sequence can be problematic, depending on the criteria by which you need the rows returned.

            First level:

            SELECT t1.alt_bilesen
              FROM urunler_seviyeler t1 
             WHERE t1.parcano = 'E40'
            

            Second level:

            SELECT t2.alt_bilesen
              FROM urunler_seviyeler t1
              JOIN urunler_seviyeler t2
                ON t2.parcano = t1.alt_bilesen
             WHERE t1.parcano = 'E40'
            

            Third level:

            SELECT t3.alt_bilesen
              FROM urunler_seviyeler t1
              JOIN urunler_seviyeler t2 ON t2.parcano = t1.alt_bilesen
              JOIN urunler_seviyeler t3 ON t3.parcano = t2.alt_bilesen
             WHERE t1.parcano = 'E40'
            

            Fourth level:

            SELECT t4.alt_bilesen
              FROM urunler_seviyeler t1
              JOIN urunler_seviyeler t2 ON t2.parcano = t1.alt_bilesen
              JOIN urunler_seviyeler t3 ON t3.parcano = t2.alt_bilesen
              JOIN urunler_seviyeler t4 ON t4.parcano = t3.alt_bilesen
             WHERE t1.parcano = 'E40'
            

            It is possible to combine the queries with UNION ALL set operators

            ( SELECT t1.alt_bilesen
                FROM urunler_seviyeler t1 
               WHERE t1.parcano = 'E40'
            )
            UNION ALL
            ( SELECT t2.alt_bilesen
                FROM urunler_seviyeler t1
                JOIN urunler_seviyeler t2
                  ON t2.parcano = t1.alt_bilesen
               WHERE t1.parcano = 'E40'
            )
            UNION ALL
            ( SELECT t3.alt_bilesen
               FROM urunler_seviyeler t1
               JOIN urunler_seviyeler t2 ON t2.parcano = t1.alt_bilesen
               JOIN urunler_seviyeler t3 ON t3.parcano = t2.alt_bilesen
              WHERE t1.parcano = 'E40'
            ) 
            UNION ALL
            ( SELECT t4.alt_bilesen
                FROM urunler_seviyeler t1
                JOIN urunler_seviyeler t2 ON t2.parcano = t1.alt_bilesen
                JOIN urunler_seviyeler t3 ON t3.parcano = t2.alt_bilesen
                JOIN urunler_seviyeler t4 ON t4.parcano = t3.alt_bilesen
               WHERE t1.parcano = 'E40'
            )
            ORDER BY 1
            

            It's possible to include an additional "level" column by including a literal value in each query. To order by something other than the returned column, you'd need to include some additional expressions in each query...

            qid & accept id: (28280507, 28285605) query: setup relationship one-to-one in Flask + SQLAlchemy soup:

            Here I basically use your model, but:\n1) changed the name of the FK column\n1) added a relationship (please read Relationship Configuration part of the documentation)

            \n
            class Person(db.Model):\n    id = db.Column(db.Integer, primary_key=True)\n    name = db.Column(db.String(100))\n    # @note: renamed the column, so that can use the name 'region' for\n    # relationship\n    region_id = db.Column(db.Integer, db.ForeignKey('region.id'))\n\n    # define relationship\n    region = db.relationship('Region', backref='people')\n\n\nclass Region(db.Model):\n    id = db.Column(db.Integer, primary_key=True)\n    name = db.Column(db.String(50))\n
            \n

            With this you are able to get the name of the region as below:

            \n
            region_name = my_person.region.name  # navigate a 'relationship' and get its 'name' attribute\n
            \n

            In order to make sure that the region is loaded from the database at the same time as the person is, you can use joinedload option:

            \n
            p = (db.session.query(Person)\n     .options(db.eagerload(Person.region))\n     .get(1)\n     )\n\nprint(p)\n# below will not trigger any more SQL, because `p.region` is already loaded\nprint(p.region.name)\n
            \n soup wrap:

            Here I basically use your model, but: 1) changed the name of the FK column; 2) added a relationship (please read the Relationship Configuration part of the documentation)

            class Person(db.Model):
                id = db.Column(db.Integer, primary_key=True)
                name = db.Column(db.String(100))
                # @note: renamed the column, so that can use the name 'region' for
                # relationship
                region_id = db.Column(db.Integer, db.ForeignKey('region.id'))
            
                # define relationship
                region = db.relationship('Region', backref='people')
            
            
            class Region(db.Model):
                id = db.Column(db.Integer, primary_key=True)
                name = db.Column(db.String(50))
            

            With this you are able to get the name of the region as below:

            region_name = my_person.region.name  # navigate a 'relationship' and get its 'name' attribute
            

            In order to make sure that the region is loaded from the database at the same time as the person is, you can use joinedload option:

            p = (db.session.query(Person)
                 .options(db.joinedload(Person.region))
                 .get(1)
                 )
            
            print(p)
            # below will not trigger any more SQL, because `p.region` is already loaded
            print(p.region.name)
            
            qid & accept id: (28306700, 28308587) query: Python how to use Counter on objects according to attributes soup:

            There was a subtle bug in my previous answer, and while fixing it I came up with a much simpler and faster way to do things which no longer uses itertools.groupby().

            \n

            The updated code below now features a function designed to do exactly what you want.

            \n
            from collections import Counter\nfrom operator import attrgetter\n\nclass Record(object):\n    def __init__(self, **kwargs):\n        for key, value in kwargs.iteritems():\n             setattr(self, key, value)\n\nrecords = [Record(uid='001', url='www.google.com', status=200),\n           Record(uid='002', url='www.google.com', status=404),\n           Record(uid='339', url='www.ciq.com',    status=200)]\n\ndef count_attr(attr, records):\n    """ Returns Counter keyed by unique values of attr in records sequence. """\n    get_attr_from = attrgetter(attr)\n    return Counter(get_attr_from(r) for r in records)\n\nfor attr in ('status', 'url'):\n    print('{!r:>8}: {}'.format(attr, count_attr(attr, records)))\n
            \n

            Output:

            \n
            'status': Counter({200: 2, 404: 1})\n   'url': Counter({'www.google.com': 2, 'www.ciq.com': 1})\n
            \n soup wrap:

            There was a subtle bug in my previous answer, and while fixing it I came up with a much simpler and faster way to do things which no longer uses itertools.groupby().

            The updated code below now features a function designed to do exactly what you want.

            from collections import Counter
            from operator import attrgetter
            
            class Record(object):
                def __init__(self, **kwargs):
                    for key, value in kwargs.iteritems():
                         setattr(self, key, value)
            
            records = [Record(uid='001', url='www.google.com', status=200),
                       Record(uid='002', url='www.google.com', status=404),
                       Record(uid='339', url='www.ciq.com',    status=200)]
            
            def count_attr(attr, records):
                """ Returns Counter keyed by unique values of attr in records sequence. """
                get_attr_from = attrgetter(attr)
                return Counter(get_attr_from(r) for r in records)
            
            for attr in ('status', 'url'):
                print('{!r:>8}: {}'.format(attr, count_attr(attr, records)))
            

            Output:

            'status': Counter({200: 2, 404: 1})
               'url': Counter({'www.google.com': 2, 'www.ciq.com': 1})
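            The sample above is Python 2 (iteritems). Here is the same idea as a Python 3 sketch, reusing the same records:

```python
from collections import Counter
from operator import attrgetter

class Record(object):
    def __init__(self, **kwargs):
        # Python 3: dict.items() replaces the Python 2 iteritems()
        for key, value in kwargs.items():
            setattr(self, key, value)

records = [Record(uid='001', url='www.google.com', status=200),
           Record(uid='002', url='www.google.com', status=404),
           Record(uid='339', url='www.ciq.com',    status=200)]

def count_attr(attr, records):
    """Return a Counter keyed by the unique values of attr in records."""
    get_attr_from = attrgetter(attr)
    return Counter(get_attr_from(r) for r in records)

status_counts = count_attr('status', records)
url_counts = count_attr('url', records)
```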
            
            qid & accept id: (28330317, 28330410) query: Print timestamp for logging in Python soup:

            Something like the below would do:

            \n
            formatter = logging.Formatter(fmt='%(asctime)s %(levelname)-8s %(message)s',\n                              datefmt='%Y-%m-%d %H:%M:%S')\n
            \n

            Have a look at the logging module for Python. You don't need to mess about with creating your own date, just let the logging module do it for you. That formatter object can be applied to a logging handler so you can just log with logger.info('This is an info message.'). No print statements required.

            \n

            Here's a boilerplate procedure I use:

            \n
            import logging\n\ndef setup_custom_logger(name):\n    formatter = logging.Formatter(fmt='%(asctime)s %(levelname)-8s %(message)s',\n                                  datefmt='%Y-%m-%d %H:%M:%S')\n    handler = logging.FileHandler('log.txt', mode='w')\n    handler.setFormatter(formatter)\n    screen_handler = logging.StreamHandler(stream=sys.stdout)\n    screen_handler.setFormatter(formatter)\n    logger = logging.getLogger(name)\n    logger.setLevel(logging.DEBUG)\n    logger.addHandler(handler)\n    logger.addHandler(screen_handler)\n    return logger\n\n>>> logger = setup_custom_logger('myapp')\n>>> logger.info('This is a message!')\n2015-02-04 15:07:12 INFO     This is a message!\n>>> logger.error('Here is another')\n2015-02-04 15:07:30 ERROR    Here is another\n
            \n soup wrap:

            Something like the below would do:

            formatter = logging.Formatter(fmt='%(asctime)s %(levelname)-8s %(message)s',
                                          datefmt='%Y-%m-%d %H:%M:%S')
            

            Have a look at the logging module for Python. You don't need to mess about with creating your own date, just let the logging module do it for you. That formatter object can be applied to a logging handler so you can just log with logger.info('This is an info message.'). No print statements required.

            Here's a boilerplate procedure I use:

            import logging
            import sys
            
            def setup_custom_logger(name):
                formatter = logging.Formatter(fmt='%(asctime)s %(levelname)-8s %(message)s',
                                              datefmt='%Y-%m-%d %H:%M:%S')
                handler = logging.FileHandler('log.txt', mode='w')
                handler.setFormatter(formatter)
                screen_handler = logging.StreamHandler(stream=sys.stdout)
                screen_handler.setFormatter(formatter)
                logger = logging.getLogger(name)
                logger.setLevel(logging.DEBUG)
                logger.addHandler(handler)
                logger.addHandler(screen_handler)
                return logger
            
            >>> logger = setup_custom_logger('myapp')
            >>> logger.info('This is a message!')
            2015-02-04 15:07:12 INFO     This is a message!
            >>> logger.error('Here is another')
            2015-02-04 15:07:30 ERROR    Here is another
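            To sanity-check the formatter's output shape without writing log.txt, one can point a handler at an in-memory stream instead (a minimal sketch; the logger name here is arbitrary):

```python
import io
import logging
import re

formatter = logging.Formatter(fmt='%(asctime)s %(levelname)-8s %(message)s',
                              datefmt='%Y-%m-%d %H:%M:%S')

stream = io.StringIO()                        # capture output in memory
handler = logging.StreamHandler(stream=stream)
handler.setFormatter(formatter)

logger = logging.getLogger('demo_fmt_check')  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.propagate = False                      # keep the root logger out of it

logger.info('This is a message!')
line = stream.getvalue().rstrip('\n')
# line looks like: '2015-02-04 15:07:12 INFO     This is a message!'
```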
            
            qid & accept id: (28332217, 28333886) query: Solve Lotka-Volterra model using SciPy soup:

            The comment made by @WarrenWeckesser is a very good one, you should start there. I'll merely try to highlight the differences between the implicit plot and the explicit plot.

            \n

            First, the setup:

            \n
            import numpy as np\nfrom scipy import integrate\nimport matplotlib.pyplot as plt\n\ntime=np.linspace(0,15,5*1024)\n\ndef derivN(N, t):\n    """Return the derivative of the vector N, which represents\n    the tuple (N1, N2). """\n\n    N1, N2  = N\n    return np.array([N1*(1 - N1 - .7*N2), N2*(1 - N2 - .3*N1)])\n\ndef coupled(time, init, ax):\n    """Visualize the system of coupled equations, by passing a timerange and\n    initial conditions for the coupled equations.\n\n    The initical condition is the value that (N1, N2) will assume at the first\n    timestep. """\n\n    N = integrate.odeint(derivN, init, time)\n    ax[0].plot(N[:,0], N[:,1], label='[{:.1f}, {:.1f}]'.format(*init))  # plots N2 vs N1, with time as an implicit parameter\n    l1, = ax[1].plot(time, N[:,0], label='[{:.1f}, {:.1f}]'.format(*init))\n    ax[1].plot(time, N[:,1], color=l1.get_color())\n
            \n

            It is important to realize that your equations are coupled and you should present to odeint a function that returns the derivative of your coupled equations. Since you have 2 equations, you need to return an array of length 2, each item representing the derivative in terms of the passed in variable (which in this case is the array N(t) = [N1(t), N2(t)]).

            \n

            Then you can plot it at all, using different initial conditions for N1 and N2:

            \n
            fh, ax = plt.subplots(1,2)\ncoupled(time, [.3, 1/.7], ax)\ncoupled(time, [.4, 1/.7], ax)\ncoupled(time, [1/.7, .3], ax)\ncoupled(time, [.5, .5], ax)\ncoupled(time, [.1, .1], ax)\nax[0].legend()\nax[1].legend()\nax[0].set_xlabel('N1')\nax[0].set_ylabel('N2')\nax[1].set_xlabel('time')\nax[1].set_ylabel(r'$N_i$')\nax[0].set_title('implicit')\nax[1].set_title('explicit (i.e. vs independant variable time)')\nplt.show()\n
            \n

            time evolution of coupled equations

            \n

            You'll notice that both N1 and N2 evolve to some final value, but that both values are different. The curves in the implicit plot do not intersect for the given equations.

            \n soup wrap:

            The comment made by @WarrenWeckesser is a very good one, you should start there. I'll merely try to highlight the differences between the implicit plot and the explicit plot.

            First, the setup:

            import numpy as np
            from scipy import integrate
            import matplotlib.pyplot as plt
            
            time=np.linspace(0,15,5*1024)
            
            def derivN(N, t):
                """Return the derivative of the vector N, which represents
                the tuple (N1, N2). """
            
                N1, N2  = N
                return np.array([N1*(1 - N1 - .7*N2), N2*(1 - N2 - .3*N1)])
            
            def coupled(time, init, ax):
                """Visualize the system of coupled equations, by passing a timerange and
                initial conditions for the coupled equations.
            
                The initial condition is the value that (N1, N2) will assume at the first
                timestep. """
            
                N = integrate.odeint(derivN, init, time)
                ax[0].plot(N[:,0], N[:,1], label='[{:.1f}, {:.1f}]'.format(*init))  # plots N2 vs N1, with time as an implicit parameter
                l1, = ax[1].plot(time, N[:,0], label='[{:.1f}, {:.1f}]'.format(*init))
                ax[1].plot(time, N[:,1], color=l1.get_color())
            

            It is important to realize that your equations are coupled and you should present to odeint a function that returns the derivative of your coupled equations. Since you have 2 equations, you need to return an array of length 2, each item representing the derivative in terms of the passed in variable (which in this case is the array N(t) = [N1(t), N2(t)]).

            Then you can plot it all, using different initial conditions for N1 and N2:

            fh, ax = plt.subplots(1,2)
            coupled(time, [.3, 1/.7], ax)
            coupled(time, [.4, 1/.7], ax)
            coupled(time, [1/.7, .3], ax)
            coupled(time, [.5, .5], ax)
            coupled(time, [.1, .1], ax)
            ax[0].legend()
            ax[1].legend()
            ax[0].set_xlabel('N1')
            ax[0].set_ylabel('N2')
            ax[1].set_xlabel('time')
            ax[1].set_ylabel(r'$N_i$')
            ax[0].set_title('implicit')
            ax[1].set_title('explicit (i.e. vs independent variable time)')
            plt.show()
            

            time evolution of coupled equations

            You'll notice that both N1 and N2 evolve to some final value, but that both values are different. The curves in the implicit plot do not intersect for the given equations.

            qid & accept id: (28348300, 28348441) query: Is there a way to create a subplot that contains plots created inside functions? soup:

            You will make life much easier for yourself if you make the function "single responsibility" and let it return the result and do the plotting from the outside:

            \n
            def random(x):\n    variable_x = x\n    return result = f(x)\n\nresult = random(x)\nplt.plot(result, x)\nplt.show()\n
            \n

            You will also be able to test these more easily. \n
            \nIf you are determined to do the plotting inside, you could pass in a plot function:

            \n
            def random(x, show):\n    variable_x = x\n    result = f(x)\n    show(result)\n\ndef show(result):\n    plt.plot(result, x)\n    plt.show()\n\nresult = random(x, show)\n
            \n

            This would allow you to control which function shows where.

            \n soup wrap:

            You will make life much easier for yourself if you make the function "single responsibility" and let it return the result and do the plotting from the outside:

            def random(x):
                result = f(x)
                return result
            
            result = random(x)
            plt.plot(result, x)
            plt.show()
            

            You will also be able to test these more easily.

            If you are determined to do the plotting inside, you could pass in a plot function:

            def random(x, show):
                result = f(x)
                show(result, x)   # pass x explicitly instead of relying on a global
                return result
            
            def show(result, x):
                plt.plot(result, x)
                plt.show()
            
            result = random(x, show)
            

            This would allow you to control which function shows where.

            qid & accept id: (28348485, 28381534) query: How to match integers in NLTK CFG? soup:

            Create a number phrase as such:

            \n
            soup wrap:

            Create a number phrase as such:

            import nltk
            
            groucho_grammar = nltk.CFG.fromstring("""
            S -> NP VP
            PP -> P NP
            NP -> Det N | Det N PP | 'I' | NUM N
            VP -> V NP | VP PP
            Det -> 'an' | 'my'
            N -> 'elephant' | 'pajamas' | 'elephants'
            V -> 'shot'
            P -> 'in'
            NUM -> '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | '10'
            """)
            
            sent = 'I shot 3 elephants'.split()
            parser = nltk.ChartParser(groucho_grammar)
            for tree in parser.parse(sent):
                print(tree)
            

            [out]:

            (S (NP I) (VP (V shot) (NP (NUM 3) (N elephants))))
            

            But note that that can only handle single digit number. So let's try compressing integers into a single token-type, e.g. '#NUM#':

            import nltk
            
            groucho_grammar = nltk.CFG.fromstring("""
            S -> NP VP
            PP -> P NP
            NP -> Det N | Det N PP | 'I' | NUM N
            VP -> V NP | VP PP
            Det -> 'an' | 'my'
            N -> 'elephant' | 'pajamas' | 'elephants'
            V -> 'shot'
            P -> 'in'
            NUM -> '#NUM#'
            """)
            
            sent = 'I shot 333 elephants'.split()
            sent = ['#NUM#' if i.isdigit() else i for i in sent]
            
            parser = nltk.ChartParser(groucho_grammar)
            for tree in parser.parse(sent):
                print(tree)
            

            [out]:

            (S (NP I) (VP (V shot) (NP (NUM #NUM#) (N elephants))))
            

            To put the numbers back, try:

            import nltk
            
            groucho_grammar = nltk.CFG.fromstring("""
            S -> NP VP
            PP -> P NP
            NP -> Det N | Det N PP | 'I' | NUM N
            VP -> V NP | VP PP
            Det -> 'an' | 'my'
            N -> 'elephant' | 'pajamas' | 'elephants'
            V -> 'shot'
            P -> 'in'
            NUM -> '#NUM#'
            """)
            
            original_sent = 'I shot 333 elephants'.split()
            sent = ['#NUM#' if i.isdigit() else i for i in original_sent]
            numbers = [i for i in original_sent if i.isdigit()]
            
            parser = nltk.ChartParser(groucho_grammar)
            for tree in parser.parse(sent):
                treestr = str(tree)
                for n in numbers:
                    treestr = treestr.replace('#NUM#', n, 1)
                print(treestr)
            

            [out]:

            (S (NP I) (VP (V shot) (NP (NUM 333) (N elephants))))
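The mask-and-restore trick above does not depend on NLTK itself; a minimal sketch of the round trip (the helper names `mask_numbers` and `restore_numbers` are made up for illustration):

```python
def mask_numbers(tokens, placeholder='#NUM#'):
    # Swap digit tokens for a placeholder, remembering the originals in order.
    masked = [placeholder if t.isdigit() else t for t in tokens]
    numbers = [t for t in tokens if t.isdigit()]
    return masked, numbers

def restore_numbers(text, numbers, placeholder='#NUM#'):
    # Re-insert the remembered numbers left to right, one occurrence each.
    for n in numbers:
        text = text.replace(placeholder, n, 1)
    return text

masked, nums = mask_numbers('I shot 333 elephants in 12 pajamas'.split())
print(' '.join(masked))                         # I shot #NUM# elephants in #NUM# pajamas
print(restore_numbers(' '.join(masked), nums))  # I shot 333 elephants in 12 pajamas
```

Because `str.replace(..., 1)` substitutes occurrences left to right, the numbers come back in their original positions as long as token order is preserved.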
            
            qid & accept id: (28348838, 28348881) query: Python replace year mentions like '85 with 1985 soup:

            soup wrap:

            Make the string a raw string

            >>> re.sub(r"'(\d\d)", r"19\1", "Today '45")
            'Today 1945'
            

            Or, as Avinash suggests, use word boundaries (\b). They are better because they ignore numbers that are not exactly two digits, like 3456:

            >>> re.sub(r"'(\d{2})\b", r"19\1", "Today '45, '3456")
            "Today 1945, '3456"
            
            qid & accept id: (28356451, 28356527) query: How to request a File in Google Drive soup:

            soup wrap:

            You have to authenticate your request. I would recommend using Google's Python client library to remove a lot of the boilerplate:

            pip install --upgrade google-api-python-client

            From the docs:

            import httplib2
            from apiclient.discovery import build
            
            def build_service(credentials):
              http = httplib2.Http()
              http = credentials.authorize(http)
              return build('drive', 'v2', http=http)
            

            Then use the service:

            from apiclient import errors
            try:
              service = build_service(### Credentials here ###)
              file = service.files().get(fileId=file_id).execute()
            
              print 'Title: %s' % file['title']
              print 'Description: %s' % file['description']
              print 'MIME type: %s' % file['mimeType']
            except errors.HttpError, error:
              if error.resp.status == 401:
                # Credentials have been revoked.
                # TODO: Redirect the user to the authorization URL.
                raise NotImplementedError()
            
            qid & accept id: (28364676, 28374564) query: Timeout function in Python soup:
            soup wrap:

            signal is not Windows compatible.

            You can send some signals on Windows e.g.:

            os.kill(os.getpid(), signal.CTRL_C_EVENT) # send Ctrl+C to itself
            

            You could use threading.Timer to call a function at a later time:

            from threading import Timer
            
            def kill_yourself(delay):
                t = Timer(delay, kill_yourself_now)
                t.daemon = True # no need to kill yourself if we're already dead
                t.start()
            

            where kill_yourself_now():

            import os
            import signal
            import sys
            
            def kill_yourself_now():
                sig = signal.CTRL_C_EVENT if sys.platform == 'win32' else signal.SIGINT
                os.kill(os.getpid(), sig) # raise KeyboardInterrupt in the main thread
            

            If your script starts other processes, see: how to kill child process(es) when parent dies? See also How to terminate a python subprocess launched with shell=True -- it demonstrates how to kill a process tree.
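Putting the two pieces above together, a sketch of the whole pattern (`run_with_timeout` is a made-up wrapper name; tested on POSIX, with the CTRL_C_EVENT branch covering Windows):

```python
import os
import signal
import sys
import time
from threading import Timer

def kill_yourself_now():
    # Raise KeyboardInterrupt in the main thread.
    sig = signal.CTRL_C_EVENT if sys.platform == 'win32' else signal.SIGINT
    os.kill(os.getpid(), sig)

def run_with_timeout(func, delay):
    # Arm a daemon Timer, run func, and translate the interrupt
    # into a sentinel return value.
    t = Timer(delay, kill_yourself_now)
    t.daemon = True
    t.start()
    try:
        return func()
    except KeyboardInterrupt:
        return 'timed out'
    finally:
        t.cancel()  # no need to kill yourself if func already returned

print(run_with_timeout(lambda: time.sleep(5) or 'done', 0.2))
```

Note the `t.cancel()` in the `finally` block: without it, a fast `func` would still get interrupted later.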

            qid & accept id: (28387109, 28387786) query: How could I delete certain columns then write wanted columns into csv python soup:

            soup wrap:

            Assuming the csv files orig.csv:

            ID,Name,Nickname,Income,Car
            1,A,test,12k,Benz
            2,B,test1,23k,Audi
            3,C,test2,20k,BMW
            

            and remove_list.csv:

            Nickname
            Car
            

            we can do something like this to filter:

            def remove_cols():
                remove_list = []
            
                with open('remove_list.csv') as f:
                    for line in f:
                        remove_list.append(line.strip())
            
                colIndexesToKeep = []
            
                with open('orig.csv') as origFile:
                    with open('filtered.csv', 'w') as filteredFile:
                        for line in origFile:
                            if not colIndexesToKeep:
                                for ix, name in enumerate(line.split(',')):
                                    if name.strip() not in remove_list:
                                        colIndexesToKeep.append(ix)
            
                            filteredLine = [val.strip() for ix, val in 
                              enumerate(line.split(',')) if ix in colIndexesToKeep]
                            filteredFile.write(','.join(filteredLine))     
                            filteredFile.write('\n')           
            

            which gives the output filtered.csv:

            ID,Name,Income
            1,A,12k
            2,B,23k
            3,C,20k
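For comparison, the same filtering can be done with the csv module, which also copes with quoted fields containing commas; a sketch operating on strings instead of files (`remove_cols_csv` is a hypothetical name):

```python
import csv
import io

def remove_cols_csv(orig_text, remove_list):
    # Keep only the columns whose header is not in remove_list.
    reader = csv.DictReader(io.StringIO(orig_text))
    kept = [name for name in reader.fieldnames if name not in remove_list]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction='ignore')
    writer.writeheader()
    writer.writerows(reader)  # extra keys from removed columns are ignored
    return out.getvalue()

orig = "ID,Name,Nickname,Income,Car\n1,A,test,12k,Benz\n"
print(remove_cols_csv(orig, {'Nickname', 'Car'}))
```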
            
            qid & accept id: (28404183, 28404302) query: Numerical value of a name soup:

            soup wrap:

            The following code does what you need:

            def main():
                """Return a name's numeric value."""
                name = input("Enter your full name here: ")
                return sum(ord(character) - 96 for character in name.lower() if character != " ")
            

            Examples:

            >>> main()
            Enter your full name here: a
            1
            >>> main()
            Enter your full name here: abc
            6
            >>> main()
            Enter your full name here: a b          c
            6
            

            Discussion

            The body of the code was reduced to:

            sum(ord(character) - 96 for character in name.lower() if character != " ")
            

            Taken one piece at a time:

            • sum(...)

              This sums the contents.

            • ord(character) - 96

              This produces the number that you want.

            • for character in name.lower()

              This loops over each character in the lower case version of name.

            • if character != " "

              This restricts the loop to those characters which are not blanks.

            Alternative

            A slightly different way of accomplishing the same thing:

            def main():
                """Return a name's numeric value."""
                name = input("Enter your full name here: ")
                return sum(ord(c) - 96 for c in name.replace(' ', '').lower())
            

            This eliminates the blanks with replace(' ', '') instead of with if character != " ". Otherwise, it works the same way.
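The ord() arithmetic can also be made explicit with a lookup table built from string.ascii_lowercase; this variant simply skips anything that is not a lowercase letter (a small generalization of the space-stripping above):

```python
from string import ascii_lowercase

# a=1, b=2, ..., z=26
VALUES = {c: i for i, c in enumerate(ascii_lowercase, start=1)}

def name_value(name):
    # Non-letters (spaces, hyphens, ...) contribute 0.
    return sum(VALUES.get(c, 0) for c in name.lower())

print(name_value('abc'))         # 6
print(name_value('a b       c')) # 6
```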

            qid & accept id: (28416559, 28419229) query: Fill scipy / numpy matrix based on indices and values soup:

            soup wrap:

            Here's an approach that I expect to work for you. It's a stretch on my machine -- even storing two copies of the voxel adjacency matrix (using dtype=bool) pushes my (somewhat old) desktop right to the edge of its memory capacity. But I'm assuming that you have a machine capable of handling at least two (300 * 100) ** 2 = 900 MB arrays -- otherwise, you would probably have run into problems before this stage. It takes my desktop about 30 minutes to process 30000 voxels.

            This assumes that voxel_communities is a simple array containing a community label for each voxel at index i. It sounds like you can generate that pretty quickly. It also assumes that voxels are present in only one node.

            def voxel_adjacency(voxel_communities):
                n_voxels = voxel_communities.size
                comm_labels = sorted(set(voxel_communities))
                comm_counts = [(voxel_communities == l).sum() for l in comm_labels]
            
                blocks = numpy.zeros((n_voxels, n_voxels), dtype=bool)
                start = 0
                for c in comm_counts:
                    blocks[start:start + c, start:start + c] = 1
                    start += c
            
                ix = numpy.empty_like(voxel_communities)
                ix[voxel_communities.argsort()] = numpy.arange(n_voxels)
                blocks[:] = blocks[ix,:]
                blocks[:] = blocks[:,ix]
                return blocks
            

            Here's a quick explanation. This uses an inverse indexing trick to reorder the columns and rows of an array of diagonal blocks into the desired matrix.

                n_voxels = voxel_communities.size
                comm_labels = sorted(set(voxel_communities))
                comm_counts = [(voxel_communities == l).sum() for l in comm_labels]
            
                blocks = numpy.zeros((n_voxels, n_voxels), dtype=bool)
                start = 0
                for c in comm_counts:
                    blocks[start:start + c, start:start + c] = 1
                    start += c
            

            These lines are used to construct the initial block matrix. So for example, say you have six voxels and three communities, and each community contains two voxels. Then the initial block matrix will look like this:

            array([[ True,  True, False, False, False, False],
                   [ True,  True, False, False, False, False],
                   [False, False,  True,  True, False, False],
                   [False, False,  True,  True, False, False],
                   [False, False, False, False,  True,  True],
                   [False, False, False, False,  True,  True]], dtype=bool)
            

            This is essentially the same as the desired adjacency matrix after the voxels have been sorted by community membership. So we need to reverse that sorting. We do so by constructing an inverse argsort array.

                ix = numpy.empty_like(voxel_communities)
                ix[voxel_communities.argsort()] = numpy.arange(n_voxels)
            

            Now ix will reverse the sorting process when used as an index. And since this is a symmetric matrix, we can perform the reverse sorting operation separately on columns and then on rows:

                blocks[:] = blocks[ix,:]
                blocks[:] = blocks[:,ix]
                return blocks
            

            Here's an example of the result it generates for a small input:

            >>> voxel_adjacency(numpy.array([0, 3, 1, 1, 0, 2]))
            array([[ True, False, False, False,  True, False],
                   [False,  True, False, False, False, False],
                   [False, False,  True,  True, False, False],
                   [False, False,  True,  True, False, False],
                   [ True, False, False, False,  True, False],
                   [False, False, False, False, False,  True]], dtype=bool)
            

            It seems to me that this does something quite similar to voxel_matrix[np.ix_(voxels1, voxels2)] = 1 as suggested by pv., except it does it all at once, instead of tracking each possible combination of nodes.

            There may be a better solution, but this should at least be an improvement.

            Also, note that if you can simply accept the new ordering of voxels as canonical, then this solution becomes as simple as creating the block array! That takes all of about 300 milliseconds.
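The inverse-argsort step is the only subtle part; a tiny standalone demonstration that `ix` undoes the sort when used as a fancy index:

```python
import numpy

a = numpy.array([0, 3, 1, 1, 0, 2])
order = a.argsort()                  # indices that sort a
ix = numpy.empty_like(a)
ix[order] = numpy.arange(a.size)     # inverse permutation of argsort

# Indexing the sorted array with ix recovers the original order.
print(numpy.sort(a)[ix])             # [0 3 1 1 0 2]
```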

            qid & accept id: (28416678, 28416994) query: Python - Replacing value of a row in a CSV file soup:

            soup wrap:

            Here's something to get you going in the right direction.

            with open('path/to/filename') as filehandler_name:
                # this is how you open a file for reading
            
            with open('path/to/filename', 'w') as filehandler_name:
                # this is how you open a file for (over)writing
                # note the 'w' argument to the open built-in
            
            import csv
            # this is the module that handles csv files
            
            reader = csv.reader(filehandler_name)
            # this is how you create a csv.reader object
            writer = csv.writer(filehandler_name)
            # this is how you create a csv.writer object
            
            for line in reader:
                # this is how you read a csv.reader object line by line
                # each line is effectively a list of the fields in that line
                # of the file.
                # # XXXX-XXXX, 0 --> ['XXXX-XXXX', '0']
            

            For small files, you could do something like:

            import csv
            
            with open('path/to/filename') as inf:
                reader = csv.reader(inf.readlines())
            
            with open('path/to/filename', 'w') as outf:
                writer = csv.writer(outf)
                for line in reader:
                    if line[1] == '0':
                        writer.writerow([line[0], '1'])
                        break
                    else:
                        writer.writerow(line)
                writer.writerows(reader)
            

            For large files, that inf.readlines() will kill your memory allocation, since it pulls the whole file into memory at once; instead, you should do something like:

            import csv, os
            
            with open('path/to/filename') as inf, open('path/to/filename_temp', 'w') as outf:
                reader = csv.reader(inf)
                writer = csv.writer(outf)
                for line in reader:
                    if line[1] == '0':
                       ...
                    ... # as above
            
            os.remove('path/to/filename')
            os.rename('path/to/filename_temp', 'path/to/filename')
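A complete, runnable version of that large-file pattern (the helper name `flip_first_zero` is made up; it flips the first '0' flag to '1' and swaps the temp file in for the original):

```python
import csv
import os
import tempfile

def flip_first_zero(path):
    # Stream the file row by row through a temp file, then replace the
    # original -- never loads the whole file into memory.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or '.')
    flipped = False
    with open(path, newline='') as inf, os.fdopen(fd, 'w', newline='') as outf:
        reader = csv.reader(inf)
        writer = csv.writer(outf)
        for row in reader:
            if not flipped and row[1].strip() == '0':
                row[1] = '1'
                flipped = True
            writer.writerow(row)
    os.replace(tmp_path, path)  # atomic on POSIX
```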
            
            qid & accept id: (28417585, 28418925) query: Detagging with regex does not catch nested tags soup:

            soup wrap:

            If you're using regex package, this recursive pattern could work:

            <(?:[^><]|(?R))*>
            

            At (?R) or (?0) the pattern is pasted from start. See test at regex101.com


            Added by @noshelter: Based on this information, the function could be adjusted as follows...

            import regex
            
            def detag(text, opentag='<', closetag='>'):
                t1 = regex.escape(opentag)
                t2 = regex.escape(closetag)
                pattern = regex.compile(t1 + '(?:[^' + t2 + t1 + ']|(?R))*' + t2)
                return pattern.sub('', text)
            
            qid & accept id: (28418901, 28422702) query: How to do a rolling aggregation of data week wise in python? soup:

            soup wrap:

            See this answer: Cumulative sum and percentage on column?

            and this: http://pandas.pydata.org/pandas-docs/stable/basics.html#basics-dt-accessors and this: http://pandas.pydata.org/pandas-docs/stable/groupby.html

            Updated for Weekly Cumulative:

            df = pd.DataFrame(data)
            df.columns = ['Date','Profit']
            df['Date'] = pd.to_datetime(df['Date'])
            df['weekofyear'] = df['Date'].dt.weekofyear
            df.sort_index(inplace=True)
            df['Weekly_Cum'] = df.groupby('weekofyear')['Profit'].cumsum()
            

            Output:

                     Date  Profit  weekofyear  Weekly_Cum
            0  2013-06-21      14          25          14
            1  2013-06-22      19          25          33
            2  2013-06-23      11          25          44
            3  2013-06-24      13          26          13
            4  2013-06-25       6          26          19
            5  2013-06-26      22          26          41
            6  2013-06-27      22          26          63
            7  2013-06-28       3          26          66
            8  2013-06-29       5          26          71
            9  2013-06-30      10          26          81
            10 2013-07-01      17          27          17
            11 2013-07-02      14          27          31
            12 2013-07-03       9          27          40
            13 2013-07-04       7          27          47
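            For comparison, the same per-week running total can be sketched with only the standard library; `isocalendar()` supplies the same ISO week number that `dt.weekofyear` reports (only the first four rows of the sample data are shown):

```python
from datetime import date

# first four (date, profit) rows from the answer's sample data
rows = [(date(2013, 6, 21), 14), (date(2013, 6, 22), 19),
        (date(2013, 6, 23), 11), (date(2013, 6, 24), 13)]

running = {}        # ISO week number -> running total so far
weekly_cum = []
for d, profit in rows:
    week = d.isocalendar()[1]     # ISO week of year, matching dt.weekofyear
    running[week] = running.get(week, 0) + profit
    weekly_cum.append(running[week])

print(weekly_cum)   # the first four Weekly_Cum values from the table above
```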
            
            qid & accept id: (28419477, 28419683) query: Django unique random as a default value soup:

            soup wrap:

            Just test the existence of the generated code in the loop.

            from django.contrib.auth.models import User
            
            def unique_rand():
                while True:
                    code = User.objects.make_random_password(length=8)
                    if not Person.objects.filter(code=code).exists():
                        return code
            
            class Person(models.Model):
                code = models.CharField(max_length=8, unique=True, default=unique_rand)
            

            Note that there are no round brackets in the default=unique_rand argument: the callable itself is passed, so a fresh code is generated for each new object.

            If you want to limit the number of attempts then change the loop from while to for:

            def unique_rand():
                for _ in range(5):
                    ...
                raise ValueError('Too many attempts to generate the code')
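            The same retry-until-unique pattern can be sketched outside Django with the standard library; here the `existing` set stands in for the `Person.objects.filter(...)` database lookup and `secrets` draws the candidate codes:

```python
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits
existing = {'abc12345', 'zzz00000'}   # stands in for the database lookup

def unique_rand(taken, length=8, attempts=5):
    # keep drawing random codes until one is not already taken
    for _ in range(attempts):
        code = ''.join(secrets.choice(ALPHABET) for _ in range(length))
        if code not in taken:
            return code
    raise ValueError('Too many attempts to generate the code')

code = unique_rand(existing)
print(code)
```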
            
            qid & accept id: (28431519, 28432733) query: pandas multiindex assignment from another dataframe soup:

            soup wrap:

            When you use

            df.loc['A', :] = df_
            

            Pandas tries to align the index of df_ with the index of a sub-DataFrame of df. However, at the point in the code where alignment is performed, the sub-DataFrame has a MultiIndex, not the single index you see as the result of df.loc['A', :].

            So the alignment fails because df_ has a single index, not the MultiIndex that is needed. To see that the index of df_ is indeed the problem, note that

            ix_ = pd.MultiIndex.from_product([['A'], ['a', 'b', 'c', 'd']])
            df_.index = ix_
            df.loc['A', :] = df_
            print(df)
            

            succeeds, yielding something like

            A a  0.229970  0.730824  0.784356
              b  0.584390  0.628337  0.318222
              c  0.257192  0.624273  0.221279
              d  0.787023  0.056342  0.240735
            B a       NaN       NaN       NaN
              b       NaN       NaN       NaN
              c       NaN       NaN       NaN
              d       NaN       NaN       NaN
            

            Of course, you probably do not want to have to create a new MultiIndex every time you want to assign a block of values. So instead, to work around this alignment problem, you can use a NumPy array as the assignment value:

            df.loc['A', :] = df_.values
            

            Since df_.values is a NumPy array, and an array has no index, no alignment is performed and the assignment yields the same result as above. This trick of using a NumPy array when you don't want index alignment applies to many situations when using Pandas.

            Note also that assignment-by-NumPy-array can help you perform more complicated assignments, such as to rows which are not contiguous:

            idx = pd.IndexSlice
            df.loc[idx[:,('a','b')], :] = df_.values
            

            yields

            In [85]: df
            Out[85]: 
                      1st       2nd       3rd
            A a  0.229970  0.730824  0.784356
              b  0.584390  0.628337  0.318222
              c       NaN       NaN       NaN
              d       NaN       NaN       NaN
            B a  0.257192  0.624273  0.221279
              b  0.787023  0.056342  0.240735
              c       NaN       NaN       NaN
              d       NaN       NaN       NaN
            

            for example.
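            A minimal, self-contained sketch of the .values workaround (the frame shapes and values here are made up for illustration):

```python
import numpy as np
import pandas as pd

# a small made-up frame with the same shape of problem
ix = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b']])
df = pd.DataFrame(np.nan, index=ix, columns=['x'])
df_ = pd.DataFrame({'x': [1.0, 2.0]}, index=['a', 'b'])

# df_.values is a plain ndarray with no index, so no alignment is attempted
df.loc['A', :] = df_.values
print(df.loc['A', 'x'].tolist())
```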

            qid & accept id: (28438247, 28438311) query: Computer Shut Off Python 3.4 soup:

            soup wrap:

            OS X is a Unix at its base, so you can use Unix commands. You can use either:

            #!/usr/bin/python
            import os
            os.system("shutdown -h now")
            

            Must be run as root; it won't work otherwise. You can, however, add shutdown to sudoers (man visudo) as a NOPASSWD program for the users that need to run the script, and use sudo shutdown … instead of just shutdown…

            or

            import subprocess
            subprocess.call(['osascript', '-e',
            'tell app "System Events" to shut down'])
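            Neither command can be safely demonstrated end to end (it would power the machine off), but the subprocess call can be sketched as a function that builds the argument list first, so the command line can be inspected without running it. The shutdown_cmd helper and the dry_run flag are my own additions, not part of the answer:

```python
import subprocess

def shutdown_cmd(use_sudo=False):
    """Build the argv list for an immediate halt on a Unix system."""
    cmd = ['shutdown', '-h', 'now']
    return ['sudo'] + cmd if use_sudo else cmd

def shutdown(use_sudo=False, dry_run=True):
    cmd = shutdown_cmd(use_sudo)
    if dry_run:
        # never actually power off while experimenting
        return cmd
    return subprocess.call(cmd)

print(shutdown(use_sudo=True))
```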
            
            qid & accept id: (28443428, 28443522) query: Convert a string with whitespaces to a dataframe with desired dimensions in Python soup:

            soup wrap:

            Use Numpy.reshape()

            import numpy as np
            import pandas as pd
            
            string = 'A B C D E F G H I J K L'
            
            list1 = string.split()   # no-argument split() already drops the empty strings
            df = pd.DataFrame(np.reshape(list1,[3,4]))
            

            Outputs:

               0  1  2  3
            0  A  B  C  D
            1  E  F  G  H
            2  I  J  K  L
            

            Whoops... here it is with 3 col x 4 rows:

            pd.DataFrame(np.reshape(list1,[4,3]))
            
               0  1  2
            0  A  B  C
            1  D  E  F
            2  G  H  I
            3  J  K  L
            

            Edit: put the imports on top.
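            If pulling in NumPy just for the reshape feels heavy, the same row-chunking can be sketched with plain slicing over the token list:

```python
string = 'A B C D E F G H I J K L'
tokens = string.split()      # no-argument split drops the empty strings

# slice the flat token list into rows of ncols items each
ncols = 4
rows = [tokens[i:i + ncols] for i in range(0, len(tokens), ncols)]
print(rows)
```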

            qid & accept id: (28461458, 28461538) query: (Python) Breaking an output text file into tokens soup:

            soup wrap:

            Just use str.translate to remove the |, split on the , and filter empty strings:

            In [9]: s="|22|,|XXX|,|YYY|,|ZZZ|,|3|,|WWW|,|2273|,|QQQ|,||,||,|25/05/2009|,||,|29/01/2010|,||,||,||,||,|EEE EEE|,||,|True|,|False|,||,||,||"
            
            
            
            In [10]: print(filter(None,s.translate(None,"|").split(",")))
            ['22', 'XXX', 'YYY', 'ZZZ', '3', 'WWW', '2273', 'QQQ', '25/05/2009', '29/01/2010', 'EEE EEE', 'True', 'False']
            

            If you need the data to line up to the columns don't filter.

            So using your input all you need is something like the following depending on how you want to write the data to the output file:

            with open("test.txt") as f, open('test_output.txt',"w") as out:
                wr = csv.writer(out, delimiter=",")
                for line in f:
                    wr.writerow(filter(None, line.rstrip().translate(None, "|").split(",")))
            

            Your output will be:

            Operation_ID,Operation_Name,business_group_name,business_unit_name,Program_ID,Program_Name,Project_ID,Project_Name,Program_Type_Name,Program_Cost_Type_Name,Start_date,Estimated_End_Date,End_Date,SQA_Name,CMA_Name,SSE_Name,PMs,TLs,PortfolioManager,Finished,Research,SQA_ID,CMA_ID,SSE_ID
            20,XXX,YYY,ZZZ,1,WWW,2163,QQQ,15/12/2008,22/01/2009,EEE EEE ,True
            22,XXX,YYY,ZZZ,3,WWW,2165,QQQ,01/01/2009,09/04/2010,EEE EEE EEE,True,False
            20,XXX,YYY,ZZZ,10,WWW,2164,QQQ,Development,Direct,15/12/2008,26/02/2010,EEE ,EEE EEE ; EEE EEE ; EEE EEE ,True,False
            22,XXX,YYY,ZZZ,3,WWW,2166,QQQ,15/12/2008,31/05/2010,True,False
            20,XXX,YYY,ZZZ,10,WWW,2168,QQQ,Development,Direct,05/01/2009,20/05/2009,EEE EEE EEE,EEE EEE ,True
            20,XXX,YYY,ZZZ,1,WWW,2169,QQQ,13/01/2009,22/05/2009,EEE EEE EEE,EEE EEE EEE EEE,True
             etc.................
            

            As tdelaney mentioned in a comment this does presume you don't have any pipes inside pipes.

            For python3 we need to do a bit more work as str.translate is slightly different. We need to use str.maketrans to create a table:

            import csv
            
            with open("test.txt") as f, open('test_output.txt', "w") as out:
                wr = csv.writer(out, delimiter=",")
                table = str.maketrans("|",",")
                for line in f:
                    wr.writerow(list(filter(None, line.rstrip().translate(table).split(","))))
            

            Another approach would be to just split on "|" and filter commas and empty strings:

            with open("in.txt") as f, open('test_output.txt', "w") as out:
                wr = csv.writer(out, delimiter=",")
                for line in f:
                    wr.writerow(filter(lambda x: x not in  {",",""},line.rstrip().split("|")))
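            In Python 3 the deletion form of translate can also be written directly, using the three-argument str.maketrans whose third argument lists the characters to delete (a shortened sample line is used here):

```python
s = '|22|,|XXX|,||,|True|'

drop_pipes = str.maketrans('', '', '|')   # third argument: characters to delete
tokens = [t for t in s.translate(drop_pipes).split(',') if t]
print(tokens)
```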
            
            qid & accept id: (28467688, 28467714) query: how to make post request in python soup:

            soup wrap:

            With the standard Python 2 libraries httplib and urllib you can do

            import httplib, urllib
            
            headers = {'X-API-TOKEN': 'your_token_here'}
            payload = "title=value1&name=value2"
            
            conn = httplib.HTTPConnection("heise.de")
            conn.request("POST", "", payload, headers)
            response = conn.getresponse()
            
            print response
            

            or, if you want to use the nice third-party HTTP library called "Requests":

            import requests
            
            headers = {'X-API-TOKEN': 'your_token_here'}
            payload = {'title': 'value1', 'name': 'value2'}
            
            r = requests.post("http://foo.com/foo/bar", data=payload, headers=headers)
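            On Python 3, httplib became http.client, and the same POST can be sketched with the standard urllib.request module. Building the Request object first lets the method and body be inspected before anything is sent; the URL below is the answer's placeholder:

```python
import urllib.parse
import urllib.request

headers = {'X-API-TOKEN': 'your_token_here'}
payload = urllib.parse.urlencode({'title': 'value1', 'name': 'value2'}).encode()

# nothing is sent until urlopen() is called
req = urllib.request.Request('http://foo.com/foo/bar', data=payload, headers=headers)
print(req.get_method())     # a Request that carries a body defaults to POST
# response = urllib.request.urlopen(req)   # this line would actually send it
```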
            
            qid & accept id: (28486144, 28486199) query: Sorting an array by a number string (Python 3.4.2) soup:

            soup wrap:

            You could use the following lambda as the sort key:

            >>> sorted(sav, key=lambda x: int(x[3]))
            [['Name: ', 'James', 'Score: ', '1'],
             ['Name: ', 'Alex', 'Score: ', '2'],
             ['Name: ', 'Josh', 'Score: ', '3']]
            

            The lambda function here picks out the element at index 3 from each list and treats it as an integer (using int). The list is sorted by these integers.

            If left as strings, you'd get odd results when sorting since strings are sorted in lexicographical order. For example, '12' < '8'.

            This returns a sorted copy of the list sav - you can rebind the name to the sorted list:

            sav = sorted(sav, key=lambda x: int(x[3]))
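            The lexicographic pitfall and the int-key fix can be seen side by side on a stripped-down list (the names and scores are invented):

```python
sav = [['Name: ', 'Ann', 'Score: ', '12'],
       ['Name: ', 'Bob', 'Score: ', '8'],
       ['Name: ', 'Cid', 'Score: ', '101']]

as_strings = sorted(sav, key=lambda x: x[3])        # lexicographic: '101' < '12' < '8'
as_numbers = sorted(sav, key=lambda x: int(x[3]))   # numeric: 8 < 12 < 101

print([row[1] for row in as_strings])
print([row[1] for row in as_numbers])
```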
            
            qid & accept id: (28486550, 28486755) query: How to convert an urlopen into a string in python soup:

            soup wrap:

            You can use the method .read()

            data = web.urlopen(message)
            str_data = data.read()
            

            This will return the HTML of the page. You can use dir(web.urlopen(message)) to see all the methods available for that object. You can use dir() to see the methods available for anything in Python.

            To sum up the answer: on the object you created you can call the method .read() (like data.read()), or .readline() (like data.readline()), which reads just one line from the response when that is all you need. On Python 2 reading gives you a str back; on Python 3 it gives bytes, which you can convert to a string with .decode().

            If you do data.info() you will get the response metadata, i.e. the HTTP headers.
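            On Python 3 the module is urllib.request and .read() returns bytes rather than str, so a .decode() is needed; a data: URL keeps the round trip self-contained, with no network access:

```python
from urllib.request import urlopen

# data: URLs are handled by urllib itself, so no network is needed
data = urlopen('data:text/plain,hello')
raw = data.read()            # bytes on Python 3
text = raw.decode('utf-8')   # now a str
print(text)
```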

            qid & accept id: (28500524, 28500620) query: Python: Extending a predefined named tuple soup:

            soup wrap:

            You can subclass a namedtuple-produced class, but you need to study the generated class more closely. You'll need to add another __slots__ attribute with the extra fields, update the _fields attribute, create new __repr__ and _replace methods (they hardcode the field list and class name) and add extra property objects for the additional fields. See the example in the documentation.

            That's all a little too much work. Rather than subclass, I'd just reuse the somenamedtuple._fields attribute of the source type:

            LookupElement = namedtuple('LookupElement', ReadElement._fields + ('lookups',))
            

            The field_names argument to the namedtuple() constructor doesn't have to be a string, it can also be a sequence of strings. Simply take the _fields and add more elements by concatenating a new tuple.

            Demo:

            >>> from collections import namedtuple
            >>> ReadElement = namedtuple('ReadElement', 'address value')
            >>> LookupElement = namedtuple('LookupElement', ReadElement._fields + ('lookups',))
            >>> LookupElement._fields
            ('address', 'value', 'lookups')
            >>> LookupElement('addr', 'val', 'lookup') 
            LookupElement(address='addr', value='val', lookups='lookup')
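            Extending an existing instance works the same way: unpack the original tuple into the wider type and append the new field (the sample values are the demo's):

```python
from collections import namedtuple

ReadElement = namedtuple('ReadElement', 'address value')
LookupElement = namedtuple('LookupElement', ReadElement._fields + ('lookups',))

read = ReadElement('addr', 'val')
look = LookupElement(*read, lookups='lookup')   # reuse the existing instance's fields
print(look)
```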
            
            qid & accept id: (28500718, 28501082) query: How to deal with special characters in make command expansion? soup:
            soup wrap:
            $ cat BP.mk
            VAR := $(shell python -c 'print("include_path_with\\[weird\\]characters")')
            
            all:
                    echo 'DIRECT := `python -c '\''print("include_path_with\\[weird\\]characters")'\''`'
                    echo "DIRECT := `python -c 'print("include_path_with\\[weird\\]characters")'`"
                    echo DIRECT := `python -c 'print("include_path_with\\[weird\\]characters")'`
                    echo 'VAR := $(VAR)'
                    echo "VAR := $(VAR)"
                    echo VAR := $(VAR)
            $ make -f BP.mk
            echo 'DIRECT := `python -c '\''print("include_path_with\\[weird\\]characters")'\''`'
            DIRECT := `python -c 'print("include_path_with\\[weird\\]characters")'`
            echo "DIRECT := `python -c 'print("include_path_with\\[weird\\]characters")'`"
            DIRECT := include_path_with\[weird\]characters
            echo DIRECT := `python -c 'print("include_path_with\\[weird\\]characters")'`
            DIRECT := include_path_with\[weird\]characters
            echo 'VAR := include_path_with\[weird\]characters'
            VAR := include_path_with\[weird\]characters
            echo "VAR := include_path_with\[weird\]characters"
            VAR := include_path_with\[weird\]characters
            echo VAR := include_path_with\[weird\]characters
            VAR := include_path_with[weird]characters
            

            Notice how, in all cases but the last, the backslashes persist into the output? That's the problem. You don't want them there. So the fix is not to print them at all, and then to quote the expansion so the shell doesn't process the result any further.

            So either

            VAR2 := $(shell python -c 'print("include_path_with[weird]characters")')
            g++ main.cpp -I'$(VAR2)'
            

            or

            g++ main.cpp -I"$$(python -c 'print("include_path_with[weird]characters")')"
            
            qid & accept id: (28510518, 28510547) query: Extract Text from HTML Python (BeautifulSoup, RE, Other Option?) soup:

            soup wrap:

            With BeautifulSoup it is fairly straight-forward:

            from bs4 import BeautifulSoup
            
            data = """
            <td data-tooltip="Example tooltip">cell contents</td>
            """   # the question's original HTML was lost here; any td carrying a data-tooltip attribute will do
            
            soup = BeautifulSoup(data)
            print(soup.td['data-tooltip'])

            If you have multiple td elements and you need to extract the data-tooltip from each one:

            for td in soup.find_all('td', {'data-tooltip': True}):
                print(td['data-tooltip'])
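            If installing BeautifulSoup is not an option, the same attribute can be pulled out with the standard library's html.parser (the sample HTML here is invented):

```python
from html.parser import HTMLParser

class TooltipCollector(HTMLParser):
    """Collect the data-tooltip attribute of every td element."""
    def __init__(self):
        super().__init__()
        self.tooltips = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'td' and 'data-tooltip' in attrs:
            self.tooltips.append(attrs['data-tooltip'])

parser = TooltipCollector()
parser.feed('<table><tr><td data-tooltip="first">a</td>'
            '<td>b</td><td data-tooltip="second">c</td></tr></table>')
print(parser.tooltips)
```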
            
            qid & accept id: (28566041, 28566134) query: How can I find the average of each similar entry in a list of tuples? soup:

            soup wrap:

            There are a couple of ways to do this. One is easy, one is pretty.

            Easy:

            Use a dictionary! It's easy to build a for loop that goes through your tuples and appends the second element to a dictionary, keyed on the first element.

            d = {}
            tuples = [('Jem', 10), ('Sam', 10), ('Sam', 2), ('Jem', 9), ('Jem', 10)]
            for key, val in tuples:   # unpack directly instead of shadowing the builtin name tuple
                d.setdefault(key, []).append(val)
            

            Once it's in a dictionary, you can do:

            for name, values in d.items():
                print("{name} {avg}".format(name=name, avg=sum(values)/len(values)))
            

            Pretty:

            Use itertools.groupby. This only works if your data is sorted by the key you want to group by (in this case, t[0] for each t in tuples) so it's not ideal in this case, but it's a nice way to highlight the function.

            from itertools import groupby
            
            tuples = [('Jem', 10), ('Sam', 10), ('Sam', 2), ('Jem', 9), ('Jem', 10)]
            tuples.sort(key=lambda tup: tup[0])
            # tuples is now [('Jem', 10), ('Jem', 9), ('Jem', 10), ('Sam', 10), ('Sam', 2)]
            
            groups = groupby(tuples, lambda tup: tup[0])
            

            This builds a structure that looks kind of like:

            [('Jem', [('Jem', 10), ('Jem', 9), ('Jem', 10)]),
             ('Sam', [('Sam', 10), ('Sam', 2)])]
            

            We can use that to build our names and averages:

            for groupname, grouptuples in groups:
                values = [t[1] for t in grouptuples]
                print("{name} {avg}".format(name=groupname, avg=sum(values)/len(values)))
            
            qid & accept id: (28596212, 28596834) query: Killing Thread and releasing memory in Python soup:

            soup wrap:

            OK, after all this I think what you want is the multiprocessing module, since with a separate process you can actually send a signal to kill it:

            class MyThread:
                def __init__(self):
                    self.result = None
                    self.error = None
                def start(self):
                    self.proc = multiprocessing.Process(target=self.run)
                    self.proc.start()
                def stop(self):
                    self.proc.terminate()  # Process has no send_signal(); terminate() sends SIGTERM (for SIGKILL use os.kill(self.proc.pid, signal.SIGKILL))
                def run(self):
                    try:
                        self.result = myfun(*args, **kw)  # run the external resource, then interrupt it
                    except Exception as e:
                        self.error = e
            

            then you would call stop() on an instance in order to halt the process (of course the other side should respond appropriately to the signal)

            you can probably even just use the builtin subprocess.Popen.kill() (see the docs https://docs.python.org/2/library/subprocess.html#subprocess.Popen.send_signal)
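            A minimal sketch of that approach, using sleep as a stand-in for the real external program:

```python
import subprocess

# Launch a long-running external program (sleep stands in for the real work).
proc = subprocess.Popen(['sleep', '60'])

proc.kill()   # sends SIGKILL on POSIX (TerminateProcess on Windows)
proc.wait()   # reap the child so it doesn't linger as a zombie

print(proc.returncode)  # negative signal number on POSIX: -9 for SIGKILL
```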

            Regarding your question:

            (Is there any way to manually release resources?)

             t = Thread(target=some_long_running_external_process)
             t.start()
            

            there is no way to exit your thread (t) from outside of some_long_running_external_process

            qid & accept id: (28606124, 28607830) query: Finding time intervals per day from a list of timestamps in Python soup:

            soup wrap:

            If the list is sorted, as in your case, then you could use itertools.groupby() to group the timestamps into days:

            #!/usr/bin/env python
            from datetime import date, timedelta
            from itertools import groupby
            
            epoch = date(1970, 1, 1)
            
            result = {}
            assert timestamps == sorted(timestamps)
            for day, group in groupby(timestamps, key=lambda ts: ts // 86400):
                # store the interval + day/month in a dictionary.
                same_day = list(group)
                assert max(same_day) == same_day[-1] and min(same_day) == same_day[0]
                result[epoch + timedelta(day)] = same_day[0], same_day[-1] 
            print(result)
            

            Output

            {datetime.date(2007, 4, 10): (1176239419.0, 1176239419.0),
             datetime.date(2007, 4, 11): (1176334733.0, 1176334733.0),
             datetime.date(2007, 4, 13): (1176445137.0, 1176445137.0),
             datetime.date(2007, 4, 26): (1177619954.0, 1177621082.0),
             datetime.date(2007, 4, 29): (1177838576.0, 1177838576.0),
             datetime.date(2007, 5, 5): (1178349385.0, 1178401697.0),
             datetime.date(2007, 5, 6): (1178437886.0, 1178437886.0),
             datetime.date(2007, 5, 11): (1178926650.0, 1178926650.0),
             datetime.date(2007, 5, 12): (1178982127.0, 1178982127.0),
             datetime.date(2007, 5, 14): (1179130340.0, 1179130340.0),
             datetime.date(2007, 5, 15): (1179263733.0, 1179264930.0),
             datetime.date(2007, 5, 19): (1179574273.0, 1179574273.0),
             datetime.date(2007, 5, 20): (1179671730.0, 1179671730.0),
             datetime.date(2007, 5, 30): (1180549056.0, 1180549056.0),
             datetime.date(2007, 6, 2): (1180763342.0, 1180763342.0),
             datetime.date(2007, 6, 9): (1181386289.0, 1181386289.0),
             datetime.date(2007, 6, 16): (1181990860.0, 1181990860.0),
             datetime.date(2007, 6, 27): (1182979573.0, 1182979573.0),
             datetime.date(2007, 7, 1): (1183326862.0, 1183326862.0)}
            

            If there is only one timestamp on that day then it is repeated twice.

            How would you afterwards test whether the last (for example) 5 entries in the result have a larger interval than the previous 14?

            entries = sorted(result.items())
            intervals = [(end - start) for _, (start, end) in entries]
            print(max(intervals[-5:]) > max(intervals[-5-14:-5]))
            # -> False
            
            qid & accept id: (28616651, 28616747) query: manipulate column fields for clean representation soup:

            soup wrap:

            If you want to consider awk:

            awk -F '[ ,]' '{sub(/:.+$/, "", $3); sub(/:.+$/, "", $5); print $3, $5, $11}' file
            10.20.10.144 10.1.1.98 1295
            10.20.10.144 10.1.1.98 956
            10.20.10.144 10.1.1.97 645
            

            EDIT: Based on comments below:

            awk -F '[ ,]' '{sub(/:.+$/, "", $3); a[$3]+=$11} END{for (i in a) print i, a[i]}' file
            10.20.10.144 2896
            
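            If you'd rather stay in Python, a rough equivalent of the second awk command, assuming the same space/comma-delimited input with the IP:port in field 3 and the value in field 11 (1-based, as in awk; the other fields below are made-up placeholders):

```python
import re
from collections import defaultdict

# Sample lines in the same shape as the input file (only fields 3 and 11 matter).
lines = [
    'f1 f2 10.20.10.144:5000 f4 10.1.1.98:80 f6 f7 f8 f9 f10 1295',
    'f1 f2 10.20.10.144:5001 f4 10.1.1.98:80 f6 f7 f8 f9 f10 956',
    'f1 f2 10.20.10.144:5002 f4 10.1.1.97:80 f6 f7 f8 f9 f10 645',
]

totals = defaultdict(int)
for line in lines:
    fields = re.split(r'[ ,]', line.strip())  # same separator class as awk -F '[ ,]'
    ip = fields[2].split(':')[0]              # awk's $3 with the :port suffix stripped
    totals[ip] += int(fields[10])             # awk's $11

for ip, total in totals.items():
    print(ip, total)  # 10.20.10.144 2896
```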
            qid & accept id: (28617958, 28645031) query: Casting string/buffer data using swig wrapped typedef structs and enums in python soup:

            soup wrap:

            You can do this with SWIG; the simplest solution is to use %extend to supply an extra constructor from within Python that takes a PyObject to use as a buffer:

            %module test
            
            %include <stdint.i>
            
            %inline %{
            #ifdef SWIG
            #define __attribute__(x)
            #endif
            
            #define TOKEN_TYPE_SYNC_VALUE 1
            #define TOKEN_TYPE_DELTA 2
            
            typedef struct __attribute__((packed))
            {
                uint8_t token_type;
                int16_t delta;
            } struct_token_type_delta;
            %}
            
            %extend struct_token_type_delta {
              struct_token_type_delta(PyObject *in) {
                assert(PyObject_CheckBuffer(in));
                Py_buffer view;
                const int ret = PyObject_GetBuffer(in, &view, PyBUF_SIMPLE);
                assert(0==ret);
                assert(view.len >= sizeof(struct_token_type_delta));
                struct_token_type_delta *result = new struct_token_type_delta(*static_cast<struct_token_type_delta*>(view.buf));
                PyBuffer_Release(&view); // Note you could/should retain view.obj for the life of this object to prevent use after free
                return result;
              }
            }
            

            You'd need to do this for each type you wanted to construct from a buffer, but the actual code for the constructor of each remains the same so could be wrapped as a macro (using %define) quite simply. You would also want to do something to prevent the use after free error, by retaining the reference to the underlying buffer for longer.


            Personally, if it were me doing this, I'd look for a different solution, because there are nicer ways of getting the same result: writing code that creates and maintains thin POD/bean-like objects is tedious and dull in any language, let alone 2 or more. Assuming protobuf is too heavyweight to use in your embedded system, I'd look to solve this in reverse, using ctypes for Python and then having your Python code also generate the header for your C build tools. So something like:

            import ctypes
            
            class ProtocolStructure(type(ctypes.Structure)):
              def __str__(self):
                s='''
            typedef struct __attribute__((packed)) {
            \t%s
            }'''
                return s % '\n\t'.join(('%s %s;' % (ty.__name__[2:], name) for name,ty in self._fields_))
            
            class struct_token_type_delta(ctypes.Structure, metaclass=ProtocolStructure):
              _fields_ = (('token_type', ctypes.c_uint8),
                          ('delta', ctypes.c_int16))
            
            if __name__ == '__main__':
              # when this file is run instead of imported print the header file to stdout
            
              h='''
            #ifndef PROTO_H
            #define PROTO_H
            %s
            #endif
            '''
            
              print(h % ';\n'.join('%s %s;\n' % (ty, name)  for name,ty in globals().items() if issubclass(type(ty), ProtocolStructure)))
            

            Which then lets you write:

            import proto
            proto.struct_token_type_delta.from_buffer(bytearray(b'\xff\x11\x22'))
            
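            For reference, a standalone sketch of that from_buffer call without the header-generation machinery. Note that matching C's __attribute__((packed)) in ctypes requires _pack_ = 1 (otherwise the struct is padded to 4 bytes instead of 3):

```python
import ctypes

class struct_token_type_delta(ctypes.Structure):
    _pack_ = 1  # mirror __attribute__((packed)): no padding between fields
    _fields_ = (('token_type', ctypes.c_uint8),
                ('delta', ctypes.c_int16))

buf = bytearray(b'\xff\x11\x22')
s = struct_token_type_delta.from_buffer(buf)  # shares memory with buf
print(ctypes.sizeof(s))   # 3 (would be 4 without _pack_ = 1)
print(s.token_type)       # 255 (0xff)
print(s.delta)            # 8721 (0x2211 on a little-endian machine)
```

Because from_buffer shares memory with the bytearray rather than copying it, mutating buf changes the struct fields and vice versa; use from_buffer_copy for an independent snapshot.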
            qid & accept id: (28643401, 28644470) query: How to call python script from CasperJS soup:

            soup wrap:

            The problem is that you exit prematurely. An empty casper.run() means that CasperJS will exit as soon as all casper steps are executed. The child_process module is not a CasperJS module (it's provided by PhantomJS), so CasperJS cannot know that the child process is still executing.

            You could either simply use

            casp.run(function(){});
            

            to prevent the exiting. But then you will probably need to kill the CasperJS process.

            A better way would be to set a variable when the execution has finished and continue only then:

            casp.start().then(function() {
              var finished = false;
              var cp = require('child_process');
              cp.execFile('/usr/bin/python','test.py', {},function(_,stdout,stderr){
                console.log(stdout);
                console.log(stderr);
                finished = true;
              });
              this.waitFor(function check(){
                return finished;
              }, function then(){
                // can stay empty
              });
            }).run();
            

            If you want to pass multiple arguments to the external program, you should use an array as the second argument to execFile.

            qid & accept id: (28647371, 28647430) query: Sort generated numbers using another python generator soup:

            soup wrap:

            You are replacing your (value, index) tuples with just the value:

            self.values[index] = self.generators[index].next()
            

            You need to replace that with a new tuple:

            self.values[index] = (self.generators[index].next(), index)
            

            otherwise the iterable assignment fails; you cannot assign one int to two variables.

            Your generator is missing a loop and handling of empty generators:

            def generate(self):
                while any(self.values):
                    r, index = min(v for v in self.values if v)
                    try:
                        self.values[index] = (self.generators[index].next(), index)
                    except StopIteration:
                        self.values[index] = None
                    yield r
            

            This sets elements of your self.values list to None to indicate the iterable has been exhausted. This is not the most efficient way to handle this edge case; in a version I wrote before I used a dictionary to track active iterables and simply deleted from that to keep indices (keys) stable.

            Note that you can replace your t() function with the built-in iter() function.

            Demo:

            >>> class GeneratorSort():
            ...     def __init__(self, *args):
            ...         self.values = [(arg.next(), i) for i, arg in enumerate(args)]
            ...         self.generators = args
            ...     def generate(self):
            ...         while any(self.values):
            ...             r, index = min(v for v in self.values if v)
            ...             try:
            ...                 self.values[index] = (self.generators[index].next(), index)
            ...             except StopIteration:
            ...                 self.values[index] = None
            ...             yield r
            ... 
            >>> l1 = [2, 5, 6, 8]
            >>> l2 = [1, 4, 5, 7]
            >>> l3 = [0, 3, 9, 10]
            >>> a = GeneratorSort(iter(l1), iter(l2), iter(l3))
            >>> list(a.generate())
            [0, 1, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10]
            

            The standard library does it more efficiently still with the heapq.merge() function; it uses a heap to keep the iterables sorted by lowest value in a very efficient manner; min() needs to loop through all K iterables, while using a heap only takes log-K steps to keep the heap invariant intact.

            >>> import heapq
            >>> list(heapq.merge(l1, l2, l3))
            [0, 1, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10]
            

            You can study the source code, which has been highly tuned for maximum performance.

            qid & accept id: (28654483, 28661230) query: How do I find the maximum amount of possible correct matches in these arrays? soup:

            soup wrap:

            The following is an unoptimized implementation of finding a maximum matching in a bipartite graph. It iterates over all the unmatched women and tries to change the current matching by pairing each of them with one of her candidates, as follows:

            • if there's a "free" candidate - add the match to the graph
            • if all the candidates are already paired, "un-pair" one of them and match him to her, then go to the woman that this man was previously paired with, mark her as unmatched, and recurse.

            I named this process "relax" because it is somewhat reminiscent of the recursive relaxation step in Dijkstra's algorithm.

            Here's the code:


            import random
            
            def read_file():
                res = {}
                start = True
                with open('pairs.txt', 'r') as f:
                    for line in f.readlines():
                        if start:
                            start = False
                            continue
                        woman, matches = line.strip().split(': ')
                        woman = int(woman)
                        matches = map(int, matches.split(' '))
                        res[woman] = matches
                return res
            
            
            def build_random_match(graph):
                edges = {}
                for woman in graph:
                    for man in graph[woman]:
                        if already_in_edges(man, edges):
                            continue
                        else:
                            edges[woman] = man
                            break
                return edges
            
            
            def already_in_edges(man, edges):
                for woman in edges:
                    if edges[woman] == man:
                        return True
                else:
                    return False
            
            
            def get_unmatched_women(match, graph):
                return  [woman for woman in graph.keys() if woman not in match.keys()]
            
            
            def not_in_match(man, match):
                for woman in match:
                    if match[woman] == man:
                        return False
                else:
                    return True
            
            
            def find_unmatched_man(graph, match, woman):    
                potentials = graph[woman]
                for man in potentials:
                    if not_in_match(man, match):
                        return man
                else:
                    return False
            
            
            def remove_man_from_match(man, unmatched_woman, match, graph):  
                # find the woman that this man is currently matched with
                # and cancel this matching
                for woman in match:
                    if match[woman] == man:
                        match_to_del = woman
                        break   
                del match[match_to_del]
                # also remove the man from the orig woman (graph) 
                # to prevent infinite loop
                men = graph[unmatched_woman]
                men.remove(man)
                graph[unmatched_woman] = men
            
                return match_to_del
            
            
            def relax(unmatched_woman, match, graph):   
                unmatched_man = find_unmatched_man(graph, match, unmatched_woman)
                if unmatched_man:
                    match[unmatched_woman] = unmatched_man      
                elif len(graph[unmatched_woman]) == 0:
                    return match
                else:
                    # grab one of the possible matchings randomly
                    rand_index = random.randint(0, len(graph[unmatched_woman])-1)
                    man = graph[unmatched_woman][rand_index]
                    new_unmatched_woman = remove_man_from_match(man, unmatched_woman, match, graph)
                    match[unmatched_woman] = man
                    match = relax(new_unmatched_woman, match, graph)
            
                return match
            
            
            def improve_match(match, graph):
                if len(match) == len(graph):
                    return match
            
                unmatched_women = get_unmatched_women(match, graph) 
                for woman in unmatched_women:
                    copy_graph = graph.copy()
                    suggested = relax(woman, match, copy_graph)
                    if len(suggested) > len(match):
                        return suggested
                    else:
                        suggested = match
                else:
                    return suggested
            
            
            def main():
                graph = read_file()
                match = build_random_match(graph)   
                if len(match) == len(graph):
                    print 'Got a perfect match:', match
                else:
                    match_size = 0
                    while match_size < len(match):
                        match_size = len(match)
                        match = improve_match(match, graph)
            
                return match
            
            if __name__ == '__main__':
                res = main()    
                print "Size of match:", len(res)
                print "Match:", res
            

            OUTPUT:

            Size of match: 17
            Match: {2: 28, 3: 32, 4: 22, 5: 38, 6: 34, 7: 37, 8: 30, 9: 23, 10: 24, 11: 29, 12: 26, 13: 21, 15: 20, 16: 31, 17: 27, 18: 35, 19: 25}
            
            qid & accept id: (28658817, 28659290) query: Partitioning a set of values in Python soup:

            We can do linear:

            \n
            values = [7, 3, 2, 7, 1, 9, 8]\n\nrange_by_min, range_by_max = {}, {}\n\nfor v in values:\n    range_by_min[v] = range_by_max[v] = [v, v]\n\nfor v in values:\n    if v - 1 in range_by_max and v in range_by_min:\n        p, q = range_by_max[v - 1], range_by_min[v]\n        del range_by_min[q[0]]\n        del range_by_max[p[1]]\n        p[1] = q[1]\n        range_by_max[p[1]] = p\n\nprint(range_by_min, range_by_max)\n\nresult = {k: v[1] - v[0] + 1 for k, v in range_by_min.iteritems()}\nprint(result)\n
            \n

            Result:

            \n
            ({1: [1, 3], 7: [7, 9]}, {3: [1, 3], 9: [7, 9]})\n{1: 3, 7: 3}\n
            \n

            The idea is to keep two dictionaries that store ranges (a range is represented as a list of its minimum and maximum values). The first maps a range's minimum value to the range; the second maps its maximum value to the range.

            \n

            Then we traverse the list of values and we join neighboring ranges. If we are visiting 4 and there is a range 4..6 then we check if there is a range ending at 3, let's say 1..3. So we join them in one: 1..6.

            \n

            The algorithm is linear in the number of hash-table accesses. Since we expect constant-time access to the dictionaries, the expected running time is linear in the size of values. This way we don't even have to sort the input array.

            \n

            EDIT:

            \n

            I saw the link suggested by David Eisenstat. Based on this link, the implementation can be updated to use only one dictionary:

            \n
            ranges = {v: [v, v] for v in values}\n\nfor v in values:\n    if v - 1 in ranges and v in ranges:\n        p, q = ranges[v - 1], ranges[v]\n        if p[1] == v - 1 and q[0] == v:\n            if q[0] != q[1]:\n                del ranges[q[0]]\n            if p[0] != p[1]:\n                del ranges[p[1]]\n            p[1] = q[1]\n            ranges[p[1]] = p\n\nresult = {k: v[1] - v[0] + 1 for k, v in ranges.iteritems() if k == v[0]}\n
            \n soup wrap:

            We can do linear:

            values = [7, 3, 2, 7, 1, 9, 8]
            
            range_by_min, range_by_max = {}, {}
            
            for v in values:
                range_by_min[v] = range_by_max[v] = [v, v]
            
            for v in values:
                if v - 1 in range_by_max and v in range_by_min:
                    p, q = range_by_max[v - 1], range_by_min[v]
                    del range_by_min[q[0]]
                    del range_by_max[p[1]]
                    p[1] = q[1]
                    range_by_max[p[1]] = p
            
            print(range_by_min, range_by_max)
            
            result = {k: v[1] - v[0] + 1 for k, v in range_by_min.iteritems()}
            print(result)
            

            Result:

            ({1: [1, 3], 7: [7, 9]}, {3: [1, 3], 9: [7, 9]})
            {1: 3, 7: 3}
            

            The idea is to keep two dictionaries that store ranges (a range is represented as a list of its minimum and maximum values). The first maps a range's minimum value to the range; the second maps its maximum value to the range.

            Then we traverse the list of values and we join neighboring ranges. If we are visiting 4 and there is a range 4..6 then we check if there is a range ending at 3, let's say 1..3. So we join them in one: 1..6.

            The algorithm is linear in the number of hash-table accesses. Since we expect constant-time access to the dictionaries, the expected running time is linear in the size of values. This way we don't even have to sort the input array.

            EDIT:

            I saw the link suggested by David Eisenstat. Based on this link, the implementation can be updated to use only one dictionary:

            ranges = {v: [v, v] for v in values}
            
            for v in values:
                if v - 1 in ranges and v in ranges:
                    p, q = ranges[v - 1], ranges[v]
                    if p[1] == v - 1 and q[0] == v:
                        if q[0] != q[1]:
                            del ranges[q[0]]
                        if p[0] != p[1]:
                            del ranges[p[1]]
                        p[1] = q[1]
                        ranges[p[1]] = p
            
            result = {k: v[1] - v[0] + 1 for k, v in ranges.iteritems() if k == v[0]}
            
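The code above uses Python 2 (iteritems, print statements). For reference, a minimal Python 3 sketch of the same single-dictionary variant, using the sample values list from the answer:

```python
# Python 3 port of the single-dictionary range merge (items() replaces iteritems()).
values = [7, 3, 2, 7, 1, 9, 8]

# Each value starts as its own [min, max] range.
ranges = {v: [v, v] for v in values}

for v in values:
    if v - 1 in ranges and v in ranges:
        p, q = ranges[v - 1], ranges[v]
        if p[1] == v - 1 and q[0] == v:
            # Merge the range ending at v-1 with the range starting at v,
            # dropping the now-interior endpoint keys.
            if q[0] != q[1]:
                del ranges[q[0]]
            if p[0] != p[1]:
                del ranges[p[1]]
            p[1] = q[1]
            ranges[p[1]] = p

# Keep only entries keyed by a range's minimum; map minimum -> range length.
result = {k: v[1] - v[0] + 1 for k, v in ranges.items() if k == v[0]}
print(result)
```

For this input the result is {1: 3, 7: 3}: the runs 1..3 and 7..9, each of length 3.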
            qid & accept id: (28665356, 28665434) query: Line breaks with lists soup:

            Use enumerate with a start index of 1 and str.format:

            \n
            while True:\n    myInput = input()\n    if myInput == "nothing":\n        print('There are {} items in the basket: '.format(len(basket)))\n        for ind, item in enumerate(basket,1):\n            print("Item{}: {} ".format(ind,item))\n        break\n    else:\n        basket.append(myInput)\n        print('Okay, what else?')\n
            \n

            You can also use a list comprehension and iter without needing a while loop, it will keep looping until the user enters the sentinel value "nothing":

            \n
            print('Add as many items to the basket as you want. When you are done, enter "nothing".')\nprint('What do you want to put into the basket now?')\nbasket = [ line for line in iter(lambda:input("Please enter an item to add"), "nothing")]\n\nprint('There are {} items in the basket: '.format(len(basket)))\nfor ind,item in enumerate(basket,1):\n    print("Item{}: {} ".format(ind,item))\n
            \n soup wrap:

            Use enumerate with a start index of 1 and str.format:

            while True:
                myInput = input()
                if myInput == "nothing":
                    print('There are {} items in the basket: '.format(len(basket)))
                    for ind, item in enumerate(basket,1):
                        print("Item{}: {} ".format(ind,item))
                    break
                else:
                    basket.append(myInput)
                    print('Okay, what else?')
            

            You can also use a list comprehension and iter without needing a while loop, it will keep looping until the user enters the sentinel value "nothing":

            print('Add as many items to the basket as you want. When you are done, enter "nothing".')
            print('What do you want to put into the basket now?')
            basket = [ line for line in iter(lambda:input("Please enter an item to add"), "nothing")]
            
            print('There are {} items in the basket: '.format(len(basket)))
            for ind,item in enumerate(basket,1):
                print("Item{}: {} ".format(ind,item))
            
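The two-argument form iter(callable, sentinel) keeps calling the callable until it returns the sentinel value. A minimal sketch of that idiom with canned responses standing in for input() (the responses list is purely illustrative):

```python
# iter(callable, sentinel): call the lambda repeatedly, stop when it
# returns "nothing" (the sentinel), which is not included in the output.
responses = iter(["apples", "bread", "nothing", "never reached"])
basket = [item for item in iter(lambda: next(responses), "nothing")]
print(basket)  # ['apples', 'bread']
```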
            qid & accept id: (28677840, 28678312) query: Get system metrics using PowerShell soup:

            You can do it the easy way by using the boolean TerminalServerSession property from the .NET Framework:

            \n
            Add-Type -AssemblyName System.Windows.Forms\n[System.Windows.Forms.SystemInformation]::TerminalServerSession\n
            \n

            Output:

            \n
            False\n
            \n

            Or you could do it the manual way (like TerminalServerSession does internally) and use C# and P/Invoke to load and call GetSystemMetrics() from PowerShell.

            \n
            $def = @"\n//I removed every other enum-value to shorten the sample\npublic enum SystemMetric\n   {\n     SM_REMOTESESSION           = 0x1000, // 0x1000\n   }\n\n[DllImport("user32.dll")]\npublic static extern int GetSystemMetrics(SystemMetric smIndex);\n"@\n\nAdd-Type -Namespace NativeMethods -Name User32Dll -MemberDefinition $def\n\n[NativeMethods.User32Dll]::GetSystemMetrics([NativeMethods.User32Dll+SystemMetric]::SM_REMOTESESSION)\n
            \n

            Output:

            \n
            0\n
            \n soup wrap:

            You can do it the easy way by using the boolean TerminalServerSession property from the .NET Framework:

            Add-Type -AssemblyName System.Windows.Forms
            [System.Windows.Forms.SystemInformation]::TerminalServerSession
            

            Output:

            False
            

            Or you could do it the manual way (like TerminalServerSession does internally) and use C# and P/Invoke to load and call GetSystemMetrics() from PowerShell.

            $def = @"
            //I removed every other enum-value to shorten the sample
            public enum SystemMetric
               {
                 SM_REMOTESESSION           = 0x1000, // 0x1000
               }
            
            [DllImport("user32.dll")]
            public static extern int GetSystemMetrics(SystemMetric smIndex);
            "@
            
            Add-Type -Namespace NativeMethods -Name User32Dll -MemberDefinition $def
            
            [NativeMethods.User32Dll]::GetSystemMetrics([NativeMethods.User32Dll+SystemMetric]::SM_REMOTESESSION)
            

            Output:

            0
            
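For comparison, the same SM_REMOTESESSION check can be sketched from Python via ctypes. This is only a sketch, guarded so the Windows API is called on win32 alone:

```python
import ctypes
import sys

SM_REMOTESESSION = 0x1000  # same constant as in the P/Invoke sample above

def in_remote_session():
    """Return True when running inside a remote (terminal server) session."""
    if sys.platform != "win32":
        return False  # GetSystemMetrics() lives in user32.dll, Windows only
    return bool(ctypes.windll.user32.GetSystemMetrics(SM_REMOTESESSION))

print(in_remote_session())
```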
            qid & accept id: (28687087, 28687377) query: How to override OSX's version of numpy when I import in Python 2.7? soup:

            It is important to note that there are two different installations of Python on your Mac. There is the System Python (/usr/bin), and also the /usr/local/bin python.

            \n

            There are also two installations of pip. For example:

            \n
            $ which pip\n/usr/local/bin/pip\n$ ls -l /usr/local/bin/pip\nlrwxr-xr-x  1 dmao  admin  30 Feb 14 19:09 /usr/local/bin/pip -> ../Cellar/python/2.7.9/bin/pip\n
            \n

            This is the homebrew pip. I assume you have numpy installed on the homebrew version of pip.

            \n

            There is no System version of pip installed by default. The usual solution is to run easy_install pip and install a system version of pip, then pip install numpy (using system pip). However, you mentioned you wanted to leave the system numpy untouched.

            \n
            \n

            If you need to leave the system numpy untouched, you can run the /usr/local Python as your default Python instead of the system Python. Here we create a symbolic link from the default python to the local python, so that the local python becomes the default.

            \n
            sudo ln -s /usr/bin/python /usr/local/bin/python\n
            \n

            Then your default Python version becomes the one which matches your default version of pip.

            \n

            You can restore your default Python version anytime by replacing the symlink. /usr/bin has the links you need.

            \n
            $ ls -l /usr/bin/ | grep python\nlrwxr-xr-x   1 root   wheel        76 Feb 21  2014 pythonw2.5 -> ../../System/Library/Frameworks/Python.framework/Versions/2.5/bin/pythonw2.5\nlrwxr-xr-x   1 root   wheel        76 Feb 21  2014 pythonw2.6 -> ../../System/Library/Frameworks/Python.framework/Versions/2.6/bin/pythonw2.6\nlrwxr-xr-x   1 root   wheel        76 Feb 21  2014 pythonw2.7 -> ../../System/Library/Frameworks/Python.framework/Versions/2.7/bin/pythonw2.7\n
            \n
            \n

            Alternatively, if your System Python is being used for something, and/or you need to keep switching between versions of python packages, you could use virtualenv, which makes this much easier.

            \n
            \n

            There are many different ways to manage python modules on a Mac. For example, What is the most compatible way to install python modules on a Mac?

            \n soup wrap:

            It is important to note that there are two different installations of Python on your Mac. There is the System Python (/usr/bin), and also the /usr/local/bin python.

            There are also two installations of pip. For example:

            $ which pip
            /usr/local/bin/pip
            $ ls -l /usr/local/bin/pip
            lrwxr-xr-x  1 dmao  admin  30 Feb 14 19:09 /usr/local/bin/pip -> ../Cellar/python/2.7.9/bin/pip
            

            This is the homebrew pip. I assume you have numpy installed on the homebrew version of pip.

            There is no System version of pip installed by default. The usual solution is to run easy_install pip and install a system version of pip, then pip install numpy (using system pip). However, you mentioned you wanted to leave the system numpy untouched.


            If you need to leave the system numpy untouched, you can run the /usr/local Python as your default Python instead of the system Python. Here we create a symbolic link from the default python to the local python, so that the local python becomes the default.

            sudo ln -s /usr/bin/python /usr/local/bin/python
            

            Then your default Python version becomes the one which matches your default version of pip.

            You can restore your default Python version anytime by replacing the symlink. /usr/bin has the links you need.

            $ ls -l /usr/bin/ | grep python
            lrwxr-xr-x   1 root   wheel        76 Feb 21  2014 pythonw2.5 -> ../../System/Library/Frameworks/Python.framework/Versions/2.5/bin/pythonw2.5
            lrwxr-xr-x   1 root   wheel        76 Feb 21  2014 pythonw2.6 -> ../../System/Library/Frameworks/Python.framework/Versions/2.6/bin/pythonw2.6
            lrwxr-xr-x   1 root   wheel        76 Feb 21  2014 pythonw2.7 -> ../../System/Library/Frameworks/Python.framework/Versions/2.7/bin/pythonw2.7
            

            Alternatively, if your System Python is being used for something, and/or you need to keep switching between versions of python packages, you could use virtualenv, which makes this much easier.


            There are many different ways to manage python modules on a Mac. For example, What is the most compatible way to install python modules on a Mac?

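A quick way to see which interpreter you are running and which copy of a module it would load, without importing the module itself (shown here with the stdlib json module as a stand-in; substitute "numpy" on your machine):

```python
import importlib.util
import sys

# find_spec locates a module on the current interpreter's search path
# without importing it, so you can inspect where it would come from.
spec = importlib.util.find_spec("json")  # use "numpy" to check your numpy

print(sys.executable)  # which python binary is running
print(spec.origin)     # which file the module would be loaded from
```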
            qid & accept id: (28716265, 28721902) query: How can I use python pandas to parse CSV into the format I want? soup:

            There is a groupby/cumcount/unstack trick which converts long-format DataFrames to wide-format DataFrames:

            \n
            import pandas as pd\ndf = pd.read_table('data', sep='\s+')\n\ncommon = ['weather', 'location', 'time', 'date', 'Condition']\ngrouped = df.groupby(common)\ndf['idx'] = grouped.cumcount()\ndf2 = df.set_index(common+['idx'])\ndf2 = df2.unstack('idx')\ndf2 = df2.swaplevel(0, 1, axis=1)\ndf2 = df2.sortlevel(axis=1)\ndf2.columns = df2.columns.droplevel(0)\ndf2 = df2.reset_index()\nprint(df2)\n
            \n

            yields

            \n
              weather  location       time        date  Condition insectName  count  \\n0   sunny  balabala  0900:1200  1990-02-10         25        aaa     15   \n1   sunny  balabala  1300:1500  1990-02-15         38        XXX     40   \n\n  insectName  count insectName  count insectName  count  \n0        bbb     10        ccc     20        ddd     50  \n1        yyy     10        yyy     25        NaN    NaN  \n
            \n

            While wide-format may be useful for presentation, note that long-format is usually the right format for data processing. See Hadley Wickham's article on the virtues of tidy data (PDF).

            \n soup wrap:

            There is a groupby/cumcount/unstack trick which converts long-format DataFrames to wide-format DataFrames:

            import pandas as pd
            df = pd.read_table('data', sep='\s+')
            
            common = ['weather', 'location', 'time', 'date', 'Condition']
            grouped = df.groupby(common)
            df['idx'] = grouped.cumcount()
            df2 = df.set_index(common+['idx'])
            df2 = df2.unstack('idx')
            df2 = df2.swaplevel(0, 1, axis=1)
            df2 = df2.sortlevel(axis=1)
            df2.columns = df2.columns.droplevel(0)
            df2 = df2.reset_index()
            print(df2)
            

            yields

              weather  location       time        date  Condition insectName  count  \
            0   sunny  balabala  0900:1200  1990-02-10         25        aaa     15   
            1   sunny  balabala  1300:1500  1990-02-15         38        XXX     40   
            
              insectName  count insectName  count insectName  count  
            0        bbb     10        ccc     20        ddd     50  
            1        yyy     10        yyy     25        NaN    NaN  
            

            While wide-format may be useful for presentation, note that long-format is usually the right format for data processing. See Hadley Wickham's article on the virtues of tidy data (PDF).

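A minimal sketch of the cumcount idea on a toy frame (note that sortlevel has since been replaced by sort_index(axis=1) in modern pandas; the toy column names here are illustrative):

```python
import pandas as pd

# Long-format input: group "a" has two rows, group "b" has one.
df = pd.DataFrame({"group": ["a", "a", "b"], "val": [1, 2, 3]})

# cumcount numbers the rows within each group: 0, 1, 0 here.
df["idx"] = df.groupby("group").cumcount()

# Pivot that counter into columns: one column per within-group position.
# Groups with fewer rows get NaN in the trailing columns.
wide = df.set_index(["group", "idx"])["val"].unstack("idx")
print(wide)
```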
            qid & accept id: (28720893, 28721513) query: Save text cursor position in the currently focused application/control, then restore it and paste text soup:

            Using this python code:

            \n

            https://github.com/SavinaRoja/PyUserInput

            \n

            I can generate a string in a window as if it were typed there, although it does fail with Unicode on Linux:

            \n
            >>> time.sleep(5) ; k.type_string('ΣΣ')\nTraceback (most recent call last):\n  File "", line 1, in \n  File "/usr/local/lib/python2.7/dist-packages/pykeyboard/base.py", line 48, in type_string\n    self.tap_key(i)\n  File "/usr/local/lib/python2.7/dist-packages/pykeyboard/base.py", line 40, in tap_key\n    self.press_key(character)\n  File "/usr/local/lib/python2.7/dist-packages/pykeyboard/x11.py", line 91, in press_key\n    keycode = self.lookup_character_keycode(character)\n  File "/usr/local/lib/python2.7/dist-packages/pykeyboard/x11.py", line 222, in lookup_character_keycode\n    keysym = Xlib.XK.string_to_keysym(special_X_keysyms[character])\nKeyError: '\xce'\n
            \n

            Not sure what the solution there is. How do you type any Unicode character into a text window anyway? There's a Gnomey-Linux standard where you can type Ctrl-Shift-u and then hex digits, then Ctrl-Shift to end. Do that with:

            \n
            k.press_key(k.shift_key)\nk.press_key(k.control_key)\nk.type_string("u03a3")\nk.release_key(k.shift_key)\nk.release_key(k.control_key)\n
            \n

            and get a Σ

            \n

            The package code seems to be cross-platform; I don't know whether the Unicode entry method is. I've only tested on Linux.

            \n soup wrap:

            Using this python code:

            https://github.com/SavinaRoja/PyUserInput

            I can generate a string in a window as if it were typed there, although it does fail with Unicode on Linux:

            >>> time.sleep(5) ; k.type_string('ΣΣ')
            Traceback (most recent call last):
              File "", line 1, in 
              File "/usr/local/lib/python2.7/dist-packages/pykeyboard/base.py", line 48, in type_string
                self.tap_key(i)
              File "/usr/local/lib/python2.7/dist-packages/pykeyboard/base.py", line 40, in tap_key
                self.press_key(character)
              File "/usr/local/lib/python2.7/dist-packages/pykeyboard/x11.py", line 91, in press_key
                keycode = self.lookup_character_keycode(character)
              File "/usr/local/lib/python2.7/dist-packages/pykeyboard/x11.py", line 222, in lookup_character_keycode
                keysym = Xlib.XK.string_to_keysym(special_X_keysyms[character])
            KeyError: '\xce'
            

            Not sure what the solution there is. How do you type any Unicode character into a text window anyway? There's a Gnomey-Linux standard where you can type Ctrl-Shift-u and then hex digits, then Ctrl-Shift to end. Do that with:

            k.press_key(k.shift_key)
            k.press_key(k.control_key)
            k.type_string("u03a3")
            k.release_key(k.shift_key)
            k.release_key(k.control_key)
            

            and get a Σ

            The package code seems to be cross-platform; I don't know whether the Unicode entry method is. I've only tested on Linux.

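The hex digits to type after Ctrl-Shift-u are just the character's Unicode code point in hexadecimal, which Python can compute directly:

```python
# Σ is U+03A3, so the sequence for it is Ctrl-Shift-u, then 0 3 a 3.
ch = "Σ"
codepoint_hex = format(ord(ch), "04x")
print(codepoint_hex)  # 03a3
```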
            qid & accept id: (28754280, 28754422) query: Python3 Make a list that increments for a certain amount, decrements for a certain amount soup:

            There are two problems with your approach:

            \n
              \n
            1. You are reversing the entire list, not just that slice. Let's say the list is [1,2,3,4], and we want to reverse the second half, i.e. get [1,2,4,3]; with your approach, you would take the third and fourth element from the reversed list, [4,3,2,1], and end up with [1,2,2,1]
            2. The to-index in a range is exclusive, thus by using range(17) and then range(18,35) and so forth, you are missing out on the elements at index 17, 35, and 53
            \n

            You can use a loop for the different parts to be reversed, and then replace that slice of the list with the same slice in reverse order.

            \n
            lst = list(range(20))\nfor start in range(5, len(lst), 10):\n    lst[start:start+5] = lst[start+4:start-1:-1]\n
            \n

            Or this way, as pointed out in comments, which also gets rid of those nasty off-by-one indices:

            \n
            for start in range(5, len(lst), 10):\n    lst[start:start+5] = reversed(lst[start:start+5])\n
            \n

            Afterwards, lst is [0, 1, 2, 3, 4, 9, 8, 7, 6, 5, 10, 11, 12, 13, 14, 19, 18, 17, 16, 15].

            \n

            Or, in case the intervals to be reversed are irregular (as it seems to be in your question):

            \n
            reverse = [(3, 7), (12,17)]\nfor start, end in reverse:\n    lst[start:end] = reversed(lst[start:end])\n
            \n soup wrap:

            There are two problems with your approach:

            1. You are reversing the entire list, not just that slice. Let's say the list is [1,2,3,4], and we want to reverse the second half, i.e. get [1,2,4,3]; with your approach, you would take the third and fourth element from the reversed list, [4,3,2,1], and end up with [1,2,2,1]
            2. The to-index in a range is exclusive, thus by using range(17) and then range(18,35) and so forth, you are missing out on the elements at index 17, 35, and 53

            You can use a loop for the different parts to be reversed, and then replace that slice of the list with the same slice in reverse order.

            lst = list(range(20))
            for start in range(5, len(lst), 10):
                lst[start:start+5] = lst[start+4:start-1:-1]
            

            Or this way, as pointed out in comments, which also gets rid of those nasty off-by-one indices:

            for start in range(5, len(lst), 10):
                lst[start:start+5] = reversed(lst[start:start+5])
            

            Afterwards, lst is [0, 1, 2, 3, 4, 9, 8, 7, 6, 5, 10, 11, 12, 13, 14, 19, 18, 17, 16, 15].

            Or, in case the intervals to be reversed are irregular (as it seems to be in your question):

            reverse = [(3, 7), (12,17)]
            for start, end in reverse:
                lst[start:end] = reversed(lst[start:end])
            
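For example, applying the irregular-interval version to list(range(20)) with those two intervals gives:

```python
# Reverse two irregular slices in place; each (start, end) pair is
# half-open, [start, end), like ordinary Python slicing.
lst = list(range(20))
reverse = [(3, 7), (12, 17)]
for start, end in reverse:
    lst[start:end] = reversed(lst[start:end])
print(lst)
# [0, 1, 2, 6, 5, 4, 3, 7, 8, 9, 10, 11, 16, 15, 14, 13, 12, 17, 18, 19]
```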
            qid & accept id: (28755798, 28756473) query: JSON to Pandas: is there a more elegant solution? soup:

            The read_json function of pandas is a very tricky method to use. If you don't know with certainty the validity of your JSON object or whether its initial structure is sane enough to build a dataframe around, it's much better to stick to tried and tested methods to break your data down into something that pandas can use reliably.

            \n

            In your case, I suggest breaking your data down into a list of lists. Out of all that JSON, the only parts you really need are the data and columns keys.

            \n

            Try this:

            \n
            import pandas as pd\nimport json\nimport urllib\n\njs = json.loads(urllib.urlopen("test.json").read())\ndata = js["data"]\nrows = [row["row"] for row in data] # Transform the 'row' keys to list of lists.\ndf = pd.DataFrame(rows, columns=js["columns"])\nprint df\n
            \n

            This gives me the desired result:

            \n
               rank          name    deaths\n0     1    Mao Zedong  63000000\n1     2  Jozef Stalin  23000000\n
            \n soup wrap:

            The read_json function of pandas is a very tricky method to use. If you don't know with certainty the validity of your JSON object or whether its initial structure is sane enough to build a dataframe around, it's much better to stick to tried and tested methods to break your data down into something that pandas can use reliably.

            In your case, I suggest breaking your data down into a list of lists. Out of all that JSON, the only parts you really need are the data and columns keys.

            Try this:

            import pandas as pd
            import json
            import urllib
            
            js = json.loads(urllib.urlopen("test.json").read())
            data = js["data"]
            rows = [row["row"] for row in data] # Transform the 'row' keys to list of lists.
            df = pd.DataFrame(rows, columns=js["columns"])
            print df
            

            This gives me the desired result:

               rank          name    deaths
            0     1    Mao Zedong  63000000
            1     2  Jozef Stalin  23000000
            
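A self-contained sketch of the same list-of-lists approach, with the JSON inlined instead of fetched via urllib (the sample payload simply mirrors the structure described above):

```python
import json

import pandas as pd

raw = """{
  "columns": ["rank", "name", "deaths"],
  "data": [{"row": [1, "Mao Zedong", 63000000]},
           {"row": [2, "Jozef Stalin", 23000000]}]
}"""

js = json.loads(raw)
rows = [row["row"] for row in js["data"]]  # pull each 'row' list out
df = pd.DataFrame(rows, columns=js["columns"])
print(df)
```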
            qid & accept id: (28766133, 28769537) query: Faster way to read Excel files to pandas dataframe soup:

            As others have suggested, CSV reading is faster. So if you are on Windows and have Excel, you could call a VBScript to convert the Excel file to CSV and then read the CSV. I tried the script below and it took about 30 seconds.

            \n
            # create a list with sheet numbers you want to process\nsheets = map(str,range(1,6))\n\n# convert each sheet to csv and then read it using read_csv\ndf={}\nfrom subprocess import call\nexcel='C:\\Users\\rsignell\\OTT_Data_All_stations.xlsx'\nfor sheet in sheets:\n    csv = 'C:\\Users\\rsignell\\test' + sheet + '.csv' \n    call(['cscript.exe', 'C:\\Users\\rsignell\\ExcelToCsv.vbs', excel, csv, sheet])\n    df[sheet]=pd.read_csv(csv)\n
            \n

            Here's a little snippet of python to create the ExcelToCsv.vbs script:

            \n
            #write vbscript to file\nvbscript="""if WScript.Arguments.Count < 3 Then\n    WScript.Echo "Please specify the source and the destination files. Usage: ExcelToCsv   "\n    Wscript.Quit\nEnd If\n\ncsv_format = 6\n\nSet objFSO = CreateObject("Scripting.FileSystemObject")\n\nsrc_file = objFSO.GetAbsolutePathName(Wscript.Arguments.Item(0))\ndest_file = objFSO.GetAbsolutePathName(WScript.Arguments.Item(1))\nworksheet_number = CInt(WScript.Arguments.Item(2))\n\nDim oExcel\nSet oExcel = CreateObject("Excel.Application")\n\nDim oBook\nSet oBook = oExcel.Workbooks.Open(src_file)\noBook.Worksheets(worksheet_number).Activate\n\noBook.SaveAs dest_file, csv_format\n\noBook.Close False\noExcel.Quit\n""";\n\nf = open('ExcelToCsv.vbs','w')\nf.write(vbscript.encode('utf-8'))\nf.close()\n
            \n

            This answer benefited from Convert XLS to CSV on command line and csv & xlsx files import to pandas data frame: speed issue

            \n soup wrap:

            As others have suggested, CSV reading is faster. So if you are on Windows and have Excel, you could call a VBScript to convert the Excel file to CSV and then read the CSV. I tried the script below and it took about 30 seconds.

            # create a list with sheet numbers you want to process
            sheets = map(str,range(1,6))
            
            # convert each sheet to csv and then read it using read_csv
            df={}
            from subprocess import call
            excel='C:\\Users\\rsignell\\OTT_Data_All_stations.xlsx'
            for sheet in sheets:
                csv = 'C:\\Users\\rsignell\\test' + sheet + '.csv' 
                call(['cscript.exe', 'C:\\Users\\rsignell\\ExcelToCsv.vbs', excel, csv, sheet])
                df[sheet]=pd.read_csv(csv)
            

            Here's a little snippet of python to create the ExcelToCsv.vbs script:

            #write vbscript to file
            vbscript="""if WScript.Arguments.Count < 3 Then
                WScript.Echo "Please specify the source and the destination files. Usage: ExcelToCsv   "
                Wscript.Quit
            End If
            
            csv_format = 6
            
            Set objFSO = CreateObject("Scripting.FileSystemObject")
            
            src_file = objFSO.GetAbsolutePathName(Wscript.Arguments.Item(0))
            dest_file = objFSO.GetAbsolutePathName(WScript.Arguments.Item(1))
            worksheet_number = CInt(WScript.Arguments.Item(2))
            
            Dim oExcel
            Set oExcel = CreateObject("Excel.Application")
            
            Dim oBook
            Set oBook = oExcel.Workbooks.Open(src_file)
            oBook.Worksheets(worksheet_number).Activate
            
            oBook.SaveAs dest_file, csv_format
            
            oBook.Close False
            oExcel.Quit
            """;
            
            f = open('ExcelToCsv.vbs','w')
            f.write(vbscript.encode('utf-8'))
            f.close()
            

            This answer benefited from Convert XLS to CSV on command line and csv & xlsx files import to pandas data frame: speed issue

            qid & accept id: (28767484, 28767693) query: Python: Decode base64 multiple strings in a file soup:

            I assume that you created your test input string yourself.

            \n

            If I split your test input string into blocks of 4 characters and decode each one separately, I get the following:

            \n
            >>> import base64\n>>> s = 'cw==ZA==YQ==ZA==YQ==cw==ZA==YQ==cw==ZA==YQ==cw==ZA==YQ==cw==ZA==dA==ZQ==cw==dA=='\n>>> ''.join(base64.b64decode(s[i:i+4]) for i in range(0, len(s), 4))\n\n'sdadasdasdasdasdtest'\n
            \n

            However, the correct base64 encoding of your test string sdadasdasdasdasdtest is:

            \n
            >>> base64.b64encode('sdadasdasdasdasdtest')\n'c2RhZGFzZGFzZGFzZGFzZHRlc3Q='\n
            \n

            If you place this string in my_file.txt (and rewriting your code to be a bit more concise) then it all works.

            \n
            import base64\n\nwith open("my_file.txt") as f, open("original_b64.txt", 'w') as g:\n    encoded = f.read()\n    decoded = base64.b64decode(encoded)\n    g.write(decoded)\n
            \n soup wrap:

            I assume that you created your test input string yourself.

            If I split your test input string into blocks of 4 characters and decode each block separately, I get the following:

            >>> import base64
            >>> s = 'cw==ZA==YQ==ZA==YQ==cw==ZA==YQ==cw==ZA==YQ==cw==ZA==YQ==cw==ZA==dA==ZQ==cw==dA=='
            >>> ''.join(base64.b64decode(s[i:i+4]) for i in range(0, len(s), 4))
            
            'sdadasdasdasdasdtest'
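            Under Python 3, b64decode returns bytes rather than str, so the same per-block decode needs an explicit decoding step; a sketch of the equivalent:

```python
import base64

s = 'cw==ZA==YQ==ZA==YQ==cw==ZA==YQ==cw==ZA==YQ==cw==ZA==YQ==cw==ZA==dA==ZQ==cw==dA=='
# b64decode returns bytes in Python 3, so decode each 4-character block to str
decoded = ''.join(base64.b64decode(s[i:i + 4]).decode('ascii')
                  for i in range(0, len(s), 4))
print(decoded)  # sdadasdasdasdasdtest
```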
            

            However, the correct base64 encoding of your test string sdadasdasdasdasdtest is:

            >>> base64.b64encode('sdadasdasdasdasdtest')
            'c2RhZGFzZGFzZGFzZGFzZHRlc3Q='
            

            If you place this string in my_file.txt (and rewrite your code to be a bit more concise), then it all works.

            import base64
            
            with open("my_file.txt") as f, open("original_b64.txt", 'w') as g:
                encoded = f.read()
                decoded = base64.b64decode(encoded)
                g.write(decoded)
            
            qid & accept id: (28774852, 34366589) query: PyPI API - How to get stable package version soup:


            The version scheme is defined in PEP 440. The packaging module can handle version parsing and comparison.

            I came up with this function to get the latest stable version of a package:

            import requests
            import json
            try:
                from packaging.version import parse
            except ImportError:
                from pip._vendor.packaging.version import parse
            
            
            URL_PATTERN = 'https://pypi.python.org/pypi/{package}/json'
            
            
            def get_version(package, url_pattern=URL_PATTERN):
                """Return version of package on pypi.python.org using json."""
                req = requests.get(url_pattern.format(package=package))
                version = parse('0')
                if req.status_code == requests.codes.ok:
                    j = json.loads(req.text.encode(req.encoding))
                    if 'releases' in j:
                        releases = j['releases']
                        for release in releases:
                            ver = parse(release)
                            if not ver.is_prerelease:
                                version = max(version, ver)
                return version
            
            
            if __name__ == '__main__':
                print("Django==%s" % get_version('Django'))
            

            When executed, this produces the following results:

            $ python v.py
            Django==1.9
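            The prerelease filtering itself can be exercised without touching the network; a sketch with a hypothetical release list (assumes packaging, or pip's vendored copy, is installed):

```python
try:
    from packaging.version import parse
except ImportError:  # fall back to pip's vendored copy
    from pip._vendor.packaging.version import parse

# hypothetical release list; 1.9rc1 and 2.0a1 are prereleases and get skipped
releases = ['1.8.7', '1.9rc1', '1.9', '2.0a1']
stable = max(parse(r) for r in releases if not parse(r).is_prerelease)
print(stable)  # 1.9
```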
            
            qid & accept id: (28795561, 28797512) query: Support multiple API versions in flask soup:


            I am the author of the accepted answer on the question you referenced. I think the //users approach is not very effective as you say. If you have to manage three or four different versions you'll end up with spaghetti code.

            The nginx idea I proposed there is better, but has the drawback that you have to host two separate applications. Back then I neglected to mention a third alternative, which is to use a blueprint for each API version. For example, consider the following app structure (greatly simplified for clarity):

            my_project
            +-- api/
                +-- v1/
                    +-- __init__.py
                    +-- routes.py
                +-- v1_1/
                    +-- __init__.py
                    +-- routes.py
                +-- v2/
                    +-- __init__.py
                    +-- routes.py
                +-- __init__.py
                +-- common.py
            

            Here you have an api/common.py that implements common functions that all versions of the API need. For example, you can have an auxiliary function (not decorated as a route) that responds to your /users route that is identical in v1 and v1.1.

            The routes.py for each API version defines the routes, and when necessary calls into common.py functions to avoid duplicating logic. For example, your v1 and v1.1 routes.py can have:

            from api import common
            
            @api.route('/users')
            def get_users():
                return common.get_users()
            

            Note the api.route. Here api is a blueprint. Having each API version implemented as a blueprint helps to combine everything with the proper versioned URLs. Here is an example app setup code that imports the API blueprints into the application instance:

            from api.v1 import api as api_v1
            from api.v1_1 import api as api_v1_1
            from api.v2 import api as api_v2
            
            app.register_blueprint(api_v1, url_prefix='/v1')
            app.register_blueprint(api_v1_1, url_prefix='/v1.1')
            app.register_blueprint(api_v2, url_prefix='/v2')
            

            This structure is very nice because it keeps all API versions separate, yet they are served by the same application. As an added benefit, when the time comes to stop supporting v1, you just remove the register_blueprint call for that version, delete the v1 package from your sources and you are done.

            Now, with all of this said, you should really make an effort to design your API in a way that minimizes the risk of having to rev the version. Consider that adding new routes does not require a new API version, it is perfectly fine to extend an API with new routes. And changes in existing routes can sometimes be designed in a way that do not affect old clients. Sometimes it is less painful to rev the API and have more freedom to change things, but ideally that doesn't happen too often.

            qid & accept id: (28800634, 28800674) query: Python: how to create a list from elements that don't meet a certain condition soup:


            You can use filter:

            small_names = filter(lambda n: len(n)<=4, names)
            #equivalent to: small_names = [n for n in names if len(n) <=4]
            #note: in Python 3, filter returns an iterator, so use list(filter(...)) to get a list
            
            print(small_names) # ['jake', 'Brad', 'Tony'] in Python 2
            

            Using for loop:

            small_names = []
            
            for n in names:
                if len(n) <= 4:
                    small_names.append(n)
            
            qid & accept id: (28812868, 28812970) query: Python Version Specific Code soup:


            Use the members of sys.version_info if you want to base your code on the Python version:

            sys.version_info.major
            sys.version_info.minor
            sys.version_info.micro
            

            Use these members like this:

            if sys.version_info.major == 3 and sys.version_info.minor == 4:
                print("I like Python 3.4")
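            Since sys.version_info also behaves like a tuple, whole-version comparisons can be written directly, which is a common idiom:

```python
import sys

# version_info compares element-wise like a tuple
if sys.version_info >= (3, 0):
    print("Python 3 or newer")
else:
    print("Python 2")
```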
            

            pygame has a similar structure:

            pygame.version.vernum
            

            tupled integers of the version: vernum = (1, 5, 3)

            qid & accept id: (28823052, 28823356) query: stdout from python to stdin java soup:

            python:

            p.stdin.write("haha")
            

            java:

            Scanner in = new Scanner(System.in)
            data = in.next()
            

            From the Java Scanner docs:

            By default, a scanner uses white space to separate tokens. (White space characters include blanks, tabs, and line terminators.)

            Your python code does not write anything that a Scanner recognizes as the end of a token, so the Scanner sits there waiting to read more data. In other words, next() reads input until it encounters a whitespace character, then it returns the data read in, minus the terminating whitespace.
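            The token-termination point can be demonstrated without Java by using a small Python child process as a stand-in; communicate() takes care of flushing and closing stdin (a sketch, not the Java setup below):

```python
import subprocess
import sys

# child that echoes one line back, standing in for the Java Scanner program
child = [sys.executable, '-c',
         'import sys; sys.stdout.write(sys.stdin.readline())']
p = subprocess.Popen(child, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
out, _ = p.communicate(b'haha\n')  # the trailing newline terminates the token
print(out.decode().rstrip())  # haha
```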

            This python code:

            import subprocess
            
            p = subprocess.Popen(
                [
                    'java',  
                    '-cp',
                    '/Users/7stud/java_programs/myjar.jar',
                    'MyProg'
                ],
                stdout = subprocess.PIPE, 
                stdin = subprocess.PIPE,
            )
            
            
            p.stdin.write("haha\n")
            print "i am done" 
            print p.stdout.readline().rstrip()
            

            ...with this java code:

            import java.util.Scanner;
            
            public class MyProg {
                public static void main(String[] args) {
                    Scanner in = new Scanner(System.in);
                    String data = in.next();
            
                    System.out.println("Java program received: " + data);
                }
            }
            

            ...produces this output:

            i am done
            Java program received: haha
            
            qid & accept id: (28823170, 28823374) query: single line if statement - Python soup:


            Oldschool Python had a trick for doing the ternary before Python had a ternary operator. Actually, this trick will work in many programming languages. I'm only going to tell you if you promise not to use it.

            Promise?

            [val_if_false, val_if_true][bool(condition)]
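            As a concrete illustration of the indexing trick (values here are hypothetical):

```python
x = 7
# bool(condition) is 0 or 1, so it selects from the two-element list
parity = ['even', 'odd'][bool(x % 2)]
print(parity)  # odd
```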

            OK, now Python has a ternary, which looks like

            val_if_true if condition else val_if_false
            

            But even that is kinda sloppy. If you really want a one liner, make a function.

            def val():
                if condition:
                    return val_if_true
                else:
                    return val_if_false
            

            Your specific case is a little more specialized, since you want the else to be the original value. You could do

            box[a][b] = box[a][b] or chr(current_char)
            

            But again, the if statement is just more readable and clear as to the intent.

            qid & accept id: (28832166, 28832516) query: List to nested dictionary in python soup:


            Simple one-liner:

            a = ['item1', 'item2', 'item3','item4']
            print reduce(lambda x, y: {y: x}, reversed(a))
            

            For better understanding the above code can be expanded to:

            def nest_me(x, y):
                """
                Take two arguments and return a one element dict with first
                argument as a value and second as a key
                """
                return {y: x}
            
            a = ['item1', 'item2', 'item3','item4']
            rev_a = reversed(a) # ['item4', 'item3', 'item2','item1']
            print reduce(
                nest_me, # Function applied until the list is reduced to one element list
                rev_a # Iterable to be reduced
            )
            # {'item1': {'item2': {'item3': 'item4'}}}
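            On Python 3 the same one-liner needs reduce imported from functools, since it is no longer a builtin:

```python
from functools import reduce  # reduce moved out of builtins in Python 3

a = ['item1', 'item2', 'item3', 'item4']
nested = reduce(lambda x, y: {y: x}, reversed(a))
print(nested)  # {'item1': {'item2': {'item3': 'item4'}}}
```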
            
            qid & accept id: (28855043, 28855133) query: Append in each line of a .txt file a specific string using Python soup:


            You could use the re.sub function.

            To append at the start of each line.

            import re
            
            with open("test.txt", "r") as myfile:
                fil = myfile.read().rstrip('\n')
            with open("test.txt", "w") as f:
                f.write(re.sub(r'(?m)^', r'append text', fil))
            

            To append at the end of each line.

            with open("test.txt", "r") as myfile:
                fil = myfile.read().rstrip('\n')
            with open("test.txt", "w") as f:
                f.write(re.sub(r'(?m)$', r'append text', fil))
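            The effect of the (?m) flag can be seen on an in-memory string, without any file I/O (a sketch with hypothetical input):

```python
import re

text = 'first\nsecond'
# (?m) makes ^ and $ match at the start and end of every line
prefixed = re.sub(r'(?m)^', 'append text', text)
suffixed = re.sub(r'(?m)$', 'append text', text)
print(prefixed)
print(suffixed)
```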
            
            qid & accept id: (28857930, 28858580) query: Bitwise operations to produce power of two in Python soup:


            For integers only, you can get there in a slightly devious way with the following:

            def justify(n):
                return n / (1 << (n.bit_length() - 1))
            

            I've no idea if it's faster without significant testing but a quick test with timeit shows it to be about twice as fast as your first snippet.

            However, converting n to a float in the numerator (to get a float return) slows it to the same speed as your original.

            def justify(n):
                return float(n) / (1 << (n.bit_length() - 1))
            

            bit_length gives the minimum number of bits required to represent abs(x) which is actually going to be one more than you want for your calculation.

            I would expect log(n,2) to be heavily optimized for powers of two in the base - and it's implemented in C. So you will have trouble beating it.

            Possibly changing the denominator to 1 << (n.bit_length() - 1) may give you better performance than the 2** approach, and it seems to be about 30% faster, giving

            def justify(n):
                return float(n) / (1 << (n.bit_length() - 1))

            It is possible to do it completely with bitwise operators:

            def justify_bitwise(n):
                int_n = int(abs(n))
                p = 0
                while int_n != 1:
                    p += 1
                    int_n >>= 1
            
                return float(n) / (1 << p)

            But timeit clocks this at 2.16 microseconds, an order of magnitude slower than using bit_length.
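            A quick sanity check of the reconstructed denominator (1 << (n.bit_length() - 1) is the largest power of two not exceeding n, so the result lands in [1, 2)):

```python
def justify(n):
    # divide by the largest power of two <= n, scaling the result into [1, 2)
    return float(n) / (1 << (n.bit_length() - 1))

print(justify(40))  # 1.25
print(justify(1))   # 1.0
```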

            qid & accept id: (28868384, 28868630) query: Pandas Compute Unique Values per Column as Series soup:
            import numpy as np
            import pandas as pd
            np.random.seed(0)
            df = pd.DataFrame(np.random.randint(5, size=(5,4)), columns=list('ABCD'))
            print(df)
            #    A  B  C  D
            # 0  4  0  3  3
            # 1  3  1  3  2
            # 2  4  0  0  4
            # 3  2  1  0  1
            # 4  1  0  1  4
            dct = {func.__name__:df.apply(func) for func in (pd.Series.nunique, pd.Series.count)}
            print(pd.concat(dct, axis=1))
            

            yields

               count  nunique
            A      5        4
            B      5        2
            C      5        3
            D      5        4
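            Newer pandas versions expose DataFrame.nunique() directly, so the apply step can be skipped; a sketch reproducing the table above (assumes pandas is available):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randint(5, size=(5, 4)), columns=list('ABCD'))
# DataFrame.nunique() and DataFrame.count() both return per-column Series
result = pd.concat({'count': df.count(), 'nunique': df.nunique()}, axis=1)
print(result)
```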
            
            qid & accept id: (28870344, 28870429) query: separate list elements based on semicolon soup:


            I like the idea of breaking out a function to do the parsing. You can then use that function with map, or within a list comprehension.

            inval = ['48998.tyrone-cluster;gic1_nwgs;mbupi;18:45:44;R;qp32\n', '48999.tyrone-cluster;gic2_nwgs;mbupi;0;Q;batch\n', '49005.tyrone-cluster;...01R-1849-01_2;mcbkss;00:44:23;R;qp32\n', '49032.tyrone-cluster;gaussian_top.sh;chemraja;0;Q;qp32\n', '49047.tyrone-cluster;jet_egrid;asevelt;312:33:0;R;qp128\n', '49052.tyrone-cluster;case3sqTS1e-4;mecvamsi;0;Q;qp32\n', '49053.tyrone-cluster;...01R-1850-01_1;mcbkss;0;Q;batch\n', '49054.tyrone-cluster;...01R-1850-01_2;mcbkss;0;Q;batch\n']
            
            def parse(raw):
                parts = raw.strip().split(';')
                _id, _ = parts[0].split('.')
                return _id, parts[3], parts[4], parts[5]
            
            print map(parse, inval)
            
            # or 
            # print [parse(val) for val in inval]
            

            OUTPUT

            [('48998', '18:45:44', 'R', 'qp32'),
             ('48999', '0', 'Q', 'batch'),
             ('49005', '00:44:23', 'R', 'qp32'),
             ('49032', '0', 'Q', 'qp32'),
             ('49047', '312:33:0', 'R', 'qp128'),
             ('49052', '0', 'Q', 'qp32'),
             ('49053', '0', 'Q', 'batch'),
             ('49054', '0', 'Q', 'batch')]
            

            Personally I favor readability in this type of parsing. Nested list comprehensions or more advanced techniques are completely acceptable, but simple, easy-to-follow code has extreme value in my book.
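            The parse helper works the same on a single record, which makes it easy to test in isolation; a sketch with one sample line from the input above:

```python
def parse(raw):
    parts = raw.strip().split(';')
    _id, _ = parts[0].split('.', 1)  # split the numeric id off the node name
    return _id, parts[3], parts[4], parts[5]

sample = '48998.tyrone-cluster;gic1_nwgs;mbupi;18:45:44;R;qp32\n'
print(parse(sample))  # ('48998', '18:45:44', 'R', 'qp32')
```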

            qid & accept id: (28898735, 28899047) query: Python: replace multiple values of a Matrix soup:
            M = [[0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 5, 1], [0, 1, 0, 0]]
            x=1
            
            while (x<=2):
                if (x==1):
                    y=2
                else:
                    y=0
                while (y<=3):
                    M[x][y]="x"
                    y+=1
                x+=1
            

            Output:

            [[0, 0, 1, 0], [1, 0, 'x', 'x'], ['x', 'x', 'x', 'x'], [0, 1, 0, 0]]
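            The same replacement can be written with range-based for loops instead of manual counters (an alternative sketch, not the original answer's code):

```python
M = [[0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 5, 1], [0, 1, 0, 0]]
# row 1 is overwritten from column 2 onward, row 2 from column 0
for x in range(1, 3):
    for y in range(2 if x == 1 else 0, 4):
        M[x][y] = "x"
print(M)
```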
            
            qid & accept id: (28907503, 28908257) query: mongo - find items who at least match an array of values soup:


            You don't need to use $in

            for item in col.find({"$and": [{"item1": 'a'}, {"item1": 'b'}, {"item1": 'c'}]}):
                print(item)
            

            Output

            {'_id': ObjectId('54fa181014d995a397252a1a'), 'item1': ['a', 'b', 'c']}
            

            You could also use aggregation pipelines and the $setIsSubset operator.

            col.aggregate([{"$project": { "item1": 1, "is_subset": { "$setIsSubset": [ ['a', 'b', 'c'], "$item1" ] }}},{"$match": {"is_subset": True}}])
            

            Output

            {
                'ok': 1.0,
                'result': [
                              {
                                  '_id': ObjectId('54fa181014d995a397252a1a'), 
                                  'item1': ['a', 'b', 'c'], 
                                  'is_subset': True
                               }
                           ]
             }
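
            One caveat worth checking when building these filters in Python (a minimal sketch, independent of MongoDB): a dict literal keeps only the last value for a repeated key, so a filter written with duplicate keys silently matches a single value.

```python
# Dict literals keep only the last value for a repeated key, so a filter
# such as {"item1": 'a', "item1": 'b', "item1": 'c'} silently collapses
# to a filter on 'c' alone.
query = {"item1": 'a', "item1": 'b', "item1": 'c'}
print(query)  # {'item1': 'c'}
```

            This is why each condition must be its own dict inside the $and array.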
            
            qid & accept id: (28928049, 28928117) query: Arranging keys and values from a dictionary in a csv file - Python soup:

            soup wrap:

            You need to loop over your dictionary items then write them:

            dictionary = {'Alice': ['10', '10'], 'Tom': ['9', '8'], 'Ben': ['10', '9']}
            
            import csv
            with open('eggs.csv', 'wb') as csvfile:
                spamwriter = csv.writer(csvfile, delimiter=',')
                for item in dictionary.items():
                     spamwriter.writerow(item)
            

            Result (eggs.csv will actually contain, in arbitrary dict order):

            Ben,"['10', '9']"
            Alice,"['10', '10']"
            Tom,"['9', '8']"
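
            On Python 3, the file should be opened in text mode with newline='' instead of 'wb'. A sketch that also flattens the grade list into separate columns, writing to a StringIO so it is self-contained:

```python
import csv
import io

dictionary = {'Alice': ['10', '10'], 'Tom': ['9', '8'], 'Ben': ['10', '9']}

# StringIO stands in for open('eggs.csv', 'w', newline='') on Python 3.
buf = io.StringIO()
spamwriter = csv.writer(buf, delimiter=',')
for name, grades in sorted(dictionary.items()):
    spamwriter.writerow([name] + grades)  # flatten grades into columns

print(buf.getvalue())
```

            Flattening gives one grade per CSV column instead of a stringified list.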
            
            qid & accept id: (29033262, 29033521) query: Execute regex located in an external file in python soup:

            soup wrap:

            Using JSON as the external file, try this:

            import json
            
            json_data=open('regex.json')
            data = json.load(json_data)
            
            for label, regex in data.items():
                print label
                print regex # process your regex here instead print
            

            json file:

            {
             "email" : "([^@|\\s]+@[^@]+\\.[^@|\\s]+)",
             "phone" : "(\\d{3}[-\\.\\s]??\\d{3}[-\\.\\s]??\\d{4}|\\(\\d{3}\\)\\s*\\d{3}[-\\.\\s]??\\d{4}|\\d{3}[-\\.\\s]??\\d{4})"
            }
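
            To actually execute the loaded patterns, compile them with re. A small sketch using json.loads on an inline string (a simplified variant of the email pattern above, so no regex.json file is needed):

```python
import json
import re

# json.loads on a string behaves like json.load on a file object.
data = json.loads('{"email": "([^@\\\\s]+@[^@]+\\\\.[^@\\\\s]+)"}')

patterns = {label: re.compile(rx) for label, rx in data.items()}
m = patterns['email'].search('contact: alice@example.com')
print(m.group(1))  # alice@example.com
```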
            
            qid & accept id: (29036344, 29072305) query: Django Model Design - Many-To-Many Fields soup:

            soup wrap:

            This is probably the simplest solution:

            models.py

            from django.db import models
            
            class CheckList(models.Model):
                name = models.CharField(max_length=255)
                checklist_type = models.ForeignKey('CheckListType')
                options = models.ManyToManyField('CheckListOption', blank=True)
            
                def __unicode__(self):
                    return self.name
            
            class CheckListType(models.Model):
                name = models.CharField(max_length=255)
                options = models.ManyToManyField('CheckListOption')
            
                def __unicode__(self):
                    return self.name
            
            class CheckListOption(models.Model):
                name = models.CharField(max_length=255)
            
                def __unicode__(self):
                    return self.name
            

            forms.py

            from django import forms
            
            from .models import CheckList, CheckListOption
            
            class CheckListForm(forms.ModelForm):
                class Meta:
                    model = CheckList
                    fields = '__all__'
            
                def __init__(self, *args, **kwargs):
                    super(CheckListForm, self).__init__(*args, **kwargs)
                    if self.instance.pk:
                        self.fields['options'].queryset = CheckListOption.objects.filter(
                            checklisttype=self.instance.checklist_type_id
                        )
                    else:
                        self.fields['options'].queryset = CheckListOption.objects.none()
            

            admin.py

            from django.contrib import admin
            
            from .forms import CheckListForm
            from .models import CheckList, CheckListType, CheckListOption
            
            class CheckListAdmin(admin.ModelAdmin):
                form = CheckListForm
            
            admin.site.register(CheckList, CheckListAdmin)
            admin.site.register(CheckListType)
            admin.site.register(CheckListOption)
            

            There is only one drawback: when you already have a saved CheckList instance and you change its checklist_type, you won't get the new options immediately. The user making the change should unselect the selected options (optional, but if skipped, the old selections remain until the next save), save the model, and edit it again to choose the new options.

            qid & accept id: (29041324, 29041623) query: Split a string by three delimiters, and adding them to different lists soup:

            soup wrap:

            This should work. First it splits the text by |; the last element of that contains your answers. Then we remove the answers from the keys and split the answers into their parts. Finally we find the correct answer by searching for the ! marker and save it in the string answer.

            with open("questions.txt", "r") as questions:
                keys = questions.read().split('|')
                answers = keys[3]
                keys[3] = keys[3].split('/', 1)[0]
            
                answers = answers.split('/')[1:]
            
                answer = [x for x in answers if '!' in x][0].strip('!')
            
                answers = [x.strip('!') for x in answers]
            
                print(keys)
                print(answers)
                print(answer)
            

            Output

            ['Poles', 'Magnet', '?', 'Battery']
            ['Charge', 'Ends', 'Magic', 'Metal']
            Charge
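
            The same logic can be checked on an in-memory line; the sample below is hypothetical, reconstructed from the printed output since questions.txt itself is not shown:

```python
# Hypothetical line in the questions.txt format implied by the output above.
line = "Poles|Magnet|?|Battery/!Charge/Ends/Magic/Metal"

keys = line.split('|')
raw = keys[3]
keys[3] = keys[3].split('/', 1)[0]

parts = raw.split('/')[1:]
answer = [x for x in parts if '!' in x][0].strip('!')
answers = [x.strip('!') for x in parts]

print(keys)     # ['Poles', 'Magnet', '?', 'Battery']
print(answers)  # ['Charge', 'Ends', 'Magic', 'Metal']
print(answer)   # Charge
```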
            
            qid & accept id: (29058135, 29058168) query: Python - Split a list of integers into positive and negative soup:

            soup wrap:

            You can do this in O(n) using a defaultdict():

            In [3]: from collections import defaultdict
            
            In [4]: d = defaultdict(list)
            
            In [5]: for num in A:
               ...:     if num < 0:
               ...:         d['neg'].append(num)
               ...:     else: # This will also append zero to the positive list, you can change the behavior by modifying the conditions 
               ...:         d['pos'].append(num)
               ...:         
            
            In [6]: d
            Out[6]: defaultdict(&lt;class 'list'&gt;, {'neg': [-3, -2, -5, -7], 'pos': [1, 8, 4, 6]})
            

            Another way is using two separate list comprehensions (not recommended for long lists):

            >>> B,C=[i for i in A if i<0 ],[j for j in A if j>0]
            >>> B
            [-3, -2, -5, -7]
            >>> C
            [1, 8, 4, 6]
            

            Or using filter function :

            In [19]: list(filter((0).__lt__,A))
            Out[19]: [1, 8, 4, 6]
            
            In [20]: list(filter((0).__gt__,A))
            Out[20]: [-3, -2, -5, -7]
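
            For comparison, a single O(n) pass into two plain lists avoids the dict entirely. A here is a hypothetical input consistent with the outputs above:

```python
# Hypothetical input; any interleaving with the same members works.
A = [1, -3, 8, -2, 4, -5, 6, -7]

pos, neg = [], []
for num in A:
    # zero would land in pos, matching the defaultdict version above
    (neg if num < 0 else pos).append(num)

print(pos)  # [1, 8, 4, 6]
print(neg)  # [-3, -2, -5, -7]
```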
            
            qid & accept id: (29058643, 29058652) query: Extend Python list "inline" soup:

            soup wrap:

            Just concatenate these lists with the + operator:

            range(15, 30, 3) + [0]
            

            Or, if you need an iterator and the list is huge, use itertools.chain:

            import itertools
            it = itertools.chain(range(15, 30, 3), [0])
            

            A quick note: range creates a range object in Python 3+, which doesn't allow concatenation:

            Ranges implement all of the common sequence operations except concatenation and repetition (due to the fact that range objects can only represent sequences that follow a strict pattern and repetition and concatenation will usually violate that pattern).
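
            On Python 3, where + is unavailable on range objects, iterable unpacking gives a list directly (a small sketch):

```python
# Python 3: unpack the range into a list literal instead of concatenating.
result = [*range(15, 30, 3), 0]
print(result)  # [15, 18, 21, 24, 27, 0]
```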

            qid & accept id: (29083478, 29083822) query: Diagonals at different points of a 2D list in Python soup:

            soup wrap:

            I'm not quite sure what you want, but this code gives you all the complete diagonals in each direction:

            L = [[1,2,3],[4,5,6], [7,8,9]]
            # number of rows, number of columns: ie L is m x n
            m, n = len(L), len(L[0])
            
            # Retrieve the NE-SW (diag1) and NW-SE (diag2) diagonals
            diag1 = []
            diag2 = []
            for p in range(m+n-1):
                diag1.append([])
                diag2.append([])
                q1 = 0
                if p >= n:
                    q1 = p - n + 1
                q2 = m
                if p < m-1:
                    q2 = p+1
                for q in range(q1, q2):
                    x, y = p - q, q
                    diag1[-1].append(L[y][x])
                    # To get the other diagonal, read each row "backwards"
                    x = n - x - 1
                    diag2[-1].append(L[y][x])
            print 'diag1:', diag1
            print 'diag2:', diag2
            

            That is:

            diag1: [[1], [2, 4], [3, 5, 7], [6, 8], [9]]
            diag2: [[3], [2, 6], [1, 5, 9], [4, 8], [7]]
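
            An equivalent formulation groups cells by x+y and x-y with a defaultdict, which can be easier to verify; same data, though the NW-SE groups come out in the opposite order:

```python
from collections import defaultdict

L = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
d1, d2 = defaultdict(list), defaultdict(list)
for y, row in enumerate(L):
    for x, val in enumerate(row):
        d1[x + y].append(val)  # NE-SW diagonals (constant x+y)
        d2[x - y].append(val)  # NW-SE diagonals (constant x-y)

print([d1[k] for k in sorted(d1)])  # [[1], [2, 4], [3, 5, 7], [6, 8], [9]]
print([d2[k] for k in sorted(d2)])  # [[7], [4, 8], [1, 5, 9], [2, 6], [3]]
```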
            
            qid & accept id: (29087010, 29087042) query: call __str__ inside a class? python 3.X soup:

            soup wrap:

            You should not be calling __str__ directly. It should be called like str(object) or let it automatically convert.

            Also, __str__ just returns a readable string representing the object; it does not print it. You would want to do:

            print(self.__str__())
            

            or just

            print(self)
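
            A minimal sketch (a hypothetical Point class) showing that print(self) and str(self) both route through __str__:

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __str__(self):
        return "Point({}, {})".format(self.x, self.y)

    def show(self):
        print(self)  # routes through __str__, no direct call needed

p = Point(1, 2)
print(str(p))  # Point(1, 2)
p.show()       # Point(1, 2)
```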
            
            qid & accept id: (29112610, 29114421) query: Finding median in Spark Streaming soup:

            soup wrap:

            To expand on @Justin's answer: what is happening:

            median()
            

            is being applied to each RDD in the stream individually, but the result is never used by anyone, because foreachRDD() is an action, not a transformation.

            You should look at the DStream transformations: e.g. map(): here is code that is not 100% debugged yet - but it provides a structure:

            from pyspark.streaming import *
            ssc = StreamingContext(sc, 30)
            dataRdd = [sc.parallelize(d, 1) for d in [[1,2,3],[4,5],[6,7,8,9,9]]]
            qs = ssc.queueStream(dataRdd)
            
            def list_median((med,mylist),newval):
                mylist = [newval] if not mylist else mylist + [newval]  # list.append returns None, so concatenate instead
                mylist = sorted(mylist)
                return (mylist[int(len(mylist)/2)], mylist)
            
            medians = qs.reduce(list_median).map(lambda (med,list): med)
            def printRec(rdd):
                import sys
                rdd.foreach(lambda rec: sys.stderr.write(repr(rec)))
            
            medians.foreachRDD(printRec)
            ssc.start(); ssc.awaitTermination()
            
            qid & accept id: (29114133, 29115134) query: Iterator for all lexicographically ordered variable strings up to length n soup:

            soup wrap:

            Working with itertools early in the morning is a recipe for disaster, but something like

            from itertools import product, takewhile
            def new(max_len_string, alphabet=range(2)):
                alphabet = list(alphabet)
                zero = alphabet[0]
                for p in product(alphabet, repeat=max_len_string):
                    right_zeros = sum(1 for _ in takewhile(lambda x: x==zero, reversed(p)))
                    base = p[:-right_zeros]
                    yield from filter(None, (base+(zero,)*i for i in range(right_zeros)))
                    yield p
            

            should work:

            >>> list(new(3)) == list(variable_strings_complete(3))
            True
            >>> list(new(20)) == list(variable_strings_complete(20))
            True
            >>> list(new(10, alphabet=range(4))) == list(variable_strings_complete(10, range(4)))
            True
            

            This assumes the alphabet is passed in the canonical order; list can be replaced with sorted if that's not the case.

            qid & accept id: (29114975, 29115196) query: Need to read from a file and add the elements and get avg in python 3.4 soup:
            soup wrap:
            array = [ [int(s) for s in line.split()] for line in open('file') ]
            for line in array:
                print('%08i %3.1f %3i %3i' % (line[0], sum(line[1:])/len(line[1:]), min(line[1:]), max(line[1:])))
            

            This produces the output:

            75647485 14.4  10  20
            63338495 17.2  11  20
            00453621 11.2   3  20
            90812341 15.2   7  20
            

            Alternate Version

            To ensure that the file handle is properly closed, this version uses with. Also, string formatting is done with the more modern format function:

            with open('file') as f:
                array = [ [int(s) for s in line.split()] for line in f ]
            for line in array:
                print('{:08.0f} {:3.1f} {:3.0f} {:3.0f}'.format(line[0], sum(line[1:])/len(line[1:]), min(line[1:]), max(line[1:])))
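
            The formatting line can be checked on a single in-memory record; the grades below are hypothetical, chosen to reproduce the first output row above:

```python
# Hypothetical record: id followed by five grades averaging 14.4.
line = [int(s) for s in "75647485 10 20 13 14 15".split()]
out = '%08i %3.1f %3i %3i' % (
    line[0], sum(line[1:]) / len(line[1:]), min(line[1:]), max(line[1:]))
print(out)  # 75647485 14.4  10  20
```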
            
            qid & accept id: (29153805, 29153865) query: pandas: Use if-else to populate new column soup:

            soup wrap:

            You could convert the boolean series df.col2 > 0 to an integer series (True becomes 1 and False becomes 0):

            df['col3'] = (df.col2 > 0).astype('int')
            

            (To create a new column, you simply need to name it and assign it to a Series, array or list of the same length as your DataFrame.)

            This produces col3 as:

               col2  col3
            0     0     0
            1     1     1
            2     0     0
            3     0     0
            4     3     1
            5     0     0
            6     4     1
            

            Another way to create the column could be to use np.where, which lets you specify a value for either of the true or false values and is perhaps closer to the syntax of the R function ifelse. For example:

            >>> np.where(df['col2'] > 0, 4, -1)
            array([-1,  4, -1, -1,  4, -1,  4])
            
            qid & accept id: (29153930, 29153957) query: Changing constraint naming conventions in Flask-SQLAlchemy soup:

            soup wrap:

            Version 2.1 introduced the metadata argument to the extension. If all you want to customize about the base model is the metadata, you can pass a custom MetaData instance to it.

            db = SQLAlchemy(metadata=MetaData(naming_convention={
                'pk': 'pk_%(table_name)s',
                'fk': 'fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s',
                'ix': 'ix_%(table_name)s_%(column_0_name)s',
                'uq': 'uq_%(table_name)s_%(column_0_name)s',
                'ck': 'ck_%(table_name)s_%(constraint_name)s',
            }))
            

            Previously, you would subclass the SQLAlchemy class and override make_declarative_base. This still works, and is useful if you need to further customize the base model.

            from flask_sqlalchemy import SQLAlchemy as BaseSQLAlchemy, Model, _BoundDeclarativeMeta, _QueryProperty
            from sqlalchemy import MetaData
            from sqlalchemy.ext.declarative import declarative_base
            
            class SQLAlchemy(BaseSQLAlchemy):
                def make_declarative_base(self):
                    metadata = MetaData(naming_convention={
                        'pk': 'pk_%(table_name)s',
                        'fk': 'fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s',
                        'ix': 'ix_%(table_name)s_%(column_0_name)s',
                        'uq': 'uq_%(table_name)s_%(column_0_name)s',
                        'ck': 'ck_%(table_name)s_%(constraint_name)s',
                    })
                    base = declarative_base(metadata=metadata, cls=Model, name='Model', metaclass=_BoundDeclarativeMeta)
                    base.query = _QueryProperty(self)
                    return base
            
            qid & accept id: (29155628, 29156034) query: Python minidom - Parse XML file and write to CSV soup:

            soup wrap:

            Use the csv module.

            # Test Parser
            
            from xml.dom.minidom import parse
            import csv 
            
            
            def writeToCSV(myLibrary):
                # with closes the file handle once writing finishes
                with open('output.csv', 'w') as csvfile:
                    fieldnames = ['title', 'author']
                    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
                    writer.writeheader()

                    books = myLibrary.getElementsByTagName("book")
                    for book in books:
                        titleValue = book.getElementsByTagName("title")[0].childNodes[0].data
                        for author in book.getElementsByTagName("author"):
                            authorValue = author.childNodes[0].data
                            writer.writerow({'title': titleValue, 'author': authorValue})
            
            doc = parse('library.xml')
            myLibrary = doc.getElementsByTagName("library")[0]
            
            # Get book elements in library
            books = myLibrary.getElementsByTagName("book")
            
            # Print each book's title
            writeToCSV(myLibrary)
            

            Output file:

            title,author
            Sandman Volume 1: Preludes and Nocturnes,Neil Gaiman
            Good Omens,Neil Gamain
            Good Omens,Terry Pratchett
            """Repent, Harlequin!"" Said the Tick-Tock Man",Harlan Ellison
            
            qid & accept id: (29192826, 29192961) query: R's relevel() and factor variables in linear regression in pandas soup:

            You could use pd.get_dummies:

            import pandas as pd
            d = {'a': [1,2,3,4,3,3,3], 'b': [5,6,7,8,4,4,4], 'c': [9,10,11,12,3,3,3], 
                 'd': pd.Series(['red', 'blue', 'green', 'red', 'orange', 'blue', 'red'], 
                                dtype='category')}
            df = pd.DataFrame(d)
            dummies = pd.get_dummies(df['d'])
            df = pd.concat([df, dummies], axis=1)
            df = df.drop(['d', 'green'], axis=1)
            print(df)

            yields

               a  b   c  blue  orange  red
            0  1  5   9     0       0    1
            1  2  6  10     1       0    0
            2  3  7  11     0       0    0
            3  4  8  12     0       0    1
            4  3  4   3     0       1    0
            5  3  4   3     1       0    0
            6  3  4   3     0       0    1

            Using statsmodels,

            \n
            import statsmodels.formula.api as smf\nmodel = smf.ols('a ~ b + c + blue + orange + red', df).fit()\nprint(model.summary())\n
            \n

            yields

            \n
                                        OLS Regression Results                            \n==============================================================================\nDep. Variable:                      a   R-squared:                       1.000\nModel:                            OLS   Adj. R-squared:                  1.000\nMethod:                 Least Squares   F-statistic:                 2.149e+25\nDate:                Sun, 22 Mar 2015   Prob (F-statistic):           1.64e-13\nTime:                        05:57:33   Log-Likelihood:                 200.74\nNo. Observations:                   7   AIC:                            -389.5\nDf Residuals:                       1   BIC:                            -389.8\nDf Model:                           5                                         \nCovariance Type:            nonrobust                                         \n==============================================================================\n                 coef    std err          t      P>|t|      [95.0% Conf. Int.]\n------------------------------------------------------------------------------\nIntercept     -1.6000   6.11e-13  -2.62e+12      0.000        -1.600    -1.600\nb              1.6000   1.59e-13   1.01e+13      0.000         1.600     1.600\nc             -0.6000   6.36e-14  -9.44e+12      0.000        -0.600    -0.600\nblue         1.11e-16   3.08e-13      0.000      1.000     -3.91e-12  3.91e-12\norange      7.994e-15   3.87e-13      0.021      0.987     -4.91e-12  4.93e-12\nred         4.829e-15   2.75e-13      0.018      0.989     -3.49e-12   3.5e-12\n==============================================================================\nOmnibus:                          nan   Durbin-Watson:                   0.203\nProb(Omnibus):                    nan   Jarque-Bera (JB):                0.752\nSkew:                           0.200   Prob(JB):                        0.687\nKurtosis:                       1.445   Cond. No.                   
      85.2\n==============================================================================\n\nWarnings:\n[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n
            \n
            \n

            Alternatively, you could use a patsy formula to specify the dummy contrast:

            \n
            import pandas as pd\nimport statsmodels.formula.api as smf\n\nd = {'a': [1,2,3,4,3,3,3], 'b': [5,6,7,8,4,4,4], 'c': [9,10,11,12,3,3,3], \n     'd': ['red', 'blue', 'green', 'red', 'orange', 'blue', 'red']}\ndf = pd.DataFrame(d)\n\nmodel = smf.ols('a ~ b + c + C(d, Treatment(reference="green"))', df).fit()\nprint(model.summary())\n
            \n


            \n\n soup wrap:

            You could use pd.get_dummies:

            import pandas as pd
            d = {'a': [1,2,3,4,3,3,3], 'b': [5,6,7,8,4,4,4], 'c': [9,10,11,12,3,3,3], 
                 'd': pd.Series(['red', 'blue', 'green', 'red', 'orange', 'blue', 'red'], 
                                dtype='category')}
            df = pd.DataFrame(d)
            dummies = pd.get_dummies(df['d'])
            df = pd.concat([df, dummies], axis=1)
            df = df.drop(['d', 'green'], axis=1)
            print(df)
            

            yields

               a  b   c  blue  orange  red
            0  1  5   9     0       0    1
            1  2  6  10     1       0    0
            2  3  7  11     0       0    0
            3  4  8  12     0       0    1
            4  3  4   3     0       1    0
            5  3  4   3     1       0    0
            6  3  4   3     0       0    1
            

            Using statsmodels,

            import statsmodels.formula.api as smf
            model = smf.ols('a ~ b + c + blue + orange + red', df).fit()
            print(model.summary())
            

            yields

                                        OLS Regression Results                            
            ==============================================================================
            Dep. Variable:                      a   R-squared:                       1.000
            Model:                            OLS   Adj. R-squared:                  1.000
            Method:                 Least Squares   F-statistic:                 2.149e+25
            Date:                Sun, 22 Mar 2015   Prob (F-statistic):           1.64e-13
            Time:                        05:57:33   Log-Likelihood:                 200.74
            No. Observations:                   7   AIC:                            -389.5
            Df Residuals:                       1   BIC:                            -389.8
            Df Model:                           5                                         
            Covariance Type:            nonrobust                                         
            ==============================================================================
                             coef    std err          t      P>|t|      [95.0% Conf. Int.]
            ------------------------------------------------------------------------------
            Intercept     -1.6000   6.11e-13  -2.62e+12      0.000        -1.600    -1.600
            b              1.6000   1.59e-13   1.01e+13      0.000         1.600     1.600
            c             -0.6000   6.36e-14  -9.44e+12      0.000        -0.600    -0.600
            blue         1.11e-16   3.08e-13      0.000      1.000     -3.91e-12  3.91e-12
            orange      7.994e-15   3.87e-13      0.021      0.987     -4.91e-12  4.93e-12
            red         4.829e-15   2.75e-13      0.018      0.989     -3.49e-12   3.5e-12
            ==============================================================================
            Omnibus:                          nan   Durbin-Watson:                   0.203
            Prob(Omnibus):                    nan   Jarque-Bera (JB):                0.752
            Skew:                           0.200   Prob(JB):                        0.687
            Kurtosis:                       1.445   Cond. No.                         85.2
            ==============================================================================
            
            Warnings:
            [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
            

            Alternatively, you could use a patsy formula to specify the dummy contrast:

            import pandas as pd
            import statsmodels.formula.api as smf
            
            d = {'a': [1,2,3,4,3,3,3], 'b': [5,6,7,8,4,4,4], 'c': [9,10,11,12,3,3,3], 
                 'd': ['red', 'blue', 'green', 'red', 'orange', 'blue', 'red']}
            df = pd.DataFrame(d)
            
            model = smf.ols('a ~ b + c + C(d, Treatment(reference="green"))', df).fit()
            print(model.summary())
            


            qid & accept id: (29195983, 29196308) query: Python - workaround with sets soup:

            One approach is to use format and zfill

            \n
            n = 27\nprint "{0:b}".format(n).zfill(10) # prints "0000011010"\n
            \n

            You can also accomplish this with just a single format (although it's a little harder to read)

            \n
            n = 27\nprint "{0:010b}".format(n) # prints "0000011010"\n
            \n soup wrap:

            One approach is to use format and zfill

            n = 27
            print "{0:b}".format(n).zfill(10) # prints "0000011010"
            

            You can also accomplish this with just a single format (although it's a little harder to read)

            n = 27
            print "{0:010b}".format(n) # prints "0000011010"
            
            qid & accept id: (29224567, 29225081) query: Python: compare an array element-wise with a float soup:

            Suppose we have an array of arrays with the same shapes as the ones you mention:

            \n
            >>> A=np.array([np.random.random((4,3)), np.random.random((3,2))])\n>>> A\narray([ array([[ 0.20621572,  0.83799579,  0.11064094],\n       [ 0.43473089,  0.68767982,  0.36339786],\n       [ 0.91399729,  0.1408565 ,  0.76830952],\n       [ 0.17096626,  0.49473758,  0.158627  ]]),\n       array([[ 0.95823229,  0.75178047],\n       [ 0.25873872,  0.67465796],\n       [ 0.83685788,  0.21377079]])], dtype=object)\n
            \n

            We can test each inner array element-wise with a comparison:

            \n
            >>> A[0]>.2\narray([[ True,  True, False],\n       [ True,  True,  True],\n       [ True, False,  True],\n       [False,  True, False]], dtype=bool)\n
            \n

            But not the whole thing:

            \n
            >>> A>.2\nTraceback (most recent call last):\n  File "", line 1, in \nValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()\n
            \n

            So just rebuild the array B thus:

            \n
            >>> B=np.array([a>.2 for a in A])\n>>> B\narray([ array([[ True,  True, False],\n       [ True,  True,  True],\n       [ True, False,  True],\n       [False,  True, False]], dtype=bool),\n       array([[ True,  True],\n       [ True,  True],\n       [ True,  True]], dtype=bool)], dtype=object)\n
            \n soup wrap:

            Suppose we have an array of arrays with the same shapes as the ones you mention:

            >>> A=np.array([np.random.random((4,3)), np.random.random((3,2))])
            >>> A
            array([ array([[ 0.20621572,  0.83799579,  0.11064094],
                   [ 0.43473089,  0.68767982,  0.36339786],
                   [ 0.91399729,  0.1408565 ,  0.76830952],
                   [ 0.17096626,  0.49473758,  0.158627  ]]),
                   array([[ 0.95823229,  0.75178047],
                   [ 0.25873872,  0.67465796],
                   [ 0.83685788,  0.21377079]])], dtype=object)
            

            We can test each inner array element-wise with a comparison:

            >>> A[0]>.2
            array([[ True,  True, False],
                   [ True,  True,  True],
                   [ True, False,  True],
                   [False,  True, False]], dtype=bool)
            

            But not the whole thing:

            >>> A>.2
            Traceback (most recent call last):
              File "", line 1, in 
            ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
            

            So just rebuild the array B thus:

            >>> B=np.array([a>.2 for a in A])
            >>> B
            array([ array([[ True,  True, False],
                   [ True,  True,  True],
                   [ True, False,  True],
                   [False,  True, False]], dtype=bool),
                   array([[ True,  True],
                   [ True,  True],
                   [ True,  True]], dtype=bool)], dtype=object)
            
            qid & accept id: (29238534, 29238701) query: Execute a string as a command soup:

            You can use eval. eval() is used to evaluate an expression; if you want to execute a statement, use exec()

            \n

            See example for eval:

            \n
            def fun():\n    print "in fun"\n\neval("fun()")\n\nx="fun()"\neval(x)\n
            \n

            See example for exec.

            \n
            exec("print 'hi'")\n
            \n soup wrap:

            You can use eval. eval() is used to evaluate an expression; if you want to execute a statement, use exec()

            See example for eval:

            def fun():
                print "in fun"
            
            eval("fun()")
            
            x="fun()"
            eval(x)
            

            See example for exec.

            exec("print 'hi'")
            
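            In Python 3 the same idea looks like this (exec is a built-in function rather than a statement); a minimal sketch:

```python
def fun():
    return "in fun"

# eval evaluates an expression and returns its value
result = eval("fun()")
print(result)  # in fun

# exec runs statements and always returns None,
# so capture its effect through a namespace dict
ns = {}
exec("x = 2 + 3", ns)
print(ns["x"])  # 5
```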
            qid & accept id: (29246483, 29246522) query: Removing an element from a list and a corresponding value soup:

            It would be easier to pick an index at random from myList.

            \n
            from random import randint\n\nmyList = ['a', 'b', 'c']\nmyOtherList = [1, 2, 3]\n\nindex = randint(0, len(myList)-1)\n\ndel myList[index]\ndel myOtherList[index]\n
            \n

            But if you are stuck with picking an item, just get the index of the item with the... index function!

            \n
            index = myList.index(chosen_element)\n
            \n soup wrap:

            It would be easier to pick an index at random from myList.

            from random import randint
            
            myList = ['a', 'b', 'c']
            myOtherList = [1, 2, 3]
            
            index = randint(0, len(myList)-1)
            
            del myList[index]
            del myOtherList[index]
            

            But if you are stuck with picking an item, just get the index of the item with the... index function!

            index = myList.index(chosen_element)
            
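            Putting the two pieces together: pick one random index and delete the entry at that index from both lists, so the lists stay in sync (a sketch; the seed is only there to make the example reproducible):

```python
from random import randrange, seed

myList = ['a', 'b', 'c']
myOtherList = [1, 2, 3]

seed(0)                          # only so the example is reproducible
index = randrange(len(myList))   # randrange(n) is equivalent to randint(0, n-1)

# remove the element and its counterpart at the same position
del myList[index]
del myOtherList[index]

print(myList, myOtherList)
```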
            qid & accept id: (29258758, 29260111) query: How to use an array to keep track of different numbers? soup:

            I don't really understand the assignment sheet, specifically the boxes needed to buy, which will always come out to 10, but I think this is what it is asking for:

            \n
            #!/usr/bin/env python\n# -*- coding: utf-8 -*-\n\nimport random   \ndef box():\n    startbox = 0\n    allcards = 0\n    cards = [1,2,3,4,5,6,7,8,9,10]\n    curcards = []\n    while True:\n        randomn = random.randrange(0,10)\n        allcards = allcards+1\n        if str(cards[randomn]) not in curcards:\n            cards[randomn]\n            startbox = startbox + 1\n            curcards.append(str(cards[randomn]))\n        if len(curcards) == 10:\n            break\n    return 'Boxes to buy: ' + str(startbox) + ' Cards Found: ' + '; '.join(curcards) + ' Total amount of cards: ' + str(allcards)\n\n#print box()\nbox()\n
            \n

            Returns:

            \n
            Boxes to buy: 10 Cards Found: 9; 1; 2; 8; 5; 7; 10; 6; 4; 3 Total amount of cards: 31\n
            \n soup wrap:

            I don't really understand the assignment sheet, specifically the boxes needed to buy, which will always come out to 10, but I think this is what it is asking for:

            #!/usr/bin/env python
            # -*- coding: utf-8 -*-
            
            import random   
            def box():
                startbox = 0
                allcards = 0
                cards = [1,2,3,4,5,6,7,8,9,10]
                curcards = []
                while True:
                    randomn = random.randrange(0,10)
                    allcards = allcards+1
                    if str(cards[randomn]) not in curcards:
                        cards[randomn]
                        startbox = startbox + 1
                        curcards.append(str(cards[randomn]))
                    if len(curcards) == 10:
                        break
                return 'Boxes to buy: ' + str(startbox) + ' Cards Found: ' + '; '.join(curcards) + ' Total amount of cards: ' + str(allcards)
            
            #print box()
            box()
            

            Returns:

            Boxes to buy: 10 Cards Found: 9; 1; 2; 8; 5; 7; 10; 6; 4; 3 Total amount of cards: 31
            
            qid & accept id: (29263680, 29264113) query: Regex match following substring in string python soup:

            You can do:

            \n
            txt='''\\nCall me on my mobile anytime: 555-666-1212 \nThe office is best at 555-222-3333 \nDont ever call me at 555-666-2345 '''\n\nimport re\n\nprint re.findall(r'(?:(mobile|office).{0,15}(\+?[2-9]\d{2}\)?[ -]?\d{3}[ -]?\d{4}))', txt)\n
            \n

            Prints:

            \n
            [('mobile', '555-666-1212'), ('office', '555-222-3333')]\n
            \n soup wrap:

            You can do:

            txt='''\
            Call me on my mobile anytime: 555-666-1212 
            The office is best at 555-222-3333 
            Dont ever call me at 555-666-2345 '''
            
            import re
            
            print re.findall(r'(?:(mobile|office).{0,15}(\+?[2-9]\d{2}\)?[ -]?\d{3}[ -]?\d{4}))', txt)
            

            Prints:

            [('mobile', '555-666-1212'), ('office', '555-222-3333')]
            
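            A Python 3 version of the same pattern, with the same hypothetical sample text (only the print syntax changes):

```python
import re

txt = '''\
Call me on my mobile anytime: 555-666-1212
The office is best at 555-222-3333
Dont ever call me at 555-666-2345'''

# capture the keyword and the phone number that follows within 15 characters
pairs = re.findall(r'(?:(mobile|office).{0,15}(\+?[2-9]\d{2}\)?[ -]?\d{3}[ -]?\d{4}))', txt)
print(pairs)  # [('mobile', '555-666-1212'), ('office', '555-222-3333')]
```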
            qid & accept id: (29308274, 29310579) query: Using np.searchsorted to find the most recent timestamp soup:

            To make our lives easier, let's use numbers instead of timestamps:

            \n
            >>> a = np.arange(0, 10, 2)\n>>> b = np.arange(1, 8, 3)\n>>> a\narray([0, 2, 4, 6, 8])\n>>> b\narray([1, 4, 7])\n
            \n

            The last timestamps in a that are smaller than or equal to each item in b would be [0, 4, 6], which correspond to indices [0, 2, 3], which is exactly what we get if we do:

            \n
            >>> np.searchsorted(a, b, side='right') - 1\narray([0, 2, 3])\n>>> a[np.searchsorted(a, b, side='right') - 1]\narray([0, 4, 6])\n
            \n

            If you don't use side='right' then you would get wrong values for the second term, where there is an exactly matching timestamp in both arrays:

            \n
            >>> np.searchsorted(a, b) - 1\narray([0, 1, 3])\n
            \n soup wrap:

            To make our lives easier, let's use numbers instead of timestamps:

            >>> a = np.arange(0, 10, 2)
            >>> b = np.arange(1, 8, 3)
            >>> a
            array([0, 2, 4, 6, 8])
            >>> b
            array([1, 4, 7])
            

            The last timestamps in a that are smaller than or equal to each item in b would be [0, 4, 6], which correspond to indices [0, 2, 3], which is exactly what we get if we do:

            >>> np.searchsorted(a, b, side='right') - 1
            array([0, 2, 3])
            >>> a[np.searchsorted(a, b, side='right') - 1]
            array([0, 4, 6])
            

            If you don't use side='right' then you would get wrong values for the second term, where there is an exactly matching timestamp in both arrays:

            >>> np.searchsorted(a, b) - 1
            array([0, 1, 3])
            
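            The same recipe works directly on timestamps, since numpy's datetime64 values sort and compare like numbers (a sketch with made-up times):

```python
import numpy as np

# sorted reference timestamps
a = np.array(['2015-03-27T10:00', '2015-03-27T10:02',
              '2015-03-27T10:04', '2015-03-27T10:06'], dtype='datetime64[m]')
# query timestamps
b = np.array(['2015-03-27T10:01', '2015-03-27T10:04'], dtype='datetime64[m]')

# index of the most recent timestamp in a at or before each item of b
idx = np.searchsorted(a, b, side='right') - 1
print(idx)      # indices [0, 2]
print(a[idx])   # 10:00 for the first query, 10:04 for the exact match
```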
            qid & accept id: (29309643, 29309683) query: Converting List to Dict soup:

            You can use a dict comprehension and slicing:

            \n
            >>> list1 = [['James','24','Canada','Blue','Tall'],['Ryan','21','U.S.','Green','Short']]\n>>> {i[0]:i[1:] for i in list1}\n{'James': ['24', 'Canada', 'Blue', 'Tall'], 'Ryan': ['21', 'U.S.', 'Green', 'Short']}\n
            \n

            In Python 3 you can use a more elegant approach with extended iterable unpacking:

            \n
            >>> {i:j for i,*j in list1}\n{'James': ['24', 'Canada', 'Blue', 'Tall'], 'Ryan': ['21', 'U.S.', 'Green', 'Short']}\n
            \n soup wrap:

            You can use a dict comprehension and slicing:

            >>> list1 = [['James','24','Canada','Blue','Tall'],['Ryan','21','U.S.','Green','Short']]
            >>> {i[0]:i[1:] for i in list1}
            {'James': ['24', 'Canada', 'Blue', 'Tall'], 'Ryan': ['21', 'U.S.', 'Green', 'Short']}
            

            In Python 3 you can use a more elegant approach with extended iterable unpacking:

            >>> {i:j for i,*j in list1}
            {'James': ['24', 'Canada', 'Blue', 'Tall'], 'Ryan': ['21', 'U.S.', 'Green', 'Short']}
            
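            The same result can also be spelled with plain dict() and a generator of (key, value) pairs, which works identically on Python 2 and 3:

```python
list1 = [['James', '24', 'Canada', 'Blue', 'Tall'],
         ['Ryan', '21', 'U.S.', 'Green', 'Short']]

# dict() accepts any iterable of (key, value) pairs
result = dict((row[0], row[1:]) for row in list1)
print(result['James'])  # ['24', 'Canada', 'Blue', 'Tall']
```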
            qid & accept id: (29318963, 29321161) query: Pandas, Filling between dates with average change between previous rows soup:

            You can use DataFrame.interpolate with the "time" method after a resample. (It won't give quite the numbers you gave, because there are only 30 days between 2 Nov and 2 Dec, not 31):

            \n
            >>> dnew = df.resample("1d").interpolate("time")\n>>> dnew.head(100)\n                 295340      299616\n2014-11-02   304.904110  157.123288\n2014-11-03   314.650753  162.068795\n[...]\n2014-11-28   558.316839  285.706466\n2014-11-29   568.063483  290.651972\n2014-11-30   577.810126  295.597479\n2014-12-01   587.556770  300.542986\n2014-12-02   597.303413  305.488493\n2014-12-03   606.948799  310.299014\n[...]\n2014-12-30   867.374215  440.183068\n2014-12-31   877.019600  444.993589\n2015-01-01   886.664986  449.804109\n2015-01-02   896.310372  454.614630\n[...]\n2015-02-01  1182.828960  594.911891\n2015-02-02  1192.379580  599.588466\n[...]\n
            \n

            The downside here is that it'll extrapolate using the last value at the end:

            \n
            [...]\n2015-01-31  1173.278341  590.235315\n2015-02-01  1182.828960  594.911891\n2015-02-02  1192.379580  599.588466\n2015-02-03  1201.832532  604.125411\n2015-02-04  1211.285484  608.662356\n2015-02-05  1211.285484  613.199302\n2015-02-06  1211.285484  617.736247\n[...]\n
            \n

            So you'd have to decide how you want to handle that.

            \n soup wrap:

            You can use DataFrame.interpolate with the "time" method after a resample. (It won't give quite the numbers you gave, because there are only 30 days between 2 Nov and 2 Dec, not 31):

            >>> dnew = df.resample("1d").interpolate("time")
            >>> dnew.head(100)
                             295340      299616
            2014-11-02   304.904110  157.123288
            2014-11-03   314.650753  162.068795
            [...]
            2014-11-28   558.316839  285.706466
            2014-11-29   568.063483  290.651972
            2014-11-30   577.810126  295.597479
            2014-12-01   587.556770  300.542986
            2014-12-02   597.303413  305.488493
            2014-12-03   606.948799  310.299014
            [...]
            2014-12-30   867.374215  440.183068
            2014-12-31   877.019600  444.993589
            2015-01-01   886.664986  449.804109
            2015-01-02   896.310372  454.614630
            [...]
            2015-02-01  1182.828960  594.911891
            2015-02-02  1192.379580  599.588466
            [...]
            

            The downside here is that it'll extrapolate using the last value at the end:

            [...]
            2015-01-31  1173.278341  590.235315
            2015-02-01  1182.828960  594.911891
            2015-02-02  1192.379580  599.588466
            2015-02-03  1201.832532  604.125411
            2015-02-04  1211.285484  608.662356
            2015-02-05  1211.285484  613.199302
            2015-02-06  1211.285484  617.736247
            [...]
            

            So you'd have to decide how you want to handle that.
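            A minimal, self-contained sketch of the resample-then-interpolate step, using made-up values (in current pandas versions interpolate is called on the object returned by resample):

```python
import pandas as pd

# two observations four days apart, hypothetical values
df = pd.DataFrame({'v': [0.0, 4.0]},
                  index=pd.to_datetime(['2014-11-02', '2014-11-06']))

# upsample to daily frequency, then fill the gaps linearly in time
dnew = df.resample('1D').interpolate('time')
print(dnew)  # daily values 0.0, 1.0, 2.0, 3.0, 4.0
```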

            qid & accept id: (29327493, 29327964) query: Trying to convert HSV image to Black and white [opencv] soup:

            You are basically looking for thresholding: pick a threshold value, then set any (grayscale) pixel value greater than the threshold to white and everything else to black. OpenCV has some beautiful built-in methods to do the same, but it is really simple code:

            \n
            skin = ...  # initialize this variable with the image produced after separating the skin pixels from the image\n\n# there is no direct HSV-to-GRAY conversion code, so convert via BGR first\nbw_image = cv2.cvtColor(cv2.cvtColor(skin, cv2.COLOR_HSV2BGR), cv2.COLOR_BGR2GRAY)\n\nnew_image = bw_image.copy()  # bw_image[:] would only create a view, not a copy\n\n# a threshold of 1 separates out the purely black pixels (grayscale value 0)\nthreshold = 1\n
            \n

            Now we simply iterate over the image and substitute the values accordingly.

            \n
            h,b = skin.shape[:2]    \n\nfor i in xrange(h):\n    for j in xrange(b):\n        if bw_image[i][j] > threshold:\n            new_image[i][j] = 255 #Setting the skin tone to be White\n        else:\n            new_image[i][j] = 0 #else setting it to zero.\n
            \n soup wrap:

            You are basically looking for thresholding: pick a threshold value, then set any (grayscale) pixel value greater than the threshold to white and everything else to black. OpenCV has some beautiful built-in methods to do the same, but it is really simple code:

            skin = ...  # initialize this variable with the image produced after separating the skin pixels from the image
            
            # there is no direct HSV-to-GRAY conversion code, so convert via BGR first
            bw_image = cv2.cvtColor(cv2.cvtColor(skin, cv2.COLOR_HSV2BGR), cv2.COLOR_BGR2GRAY)
            
            new_image = bw_image.copy()  # bw_image[:] would only create a view, not a copy
            
            # a threshold of 1 separates out the purely black pixels (grayscale value 0)
            threshold = 1
            

            Now we simply iterate over the image and substitute the values accordingly.

            h,b = skin.shape[:2]    
            
            for i in xrange(h):
                for j in xrange(b):
                    if bw_image[i][j] > threshold:
                        new_image[i][j] = 255 #Setting the skin tone to be White
                    else:
                        new_image[i][j] = 0 #else setting it to zero.
            
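            The per-pixel loop works but is slow in Python; the same thresholding can be done in one vectorized step with NumPy (or with cv2.threshold). A sketch on a small made-up grayscale array:

```python
import numpy as np

# hypothetical grayscale image: 0 = background, >0 = skin pixels
bw_image = np.array([[0, 120, 0],
                     [200, 0, 90]], dtype=np.uint8)

threshold = 1
# every pixel above the threshold becomes white (255), the rest black (0)
new_image = np.where(bw_image > threshold, 255, 0).astype(np.uint8)
print(new_image)  # [[0 255 0] [255 0 255]]
```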
            qid & accept id: (29351492, 29351603) query: How to make a continuous alphabetic list python (from a-z then from aa, ab, ac etc) soup:

            Use itertools.product.

            \n
            from string import ascii_lowercase\nimport itertools\n\ndef iter_all_strings():\n    size = 1\n    while True:\n        for s in itertools.product(ascii_lowercase, repeat=size):\n            yield "".join(s)\n        size +=1\n\nfor s in iter_all_strings():\n    print s\n    if s == 'bb':\n        break\n
            \n

            Result:

            \n
            a\nb\nc\nd\ne\n...\ny\nz\naa\nab\nac\n...\nay\naz\nba\nbb\n
            \n

            This has the added benefit of going well beyond two-letter combinations. If you need a million strings, it will happily give you three and four and five letter strings.

            \n
            \n

            Bonus style tip: if you don't like having an explicit break inside the bottom loop, you can use islice to make the loop terminate on its own:

            \n
            for s in itertools.islice(iter_all_strings(), 54):\n    print s\n
            \n soup wrap:

            Use itertools.product.

            from string import ascii_lowercase
            import itertools
            
            def iter_all_strings():
                size = 1
                while True:
                    for s in itertools.product(ascii_lowercase, repeat=size):
                        yield "".join(s)
                    size +=1
            
            for s in iter_all_strings():
                print s
                if s == 'bb':
                    break
            

            Result:

            a
            b
            c
            d
            e
            ...
            y
            z
            aa
            ab
            ac
            ...
            ay
            az
            ba
            bb
            

            This has the added benefit of going well beyond two-letter combinations. If you need a million strings, it will happily give you three and four and five letter strings.


            Bonus style tip: if you don't like having an explicit break inside the bottom loop, you can use islice to make the loop terminate on its own:

            for s in itertools.islice(iter_all_strings(), 54):
                print s
            
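            The generator also composes nicely when you want the labels collected into a list, e.g. for spreadsheet-style column names (a Python 3 sketch of the same iter_all_strings idea):

```python
from string import ascii_lowercase
import itertools

def iter_all_strings():
    # yield 'a'..'z', then 'aa'..'zz', then 'aaa'.., indefinitely
    size = 1
    while True:
        for s in itertools.product(ascii_lowercase, repeat=size):
            yield "".join(s)
        size += 1

# grab the first 28 labels: a..z, then aa, ab
labels = list(itertools.islice(iter_all_strings(), 28))
print(labels[:3], labels[-2:])  # ['a', 'b', 'c'] ['aa', 'ab']
```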
            qid & accept id: (29356022, 29356656) query: python multiprocessing dynamically created processes and pipes soup:

            Create your Pipe when you create the Process, and return a tuple of the Process and Pipe.

            \n
            import multiprocessing as mp\n\ndef mkproc(func):\n    parent_conn, child_conn = mp.Pipe()\n    p = mp.Process(target=func, args=(child_conn,))\n    p.start()\n    return (p, parent_conn)\n
            \n

            After calling mkproc to create processes, store the result in a list;

            \n
            allprocs = [mkproc(f) for f in (foo, bar, baz)]\n
            \n

            allprocs now holds a list of (Process, Pipe) tuples. If you iterate over the list you have each process and the pipe that belongs with it;

            \n
            for proc, conn in allprocs:\n    # do something with the process or pipe.\n
            \n soup wrap:

            Create your Pipe when you create the Process, and return a tuple of the Process and Pipe.

            import multiprocessing as mp
            
            def mkproc(func):
                parent_conn, child_conn = mp.Pipe()
                p = mp.Process(target=func, args=(child_conn,))
                p.start()
                return (p, parent_conn)
            

            After calling mkproc to create processes, store the result in a list;

            allprocs = [mkproc(f) for f in (foo, bar, baz)]
            

            allprocs now holds a list of (Process, Pipe) tuples. If you iterate over the list you have each process and the pipe that belongs with it;

            for proc, conn in allprocs:
                # do something with the process or pipe.
            
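            The Pipe half of this is easy to demonstrate on its own: Pipe() returns two connected endpoints, and whatever is sent on one end can be received on the other (a sketch without spawning a process; in mkproc the child end would be handed to the worker function):

```python
import multiprocessing as mp

parent_conn, child_conn = mp.Pipe()

# the worker would call send() on its end of the pipe...
child_conn.send("result from worker")

# ...and the parent reads it from the other end
print(parent_conn.recv())  # result from worker
```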
            qid & accept id: (29365357, 29366205) query: Python. How to efficiently remove custom object from array soup:

            If you need arbitrary criteria, then filtering is OK, but it is slightly shorter to use a list comprehension. For example, instead of

            \n
            self.skills = filter(lambda skill: skill.id != skill_to_remove.id, self.skills)\n
            \n

            use:

            \n
            self.skills = [s for s in self.skills if s.id != skill_to_remove.id]\n
            \n

            It's also possible to modify the list in-place (see this question) using slice assignment:

            \n
            self.skills[:] = (s for s in self.skills if s.id != skill_to_remove.id)\n
            \n

            If you are filtering skills based on an exact match with a "template" skill, i.e. matching all the properties of skill_to_remove then it might be better to implement an equality method for your Skill class (see this question). Then you could just use the remove method on self.skills. However, this will only remove the first matching instance.

            \n soup wrap:

            If you need arbitrary criteria, then filtering is OK, but it is slightly shorter to use a list comprehension. For example, instead of

            self.skills = filter(lambda skill: skill.id != skill_to_remove.id, self.skills)
            

            use:

            self.skills = [s for s in self.skills if s.id != skill_to_remove.id]
            

            It's also possible to modify the list in-place (see this question) using slice assignment:

            self.skills[:] = (s for s in self.skills if s.id != skill_to_remove.id)
            

            If you are filtering skills based on an exact match with a "template" skill, i.e. matching all the properties of skill_to_remove then it might be better to implement an equality method for your Skill class (see this question). Then you could just use the remove method on self.skills. However, this will only remove the first matching instance.
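            A sketch of that last approach, with a hypothetical Skill class that defines equality by id so that list.remove finds the matching instance:

```python
class Skill(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name

    def __eq__(self, other):
        # two skills are considered equal when their ids match
        return isinstance(other, Skill) and self.id == other.id

skills = [Skill(1, "sword"), Skill(2, "bow"), Skill(3, "magic")]
skills.remove(Skill(2, "bow"))   # removes the first skill with id == 2
print([s.id for s in skills])    # [1, 3]
```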

            qid & accept id: (29384588, 29385020) query: How to reset an unordered index to an ordered one in python? soup:

            Use the drop=True option of reset_index.

            \n
            \n

            drop : boolean, default False.\n Do not try to insert index into dataframe columns. This resets the index to the default integer index

            \n
            \n

            So, instead of calling:

            \n
            transactional.reset_index(inplace = True)\n
            \n

            Do:

            \n
            transactional.reset_index(inplace = True, drop=True)\n
            \n soup wrap:

            Use the drop=True option of reset_index.

            drop : boolean, default False. Do not try to insert index into dataframe columns. This resets the index to the default integer index

            So, instead of calling:

            transactional.reset_index(inplace = True)
            

            Do:

            transactional.reset_index(inplace = True, drop=True)
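A minimal sketch of what drop=True does, with made-up data:

```python
import pandas as pd

# A frame whose index is out of order, e.g. after sorting by a column.
df = pd.DataFrame({"value": [30, 10, 20]}, index=[2, 0, 1])

# Without drop=True, the old index would be inserted as an "index" column;
# with drop=True it is simply replaced by the default 0..n-1 integer index.
df.reset_index(drop=True, inplace=True)
```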
            
            qid & accept id: (29392606, 29393178) query: Pandas: Adding conditionally soup:

            This approach is the fastest I have tried:

            \n
            foo = foobar2.clip_lower(0)\nfoo = foo['var1']+foo['var2']-foo['var3']-foo['var4']\n
            \n

            This approach is a tiny bit slower:

            \n
            foo = foobar2.clip_lower(0)\nfoo['var3']*=-1\nfoo['var4']*=-1\nfoo = foo.sum(axis=1)\n
            \n

            You can also use the apply method for a one-liner, which is simpler and clearer but even slower than your approach:

            \n
            foo = foobar2.clip_lower(0).apply(lambda x: x['var1']+x['var2']-x['var3']-x['var4'], axis=1)\n
            \n soup wrap:

            This approach is the fastest I have tried:

            foo = foobar2.clip_lower(0)
            foo = foo['var1']+foo['var2']-foo['var3']-foo['var4']
            

            This approach is a tiny bit slower:

            foo = foobar2.clip_lower(0)
            foo['var3']*=-1
            foo['var4']*=-1
            foo = foo.sum(axis=1)
            

            You can also use the apply method for a one-liner, which is simpler and clearer but even slower than your approach:

            foo = foobar2.clip_lower(0).apply(lambda x: x['var1']+x['var2']-x['var3']-x['var4'], axis=1)
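Note that clip_lower has since been deprecated and removed from pandas; in current versions the same idea reads as follows (foobar2 here is made-up sample data):

```python
import pandas as pd

foobar2 = pd.DataFrame({"var1": [1, -2], "var2": [3, 4],
                        "var3": [-5, 6], "var4": [7, -8]})

# clip(lower=0) is the modern spelling of the removed clip_lower(0)
foo = foobar2.clip(lower=0)
foo = foo["var1"] + foo["var2"] - foo["var3"] - foo["var4"]
```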
            
            qid & accept id: (29395674, 29396083) query: Generate two random strings with dash in between soup:

            Try picking a random string of letters, then a random string of digits, then joining them with a hyphen:

            \n
            import string, random\ndef pick(num):\n    for j in range(num):\n        print("".join([random.choice(string.ascii_uppercase) for i in range(3)])+"-"+"".join([random.choice(string.digits) for i in range(3)]))\n
            \n

            As such:

            \n
            >>> pick(5)\nOSD-711\nKRH-340\nMDE-271\nZJF-921\nLUX-920\n>>> pick(0)\n>>> pick(3)\nSFT-252\nXSL-209\nMAF-579\n
            \n soup wrap:

            Try picking a random string of letters, then a random string of digits, then joining them with a hyphen:

            import string, random
            def pick(num):
                for j in range(num):
                    print("".join([random.choice(string.ascii_uppercase) for i in range(3)])+"-"+"".join([random.choice(string.digits) for i in range(3)]))
            

            As such:

            >>> pick(5)
            OSD-711
            KRH-340
            MDE-271
            ZJF-921
            LUX-920
            >>> pick(0)
            >>> pick(3)
            SFT-252
            XSL-209
            MAF-579
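If you'd rather get the string back than print it (say, to store or test it), a small variant works too; the function name here is my own:

```python
import random
import string

def random_code():
    # Three random uppercase letters, a hyphen, then three random digits,
    # e.g. "KRH-340".
    letters = "".join(random.choice(string.ascii_uppercase) for _ in range(3))
    digits = "".join(random.choice(string.digits) for _ in range(3))
    return letters + "-" + digits
```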
            
            qid & accept id: (29415538, 29416152) query: numpy array slicing to avoid for loop soup:

            In case you want G to have the same dimensionality as A and then change the appropriate elements of G, the following code should work:

            \n
            # create G as a copy of A, otherwise you might change A by changing G\nG = A.copy()\n\n# getting the mask for all columns except the last one\nm = (B[:,0][:,None] != np.arange(d2-1)[None,:]) & (B[:,1]==0)[:,None]\n\n# getting a matrix with those elements of A which fulfill the conditions\nC = np.where(m,A[:,:d2-1],0).astype(float)\n\n# get the 'modified' average you use\navg = np.sum(C,axis=0)/np.sum(m.astype(int),axis=0)\n\n# change the appropriate elements in all the columns except the last one\nG[:,:-1] = np.where(m,avg,A[:,:d2-1])\n
            \n

            After fiddling for a long time and fixing bugs, I ended up with this code. I checked it against several random matrices A and specific choices of B

            \n
            A = np.random.randint(100,size=(5,10))\nB = np.column_stack(([4,2,1,3,4],np.zeros(5)))\n
            \n

            and so far your results and mine were in agreement.

            \n soup wrap:

            In case you want G to have the same dimensionality as A and then change the appropriate elements of G, the following code should work:

            # create G as a copy of A, otherwise you might change A by changing G
            G = A.copy()
            
            # getting the mask for all columns except the last one
            m = (B[:,0][:,None] != np.arange(d2-1)[None,:]) & (B[:,1]==0)[:,None]
            
            # getting a matrix with those elements of A which fulfill the conditions
            C = np.where(m,A[:,:d2-1],0).astype(float)
            
            # get the 'modified' average you use
            avg = np.sum(C,axis=0)/np.sum(m.astype(int),axis=0)
            
            # change the appropriate elements in all the columns except the last one
            G[:,:-1] = np.where(m,avg,A[:,:d2-1])
            

            After fiddling for a long time and fixing bugs, I ended up with this code. I checked it against several random matrices A and specific choices of B

            A = np.random.randint(100,size=(5,10))
            B = np.column_stack(([4,2,1,3,4],np.zeros(5)))
            

            and so far your results and mine were in agreement.

            qid & accept id: (29451598, 29451645) query: Scrape 'dictionary' type object from top of HTML file (bunch of text, not in a class) soup:

            Locate the script by checking that its text contains "window.BC.product".

            \n

            After you extract the script contents, use regular expressions to extract the desired javascript object, then, load it via json.loads() to get the Python dictionary:

            \n
            import json\nimport re\nfrom bs4 import BeautifulSoup\nimport requests\n\npattern = re.compile(r"window\.BC\.product = (.*);", re.MULTILINE)\n\nresponse = requests.get("http://www.steepandcheap.com/gear-cache/shop-smartwool-on-sale/SWL00II-GRA")\nsoup = BeautifulSoup(response.content)   \n\nscript = soup.find("script", text=lambda x: x and "window.BC.product" in x).text\ndata = json.loads(re.search(pattern, script).group(1))\nprint data\n
            \n

            Prints:

            \n
            {u'features': [{u'name': u'Material', u'description': u'[shell] 86% polyester, ... u'Zippered back pocket\r', u'Reflective details']}\n
            \n soup wrap:

            Locate the script by checking that its text contains "window.BC.product".

            After you extract the script contents, use regular expressions to extract the desired javascript object, then, load it via json.loads() to get the Python dictionary:

            import json
            import re
            from bs4 import BeautifulSoup
            import requests
            
            pattern = re.compile(r"window\.BC\.product = (.*);", re.MULTILINE)
            
            response = requests.get("http://www.steepandcheap.com/gear-cache/shop-smartwool-on-sale/SWL00II-GRA")
            soup = BeautifulSoup(response.content)   
            
            script = soup.find("script", text=lambda x: x and "window.BC.product" in x).text
            data = json.loads(re.search(pattern, script).group(1))
            print data
            

            Prints:

            {u'features': [{u'name': u'Material', u'description': u'[shell] 86% polyester, ... u'Zippered back pocket\r', u'Reflective details']}
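The extraction step itself can be exercised without the network call; the script text here is a made-up stand-in for the real page contents:

```python
import json
import re

# Stand-in for the contents of the located <script> tag.
script = 'window.BC.product = {"name": "jacket", "price": 59.95};'

pattern = re.compile(r"window\.BC\.product = (.*);", re.MULTILINE)
# group(1) captures the JavaScript object literal, which is valid JSON here
data = json.loads(re.search(pattern, script).group(1))
```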
            
            qid & accept id: (29481485, 29482058) query: Creating a Distance Matrix? soup:

            I think you are interested in distance_matrix.

            \n

            For example:

            \n

            Create data:

            \n
            import pandas as pd\nfrom scipy.spatial import distance_matrix\n\ndata = [[5, 7], [7, 3], [8, 1]]\nctys = ['Boston', 'Phoenix', 'New York']\ndf = pd.DataFrame(data, columns=['xcord', 'ycord'], index=ctys)\n
            \n

            Output:

            \n
                      xcord ycord\nBoston      5   7\nPhoenix     7   3\nNew York    8   1\n
            \n

            Using the distance matrix function:

            \n
             pd.DataFrame(distance_matrix(df.values, df.values), index=df.index, columns=df.index)\n
            \n

            Results:

            \n
                      Boston    Phoenix     New York\nBoston    0.000000  4.472136    6.708204\nPhoenix   4.472136  0.000000    2.236068\nNew York  6.708204  2.236068    0.000000\n
            \n soup wrap:

            I think you are interested in distance_matrix.

            For example:

            Create data:

            import pandas as pd
            from scipy.spatial import distance_matrix
            
            data = [[5, 7], [7, 3], [8, 1]]
            ctys = ['Boston', 'Phoenix', 'New York']
            df = pd.DataFrame(data, columns=['xcord', 'ycord'], index=ctys)
            

            Output:

                      xcord ycord
            Boston      5   7
            Phoenix     7   3
            New York    8   1
            

            Using the distance matrix function:

             pd.DataFrame(distance_matrix(df.values, df.values), index=df.index, columns=df.index)
            

            Results:

                      Boston    Phoenix     New York
            Boston    0.000000  4.472136    6.708204
            Phoenix   4.472136  0.000000    2.236068
            New York  6.708204  2.236068    0.000000
            
            qid & accept id: (29484529, 29485365) query: cosine similarity between two words in a list soup:

            You could define these two functions

            \n
            def word2vec(word):\n    from collections import Counter\n    from math import sqrt\n\n    # count the characters in word\n    cw = Counter(word)\n    # precomputes a set of the different characters\n    sw = set(cw)\n    # precomputes the "length" of the word vector\n    lw = sqrt(sum(c*c for c in cw.values()))\n\n    # return a tuple\n    return cw, sw, lw\n\ndef cosdis(v1, v2):\n    # which characters are common to the two words?\n    common = v1[1].intersection(v2[1])\n    # by definition of cosine distance we have\n    return sum(v1[0][ch]*v2[0][ch] for ch in common)/v1[2]/v2[2]\n
            \n

            and use them as in this example

            \n
            >>> a = 'safasfeqefscwaeeafweeaeawaw'\n>>> b = 'tsafdstrdfadsdfdswdfafdwaed'\n>>> c = 'optykop;lvhopijresokpghwji7'\n>>> \n>>> va = word2vec(a)\n>>> vb = word2vec(b)\n>>> vc = word2vec(c)\n>>> \n>>> print cosdis(va,vb)\n0.551843662321\n>>> print cosdis(vb,vc)\n0.113746579656\n>>> print cosdis(vc,va)\n0.153494378078\n
            \n

            BTW, the word2vec that you mention in a tag is quite a different business, one that requires a great deal of time and commitment to study, and guess what, I'm not that one...

            \n soup wrap:

            You could define these two functions

            def word2vec(word):
                from collections import Counter
                from math import sqrt
            
                # count the characters in word
                cw = Counter(word)
                # precomputes a set of the different characters
                sw = set(cw)
                # precomputes the "length" of the word vector
                lw = sqrt(sum(c*c for c in cw.values()))
            
                # return a tuple
                return cw, sw, lw
            
            def cosdis(v1, v2):
                # which characters are common to the two words?
                common = v1[1].intersection(v2[1])
                # by definition of cosine distance we have
                return sum(v1[0][ch]*v2[0][ch] for ch in common)/v1[2]/v2[2]
            

            and use them as in this example

            >>> a = 'safasfeqefscwaeeafweeaeawaw'
            >>> b = 'tsafdstrdfadsdfdswdfafdwaed'
            >>> c = 'optykop;lvhopijresokpghwji7'
            >>> 
            >>> va = word2vec(a)
            >>> vb = word2vec(b)
            >>> vc = word2vec(c)
            >>> 
            >>> print cosdis(va,vb)
            0.551843662321
            >>> print cosdis(vb,vc)
            0.113746579656
            >>> print cosdis(vc,va)
            0.153494378078
            

            BTW, the word2vec that you mention in a tag is quite a different business, one that requires a great deal of time and commitment to study, and guess what, I'm not that one...

            qid & accept id: (29497029, 29498349) query: Python Boto List Storage Devices Attached to Instance soup:

            I think you want the BlockDeviceMapping for the instance. Based on your example above, the following should find the block_device_mapping for the instance, which is a dictionary. Each key in the dictionary is a device name and the value is a BlockDeviceType object which contains information about the block device associated with that device name.

            \n
            for reservation in reservations:\n    for instance in reservation.instances:\n        bdm = instance.block_device_mapping\n        for device in bdm:\n            print('Device: {}'.format(device))\n            bdt = bdm[device]\n            print('\tVolumeID: {}'.format(bdt.volume_id))\n            print('\tVolume Status: {}'.format(bdt.volume_status))\n
            \n

            This should print something like:

            \n
            Device: /dev/sda1\n    VolumeID: vol-1d011806\n    Volume Status: attached\n
            \n

            There are other fields in the BlockDeviceType object. You should be able to find more info about that in the Boto docs.

            \n soup wrap:

            I think you want the BlockDeviceMapping for the instance. Based on your example above, the following should find the block_device_mapping for the instance, which is a dictionary. Each key in the dictionary is a device name and the value is a BlockDeviceType object which contains information about the block device associated with that device name.

            for reservation in reservations:
                for instance in reservation.instances:
                    bdm = instance.block_device_mapping
                    for device in bdm:
                        print('Device: {}'.format(device))
                        bdt = bdm[device]
                        print('\tVolumeID: {}'.format(bdt.volume_id))
                        print('\tVolume Status: {}'.format(bdt.volume_status))
            

            This should print something like:

            Device: /dev/sda1
                VolumeID: vol-1d011806
                Volume Status: attached
            

            There are other fields in the BlockDeviceType object. You should be able to find more info about that in the Boto docs.

            qid & accept id: (29509781, 29509822) query: In python, return value only when the function is used in an assignment soup:

            You can't and should not worry about this. Use different functions or have your function take an argument that controls what is returned, if this is an issue.

            \n

            Assignments are wide and varied and can include assignments that are explicitly discarded again. For example, if your function was used in a comparison expression:

            \n
            if f() == 'Return something':\n
            \n

            there is no assignment, but the return value matters.

            \n

            Using different functions:

            \n
            def f_with_return():\n    return 'something'\n\ndef f_without_return():\n    f_with_return()  # ignores the return value!\n    print "I won't return something"\n
            \n

            or parameters:

            \n
            def f(return_something=True):\n    if return_something:\n        return 'something'\n    print "I won't return something"\n
            \n

            lets you control what is returned.

            \n

            However, if you don't explicitly return anything, Python still returns a value: None.

            \n soup wrap:

            You can't and should not worry about this. Use different functions or have your function take an argument that controls what is returned, if this is an issue.

            Assignments are wide and varied and can include assignments that are explicitly discarded again. For example, if your function was used in a comparison expression:

            if f() == 'Return something':
            

            there is no assignment, but the return value matters.

            Using different functions:

            def f_with_return():
                return 'something'
            
            def f_without_return():
                f_with_return()  # ignores the return value!
                print "I won't return something"
            

            or parameters:

            def f(return_something=True):
                if return_something:
                    return 'something'
                print "I won't return something"
            

            lets you control what is returned.

            However, if you don't explicitly return anything, Python still returns a value: None.
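A quick illustration of that last point:

```python
def f():
    print("I won't return something")

result = f()   # the call still "returns": result is bound to None
```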

            qid & accept id: (29514238, 29514571) query: How to write defaultdict in more pythonic way? soup:

            Here's one solution:

            \n
            headers = {key for count in counts_to_display.values() for key in count}\n
            \n

            I'm using a set instead of a list, since the input is a dictionary, which has arbitrary ordering anyway, so there is no meaningful order to preserve. Sets also take care of deduplication automatically, and are more performant for certain operations, too.

            \n

            You could use the following, too:

            \n
            import itertools\nheaders = set(itertools.chain.from_iterable(counts_to_display.values()))\n
            \n soup wrap:

            Here's one solution:

            headers = {key for count in counts_to_display.values() for key in count}
            

            I'm using a set instead of a list, since the input is a dictionary, which has arbitrary ordering anyway, so there is no meaningful order to preserve. Sets also take care of deduplication automatically, and are more performant for certain operations, too.

            You could use the following, too:

            import itertools
            headers = set(itertools.chain.from_iterable(counts_to_display.values()))
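With made-up input, both spellings agree:

```python
import itertools

# Hypothetical input: a dict mapping names to per-item count dicts.
counts_to_display = {"day1": {"clicks": 3, "views": 10},
                     "day2": {"views": 4, "shares": 1}}

# Set comprehension over the inner keys ...
a = {key for count in counts_to_display.values() for key in count}
# ... and the itertools.chain equivalent.
b = set(itertools.chain.from_iterable(counts_to_display.values()))
```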
            
            qid & accept id: (29527654, 29592419) query: Appium - Clean app state at the first test and last test, but not between tests soup:

            Here is what I ended up using that works:

            \n

            To clean the iOS simulator:

            \n
            xcrun simctl erase \n
            \n

            Note that for the iOS simulator, this must be run while the simulator is not open.

            \n

            To clean the Android emulator:

            \n
            adb shell pm clear \n
            \n

            Note that for the Android emulator, this must be run while the emulator is open and the app closed.

            \n soup wrap:

            Here is what I ended up using that works:

            To clean the iOS simulator:

            xcrun simctl erase 
            

            Note that for the iOS simulator, this must be run while the simulator is not open.

            To clean the Android emulator:

            adb shell pm clear 
            

            Note that for the Android emulator, this must be run while the emulator is open and the app closed.

            qid & accept id: (29538559, 29539153) query: Draw different sized circles on a map soup:
              \n
            • The matplotlib scatter function has s and c parameters which would allow you to plot dots of different sizes and colors.

              \n

              The Pandas DataFrame.plot method calls the matplotlib scatter function when you specify kind='scatter'. It also passes extra arguments along to the call to scatter so you could use something like

              \n
              df.plot(kind='scatter', x='lon', y='lat', s=df['Total']*50, c=df['Total'], cmap=cmap)\n
              \n

              to plot your points.

            • \n
            • Annotating the points can be done with calls to plt.annotate.

            • \n
            • The gist_rainbow colormap goes from red to orange to yellow ... to violet. gist_rainbow_r is the reversed colormap, which makes red correspond to the largest values.

            • \n
            \n
            \n

            For example,

            \n
            import pandas as pd\nimport matplotlib.pyplot as plt\n\ndf = pd.DataFrame({'Total': [20,15,13,1],\n                   'lat': [40,0,-30,50],\n                   'lon': [40,50,60,70], }, \n                  index=['Location {}'.format(i) for i in range(1,5)])\n\ncmap = plt.get_cmap('gist_rainbow_r')\ndf.plot(kind='scatter', x='lon', y='lat', s=df['Total']*50, c=df['Total'], cmap=cmap)\n\nfor idx, row in df.iterrows():\n    x, y = row[['lon','lat']]\n    plt.annotate(\n        str(idx), \n        xy = (x, y), xytext = (-20, 20),\n        textcoords = 'offset points', ha = 'right', va = 'bottom',\n        bbox = dict(boxstyle = 'round,pad=0.5', fc = 'yellow', alpha = 0.5),\n        arrowprops = dict(arrowstyle = '->', connectionstyle = 'arc3,rad=0'))\n\nplt.show()\n
            \n

            yields

            \n

            enter image description here

            \n
            \n

            Do not call df.plot or plt.scatter once for each dot. That would become terribly slow as the number of dots increases. Instead, collect the requisite data (the longitudes and latitudes) in the DataFrame so that the dots can be drawn with one call to df.plot:

            \n
            longitudes, latitudes = [], []\nfor row_index, row in df.iterrows():\n    x, y = db.getLocation(row_index)\n    lat, lon = m(y, x)\n    longitudes.append(lon)\n    latitudes.append(lat)\n    plt.annotate(\n        str(row_index), \n        xy = (x, y), xytext = (-20, 20),\n        textcoords = 'offset points', ha = 'right', va = 'bottom',\n        bbox = dict(boxstyle = 'round,pad=0.5', fc = 'yellow', alpha = 0.5),\n        arrowprops = dict(arrowstyle = '->', connectionstyle = 'arc3,rad=0'))\n\ndf['lon'] = longitudes\ndf['lat'] = latitudes\ncmap = plt.get_cmap('gist_rainbow_r')\nax = plt.gca()\ndf.plot(kind='scatter', x='lon', y='lat', s=df['Total']*50, c=df['Total'], \n        cmap=cmap, ax=ax)\n
            \n soup wrap:
            • The matplotlib scatter function has s and c parameters which would allow you to plot dots of different sizes and colors.

              The Pandas DataFrame.plot method calls the matplotlib scatter function when you specify kind='scatter'. It also passes extra arguments along to the call to scatter so you could use something like

              df.plot(kind='scatter', x='lon', y='lat', s=df['Total']*50, c=df['Total'], cmap=cmap)
              

              to plot your points.

            • Annotating the points can be done with calls to plt.annotate.

            • The gist_rainbow colormap goes from red to orange to yellow ... to violet. gist_rainbow_r is the reversed colormap, which makes red correspond to the largest values.


            For example,

            import pandas as pd
            import matplotlib.pyplot as plt
            
            df = pd.DataFrame({'Total': [20,15,13,1],
                               'lat': [40,0,-30,50],
                               'lon': [40,50,60,70], }, 
                              index=['Location {}'.format(i) for i in range(1,5)])
            
            cmap = plt.get_cmap('gist_rainbow_r')
            df.plot(kind='scatter', x='lon', y='lat', s=df['Total']*50, c=df['Total'], cmap=cmap)
            
            for idx, row in df.iterrows():
                x, y = row[['lon','lat']]
                plt.annotate(
                    str(idx), 
                    xy = (x, y), xytext = (-20, 20),
                    textcoords = 'offset points', ha = 'right', va = 'bottom',
                    bbox = dict(boxstyle = 'round,pad=0.5', fc = 'yellow', alpha = 0.5),
                    arrowprops = dict(arrowstyle = '->', connectionstyle = 'arc3,rad=0'))
            
            plt.show()
            

            yields

            enter image description here


            Do not call df.plot or plt.scatter once for each dot. That would become terribly slow as the number of dots increases. Instead, collect the requisite data (the longitudes and latitudes) in the DataFrame so that the dots can be drawn with one call to df.plot:

            longitudes, latitudes = [], []
            for row_index, row in df.iterrows():
                x, y = db.getLocation(row_index)
                lat, lon = m(y, x)
                longitudes.append(lon)
                latitudes.append(lat)
                plt.annotate(
                    str(row_index), 
                    xy = (x, y), xytext = (-20, 20),
                    textcoords = 'offset points', ha = 'right', va = 'bottom',
                    bbox = dict(boxstyle = 'round,pad=0.5', fc = 'yellow', alpha = 0.5),
                    arrowprops = dict(arrowstyle = '->', connectionstyle = 'arc3,rad=0'))
            
            df['lon'] = longitudes
            df['lat'] = latitudes
            cmap = plt.get_cmap('gist_rainbow_r')
            ax = plt.gca()
            df.plot(kind='scatter', x='lon', y='lat', s=df['Total']*50, c=df['Total'], 
                    cmap=cmap, ax=ax)
            
            qid & accept id: (29547906, 29548056) query: How do I collapse categorical data into a single record in R or Python? soup:

            In Python with Pandas you can do:

            \n
            import pandas as pd\n\ndf = pd.read_clipboard() # from your sample\n\ndf\n   ID Code\n0   1    A\n1   1    B\n2   1    C\n3   2    A\n4   2    C\n5   3    B\n6   3    C\n
            \n
            \n
            df.groupby('ID').agg(lambda x: ' '.join(x['Code']))\n\n     Code\nID       \n1   A B C\n2     A C\n3     B C\n
            \n soup wrap:

            In Python with Pandas you can do:

            import pandas as pd
            
            df = pd.read_clipboard() # from your sample
            
            df
               ID Code
            0   1    A
            1   1    B
            2   1    C
            3   2    A
            4   2    C
            5   3    B
            6   3    C
            

            df.groupby('ID').agg(lambda x: ' '.join(x['Code']))
            
                 Code
            ID       
            1   A B C
            2     A C
            3     B C
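In current pandas, selecting the column before aggregating is a bit more direct; the same idea with the sample data reconstructed:

```python
import pandas as pd

df = pd.DataFrame({"ID":   [1, 1, 1, 2, 2, 3, 3],
                   "Code": ["A", "B", "C", "A", "C", "B", "C"]})

# Select the Code column first, then join each group's codes with spaces.
out = df.groupby("ID")["Code"].agg(" ".join)
```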
            
            qid & accept id: (29571644, 29577129) query: Custom Hadoop Configuration for Spark from Python (PySpark)? soup:

            I solved this puzzle for my case by dropping the requirement to modify the Configuration online, and instead relying on a custom set of Hadoop configuration *.xml files.

            \n

            First I wrote a Java class which adds the configuration of additional layers to the default resources of org.apache.hadoop.conf.Configuration. Its static initializer appends to the Configuration default resources:

            \n
            public class Configurator {\n\n    static {\n\n        // We initialize needed Hadoop configuration layers default configuration\n        // by loading appropriate classes.\n\n        try {\n            Class.forName("org.apache.hadoop.hdfs.DistributedFileSystem");\n        } catch (ClassNotFoundException e) {\n            LOG.error("Failed to initialize HDFS configuartion layer.");\n        }\n\n        try {\n            Class.forName("org.apache.hadoop.mapreduce.Cluster");\n        } catch (ClassNotFoundException e) {\n            LOG.error("Failed to initialize YARN/MapReduce configuartion layer.");\n        }\n\n        // We do what actually HBase should: default HBase configuration\n        // is added to default Hadoop resources.\n        Configuration.addDefaultResource("hbase-default.xml");\n        Configuration.addDefaultResource("hbase-site.xml");\n    }\n\n    // Just 'callable' handle.\n    public void init() {\n    }\n\n}\n
            \n

            So now if someone just loads my Configurator, he or she has the following infrastructure configurations searched over the class path: core, HDFS, MapReduce, YARN, HBase. The corresponding files are core-default.xml, core-site.xml, hdfs-default.xml, hdfs-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hbase-default.xml, hbase-site.xml. If I need additional layers, it's no problem to extend.

            \n

            Configurator.init() is provided just to give a trivial handle for class loading.

            \n

            Now I need to extend the Python Spark scripts with access to the configurator during Spark context startup:

            \n
            # Create minimal Spark context.\nsc = SparkContext(appName="ScriptWithIntegratedConfig")\n\n# It's critical to initialize configurator so any\n# new org.apach.hadoop.Configuration object loads our resources.\nsc._jvm.com.wellcentive.nosql.Configurator.init()\n
            \n

            So now a normal Hadoop new Configuration() construction (which is common inside the PythonRDD infrastructure for Hadoop-based datasets) loads the configuration for all layers from the classpath, where I can place the configuration for the needed cluster.

            \n

            At least it works for me.

            \n soup wrap:

            I solved this puzzle for my case by dropping the requirement to modify the Configuration online, and instead relying on a custom set of Hadoop configuration *.xml files.

            First I wrote a Java class which adds the configuration of additional layers to the default resources of org.apache.hadoop.conf.Configuration. Its static initializer appends to the Configuration default resources:

            public class Configurator {
            
                static {
            
                    // We initialize needed Hadoop configuration layers default configuration
                    // by loading appropriate classes.
            
                    try {
                        Class.forName("org.apache.hadoop.hdfs.DistributedFileSystem");
                    } catch (ClassNotFoundException e) {
                        LOG.error("Failed to initialize HDFS configuration layer.");
                    }
            
                    try {
                        Class.forName("org.apache.hadoop.mapreduce.Cluster");
                    } catch (ClassNotFoundException e) {
                        LOG.error("Failed to initialize YARN/MapReduce configuration layer.");
                    }
            
                    // We do what actually HBase should: default HBase configuration
                    // is added to default Hadoop resources.
                    Configuration.addDefaultResource("hbase-default.xml");
                    Configuration.addDefaultResource("hbase-site.xml");
                }
            
                // Just 'callable' handle.
                public void init() {
                }
            
            }
            

            So now if someone just loads my Configurator, he or she has the following infrastructure configurations searched over the class path: core, HDFS, MapReduce, YARN, HBase. The corresponding files are core-default.xml, core-site.xml, hdfs-default.xml, hdfs-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hbase-default.xml, hbase-site.xml. If I need additional layers, it's no problem to extend.

            Configurator.init() is provided just to give a trivial handle for class loading.

            Now I need to extend Python Spark scripts with access to configurator during Spark context startup:

            # Create minimal Spark context.
            sc = SparkContext(appName="ScriptWithIntegratedConfig")
            
            # It's critical to initialize configurator so any
            # new org.apache.hadoop.conf.Configuration object loads our resources.
            sc._jvm.com.wellcentive.nosql.Configurator.init()
            

            So now the normal Hadoop new Configuration() construction (which is common inside the PythonRDD infrastructure for Hadoop-based datasets) loads every layer's configuration from the class path, where I can place the configuration for the needed cluster.

            At least it works for me.

            qid & accept id: (29573963, 29573994) query: How do I split items in a list (with delimiter) within a list? soup:

            soup wrap:

            Use a list comprehension.

            >>> x = ['temp1_a','temp2_b', None, 'temp3_c']
            >>> y, z  = [i if i is None else i.split('_')[0] for i in x ], [i if i is None else i.split('_')[1] for i in x ]
            >>> y
            ['temp1', 'temp2', None, 'temp3']
            >>> z
            ['a', 'b', None, 'c']
            

            Update:

            >>> x = [['temp1_a','temp2_b', None, 'temp3_c'],['list1_a','list2_b','list3_c']]
            >>> y, z = [i if i is None else i.split('_')[0] for i in itertools.chain(*x)], [i if i is None else i.split('_')[1] for i in itertools.chain(*x) ]
            >>> y
            ['temp1', 'temp2', None, 'temp3', 'list1', 'list2', 'list3']
            >>> z
            ['a', 'b', None, 'c', 'a', 'b', 'c']
            
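            If you'd rather split each string only once, a small helper can split at the last underscore and unzip the results. This is a sketch (split_pairs is a name of my own, not from the question), shown with Python 3 print:

```python
def split_pairs(items, sep='_'):
    """Split each item once at the last separator; pass None through unchanged."""
    pairs = [(None, None) if i is None else tuple(i.rsplit(sep, 1))
             for i in items]
    firsts = [a for a, b in pairs]
    seconds = [b for a, b in pairs]
    return firsts, seconds

x = ['temp1_a', 'temp2_b', None, 'temp3_c']
y, z = split_pairs(x)
print(y)  # ['temp1', 'temp2', None, 'temp3']
print(z)  # ['a', 'b', None, 'c']
```

            Note that rsplit splits at the last separator, which matters only if an item contains more than one underscore.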
            qid & accept id: (29600513, 29600682) query: define different function for different versions of python soup:

            soup wrap:

            While there are compatibility libraries (six and future being the two most widely known), sometimes one needs to live without compatibility libs. You can always write your own class decorator and put it into, say, mypackage/compat.py. The following works nicely for writing the class in Python 3 format and converting the 3-ready class to Python 2 if needed (the same can be used for next vs __next__, etc.):

            import sys
            
            if sys.version_info[0] < 3:
                def py2_compat(cls):
                    if hasattr(cls, '__str__'):
                        cls.__unicode__ = cls.__str__
                        del cls.__str__
                        # or optionally supply an str that 
                        # encodes the output of cls.__unicode__
                    return cls
            else:
                def py2_compat(cls):
                    return cls
            
            @py2_compat
            class MyPython3Class(object):
                def __str__(self):
                    return u'Here I am!'
            

            (notice that we are using the u'' prefix, which is compatible only with PyPy 3 and Python 3.3+, so if you need to be compatible with Python 3.2 you need to adjust accordingly)


            To supply a __str__ method that encodes the __unicode__ to UTF-8 in Python 2, you can replace the del cls.__str__ with

            def __str__(self):
                return unicode(self).encode('UTF-8')
            cls.__str__ = __str__
            
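            Putting the two fragments together — this is just the snippets above combined into one runnable file (the class name Greeting is mine, for illustration); it prints the same thing on both Python 2 and 3:

```python
import sys

if sys.version_info[0] < 3:
    def py2_compat(cls):
        if hasattr(cls, '__str__'):
            # move the Python 3 __str__ to __unicode__ and supply a
            # __str__ that encodes the output of __unicode__
            cls.__unicode__ = cls.__str__
            def __str__(self):
                return unicode(self).encode('UTF-8')  # noqa: F821 (Python 2 only)
            cls.__str__ = __str__
        return cls
else:
    def py2_compat(cls):
        # on Python 3 the class is already in the right shape
        return cls

@py2_compat
class Greeting(object):
    def __str__(self):
        return u'Here I am!'

print(str(Greeting()))  # Here I am!
```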
            qid & accept id: (29643161, 29646111) query: Python PyQt QWebView load site in clicked tab soup:

            soup wrap:

            In main.py:

            Make tab2 addressable, and add code to listen for the tab-change event:

                #...
                self.tab2 = support
            
                self.tabs.addTab(tab1,"tab1")
                self.tabs.addTab(self.tab2,"SUPPORT")
                #
                self.tabs.currentChanged.connect(self.load_on_show)
            

            Then add the action

            def load_on_show(self):
                idx = self.tabs.currentIndex()
                if idx == 1:
                    url = "http://www.google.com"
                    print url
                    self.tab2.load_url(url)
            

            Finally, in tab_file.py (I can't use a dash in the module name, so I have to use an underscore!):

            Make the view addressable, again (self.view), and add this code:

            def load_url(self, url):
                self.view.load(QtCore.QUrl(url))
            

            Does this help?

            qid & accept id: (29703388, 29708843) query: How to iterate over a pandas dataframe and compare certain columns based on a third column? soup:

            soup wrap:

            seaborn does a lot of this for you, very flexibly:

            import seaborn as sns
            sns.factorplot('ids', 'data', hue='var', kind='bar', data=df)
            


            (it also restyles the plotting defaults, which can be changed or reset).

            If you want to subset the data, pass the subset as the data argument:

            sns.factorplot('ids', 'data', hue='var', kind='bar', 
                           data=df[df.isin({'ids':['Bob','Mary']}).any(1)])
            


            • that's with sns style turned off
            • for any more complicated mask, you'd set up the mask separately; see the pandas docs
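            If seaborn isn't available, the same grouping can be reshaped with plain pandas before plotting. A sketch, with made-up ids/var/data values in the long format the answer assumes:

```python
import pandas as pd

# hypothetical data in the long format used above
df = pd.DataFrame({
    'ids':  ['Bob', 'Bob', 'Mary', 'Mary'],
    'var':  ['a', 'b', 'a', 'b'],
    'data': [1, 2, 3, 4],
})

# one row per id, one column per 'var' value -- the layout a
# grouped bar chart needs (e.g. table.plot(kind='bar'))
table = df.pivot(index='ids', columns='var', values='data')
print(table)
```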
            qid & accept id: (29721519, 29721560) query: String formatting on SQL insert using dict soup:

            soup wrap:

            This depends on the database driver. Usually, if positional arguments (%s) are supported, then named arguments are too, using the %(name)s format:

            cursor.execute(
                '''INSERT INTO main_territory (code, name) VALUES (%(code)s, %(name)s)''',
                {'code': item[0], 'name': item[1]})
            

            The parameter values then are passed in as a dictionary.

            The most commonly used MySQL database adapter, MySQLdb, supports this style.

            Other database adapters use ? and :name as the positional and named arguments; you can query the style used with the paramstyle attribute on the module:

            >>> import MySQLdb
            >>> MySQLdb.paramstyle
            'format'
            

            but because most drivers support both positional and named styles, they usually just name one ('format' or 'qmark') while also supporting the named variants. Always consult the documentation to verify this.

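            As a concrete illustration with a driver from the standard library: sqlite3 reports 'qmark' as its paramstyle yet also accepts :name parameters (the table name is reused from the question; the values are made up):

```python
import sqlite3

print(sqlite3.paramstyle)  # 'qmark'

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE main_territory (code TEXT, name TEXT)')
# named style works even though the module only advertises 'qmark'
conn.execute(
    'INSERT INTO main_territory (code, name) VALUES (:code, :name)',
    {'code': 'DE', 'name': 'Germany'})
row = conn.execute('SELECT code, name FROM main_territory').fetchone()
print(row)  # ('DE', 'Germany')
```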
            qid & accept id: (29744665, 29744714) query: Python json-rpc help, how to extract data soup:

            soup wrap:

            Access the value by its key; you may also need to import Decimal:

            from decimal import Decimal
            
            print(access.getinfo()["balance"])
            

            So just assign to the value returned:

             bal, pay_tax = access.getinfo()["balance"], access.getinfo()["paytxfee"]
             .....
            
            qid & accept id: (29761118, 29761235) query: How to separate a single list into multiple list in python soup:

            soup wrap:

            How about this:

            foods = ['I_want_ten_orange_cookies', 'I_want_four_orange_juices', 'I_want_ten_lemon_cookies', 'I_want_four_lemon_juices']
            
            orange=[]
            lemon=[]
            
            for food in foods:
                if 'orange' in food.split('_'):
                    orange.append(food)
                elif 'lemon' in food.split('_'):
                    lemon.append(food) 
            

            This would output:

            >>> orange
            ['I_want_ten_orange_cookies', 'I_want_four_orange_juices']
            
            >>> lemon
            ['I_want_ten_lemon_cookies', 'I_want_four_lemon_juices']
            

            This works if the items in the list are always separated by underscores.

            The if 'orange' in food.split('_') splits the sentence into a list of words and then checks whether the keyword is in that list.


            You could, in theory, just do if 'orange' in food but that would fail if the substring is found in another word. For example:

            >>> s='I_appeared_there'
            
            >>> if 'pear' in s:
                print "yes"
            
            yes
            
            >>> if 'pear' in s.split('_'):
                print "yes"
            
            >>>
            
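            If there are more than two keywords, a dict of lists avoids writing one branch per keyword. A sketch generalising the loop above, not part of the original answer:

```python
from collections import defaultdict

foods = ['I_want_ten_orange_cookies', 'I_want_four_orange_juices',
         'I_want_ten_lemon_cookies', 'I_want_four_lemon_juices']
keywords = ('orange', 'lemon')

groups = defaultdict(list)
for food in foods:
    words = food.split('_')          # same whole-word check as above
    for kw in keywords:
        if kw in words:
            groups[kw].append(food)
            break

print(groups['orange'])  # ['I_want_ten_orange_cookies', 'I_want_four_orange_juices']
print(groups['lemon'])   # ['I_want_ten_lemon_cookies', 'I_want_four_lemon_juices']
```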
            qid & accept id: (29785451, 29785457) query: How to print from itertools count object? soup:

            soup wrap:

            if it sounds simple ... it probably is

            from itertools import count
            a = count(1)
            
            next(a)
            next(a)
            print next(a)
            

            you can also use itertools.islice to skip parts of an iterator

            from itertools import count,islice
            a = count(1)
            for item in islice(a,2,4):
                print item
            
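            count() also accepts a step, and islice() can take a finite window from the infinite iterator (Python 3 print shown):

```python
from itertools import count, islice

evens = count(0, 2)                  # 0, 2, 4, 6, ...
first_five = list(islice(evens, 5))  # materialise a finite slice
print(first_five)  # [0, 2, 4, 6, 8]
```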
            qid & accept id: (29799542, 30006369) query: How to retain " and ' while parsing xml using bs4 python soup:

            soup wrap:

            Custom Encode & Output Formatting

            You can use a custom formatter function to add these specific entities to the entity substitution.

            from bs4 import BeautifulSoup
            from bs4.dammit import EntitySubstitution
            
            def custom_formatter(string):
                """add " and ' to entity substitution"""
                return EntitySubstitution.substitute_html(string).replace('"', '&quot;').replace("'", '&#39;')
            
            input_file = '''
              " example text "
              
                " example text "
                
                  ' example text '
                
              
            
            '''
            
            soup = BeautifulSoup(input_file, "xml")
            
            print soup.encode(formatter=custom_formatter)
            

            
            
            &quot; example text &quot;
            
            &quot; example text &quot;
            
            &#39; example text &#39;
            
            
            
            

            The trick is to do it after the EntitySubstitution.substitute_html() call, so the & in the substituted &quot; and &#39; entities doesn't itself get escaped to &amp;.
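            If you only need to escape plain text (rather than re-encode a whole parsed tree), the standard library can do the same substitution without BeautifulSoup — xml.sax.saxutils.escape accepts a mapping of extra entities:

```python
from xml.sax.saxutils import escape

text = 'say "hi" & don\'t stop'
# & < > are always escaped; the dict adds the two extra entities
escaped = escape(text, {'"': '&quot;', "'": '&#39;'})
print(escaped)  # say &quot;hi&quot; &amp; don&#39;t stop
```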

            qid & accept id: (29831030, 29832960) query: Time - get yesterdays date soup:

            soup wrap:

            To get yesterday's struct_time, use any of many existing datetime solutions and call .timetuple() to get struct_time e.g.:

            #!/usr/bin/env python
            from datetime import date, timedelta
            
            today = date.today()
            yesterday = today - timedelta(1)
            print(yesterday.timetuple())
            # -> time.struct_time(tm_year=2015, tm_mon=4, tm_mday=22, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=112, tm_isdst=-1)
            

            It produces the correct day in the local timezone even around DST transitions.

            See How can I subtract a day from a python date? if you want to find the corresponding UTC time (get yesterday as an aware datetime object).


            You could also get yesterday using only time module (but less directly):

            #!/usr/bin/env python
            import time
            
            def posix_time(utc_time_tuple):
                """seconds since Epoch as defined by POSIX."""
                # from https://gist.github.com/zed/ff4e35df3887c1f82002
                tm_year = utc_time_tuple.tm_year - 1900
                tm_yday = utc_time_tuple.tm_yday - 1
                tm_hour = utc_time_tuple.tm_hour
                tm_min = utc_time_tuple.tm_min
                tm_sec = utc_time_tuple.tm_sec
                # http://pubs.opengroup.org/stage7tc1/basedefs/V1_chap04.html#tag_04_15
                return (tm_sec + tm_min*60 + tm_hour*3600 + tm_yday*86400 +
                        (tm_year-70)*31536000 + ((tm_year-69)//4)*86400 -
                        ((tm_year-1)//100)*86400 + ((tm_year+299)//400)*86400)
            
            now = time.localtime()
            yesterday = time.gmtime(posix_time(now) - 86400)
            print(yesterday)
            # -> time.struct_time(tm_year=2015, tm_mon=4, tm_mday=22, tm_hour=22, tm_min=6, tm_sec=16, tm_wday=2, tm_yday=112, tm_isdst=0)
            

            It assumes that time.gmtime() accepts POSIX timestamp on the given platform (Python's stdlib breaks otherwise e.g., if non-POSIX TZ=right/UTC is used). calendar.timegm() could be used instead of posix_time() but the former may use datetime internally.

            Note: yesterday represents local time in both solutions (gmtime() is just a simple way to implement the subtraction here). Both solutions use naive timezone-unaware time objects, so the result may be an ambiguous or even non-existent time; still, unless the local timezone has skipped yesterday entirely (e.g., Russia skipped several days in February 1918), the date is correct anyway.

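            If the goal is simply a formatted date string for yesterday, the struct_time from the first snippet feeds straight into time.strftime (Python 3 shown):

```python
import time
from datetime import date, timedelta

# struct_time fields map directly onto strftime's format codes
yesterday_tt = (date.today() - timedelta(days=1)).timetuple()
print(time.strftime('%Y-%m-%d', yesterday_tt))
```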
            qid & accept id: (29857558, 29857936) query: Short for 'for i in range(1,len(a)):' in python soup:

            soup wrap:

            You can make it shorter by defining a function:

            def r(lst):
                return range(len(lst))
            

            and then:

            for i in r(l):
                ...
            

            Which is 9 characters shorter!

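            That said, if the index is only used to access elements, enumerate() with a start offset is usually the idiomatic replacement for range(1, len(a)). A sketch, not part of the original answer:

```python
a = ['x', 'y', 'z']

# equivalent to: for i in range(1, len(a)): print(i, a[i])
for i, item in enumerate(a[1:], start=1):
    print(i, item)
```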
            qid & accept id: (29867175, 29867217) query: Nested options with argparse soup:

            soup wrap:

            You should use subparsers:

            import argparse
            
            parser = argparse.ArgumentParser()
            subparsers = parser.add_subparsers(title='subcommands')
            
            parser_foo = subparsers.add_parser('foo')
            parser_foo.set_defaults(target='foo')
            
            parser_bar = subparsers.add_parser('bar')
            parser_bar.add_argument('more')
            parser_bar.set_defaults(target='bar')
            

            Usage:

            >>> parser.parse_args(['foo'])
            Namespace(target='foo')
            
            >>> parser.parse_args(['bar', '123'])
            Namespace(target='bar', more='123')
            

            Note that you could set the target to e.g. a function and call it directly. Here's some sample code that does this (extracted from Cactus' CLI, but that's a rather common pattern):

            parser = argparse.ArgumentParser()
            
            subparsers = parser.add_subparsers(title = 'subcommands')
            
            parser_create = subparsers.add_parser('create')
            parser_create.add_argument('path')
            parser_create.add_argument('-s', '--skeleton')
            parser_create.set_defaults(target=create)
            
            parser_build = subparsers.add_parser('build')
            parser_build.set_defaults(target = build)
            
            args = parser.parse_args()
            args.target(**{k: v for k, v in vars(args).items() if k != 'target'})
            
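            A self-contained version of that dispatch pattern, with stub create/build functions of my own so it actually runs (Python 3; the sample argv is hypothetical):

```python
import argparse

def create(path, skeleton=None):
    return 'create %s (skeleton=%s)' % (path, skeleton)

def build():
    return 'build'

parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers(title='subcommands')

parser_create = subparsers.add_parser('create')
parser_create.add_argument('path')
parser_create.add_argument('-s', '--skeleton')
parser_create.set_defaults(target=create)

parser_build = subparsers.add_parser('build')
parser_build.set_defaults(target=build)

# parse a sample command line instead of sys.argv
args = parser.parse_args(['create', 'site', '-s', 'blog'])
# call the selected function with everything except 'target' itself
result = args.target(**{k: v for k, v in vars(args).items() if k != 'target'})
print(result)  # create site (skeleton=blog)
```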
            qid & accept id: (29870041, 29870131) query: How to XOR literal with a string soup:

            soup wrap:

            To convert a string to an array of bytes:

            b = bytes('abcd', 'ascii')
            

            To convert array of bytes to int:

            i = int.from_bytes(b, byteorder='big', signed=False)
            
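            Combining the two steps, here is a sketch of XOR-ing the resulting integer with a literal key and converting back to bytes (the key 0x2A is an arbitrary example):

```python
key = 0x2A                  # arbitrary example key
b = bytes('abcd', 'ascii')
i = int.from_bytes(b, byteorder='big', signed=False)

xored = i ^ key
# back to a byte string of the same length
out = xored.to_bytes(len(b), byteorder='big')
print(hex(i))  # 0x61626364
print(out)     # b'abcN'  (only the last byte changed: 0x64 ^ 0x2A == 0x4E)
```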
            qid & accept id: (29908931, 29924521) query: YAML list -> Python generator? soup:

            soup wrap:

            I can understand that the Events API scares you, and it would only bring you so much. First of all, you would need to keep track of depth (because you have your top-level complex sequence items, as well as "bar", "baz", etc.). And, having cut the low-level sequence event elements correctly, you would have to feed them into the composer to create nodes (and eventually Python objects); not trivial either.

            But since YAML uses indentation, even for scalars spanning multiple lines, you can use a simple line based parser that recognises where each sequence element starts and feed those into the normal load() function one at a time:

            #!/usr/bin/env python
            
            import ruamel.yaml
            
            def list_elements(fp, depth=0):
                buffer = None
                in_header = True
                list_element_match = ' ' * depth + '- '
                for line in fp:
                    if line.startswith('---'):
                        in_header = False
                        continue
                    if in_header:
                        continue
                    if line.startswith(list_element_match):
                        if buffer is None:
                            buffer = line
                            continue
                        yield ruamel.yaml.load(buffer)[0]
                        buffer = line
                        continue
                    buffer += line
                if buffer:
                   yield ruamel.yaml.load(buffer)[0]
            
            
            with open("foobar.yaml") as fp:
               for element in list_elements(fp):
                   print(str(element))
            

            resulting in:

            {'something_else': 'blah', 'foo': ['bar', 'baz', 'bah']}
            {'bar': 'yet_another_thing'}
            

            I used the enhanced version of PyYAML, ruamel.yaml, here (of which I am the author), but PyYAML should work in the same way.

            qid & accept id: (29910229, 29910735) query: How can i extract metdata from django models soup:

            soup wrap:

            You can get a Model's fields and their metadata like this:

            def get_model_metadata(model_class, meta_whitelist=[]):
              field_list = model_class._meta.fields
              return_data = {}
              for field in field_list:
                field_name = field.name
                field_meta = field.__dict__
                return_meta = {}
                for meta_name in field_meta:
                  if meta_name in meta_whitelist:
                    return_meta[meta_name] = field_meta[meta_name]
                if len(return_meta) > 0:
                  return_data[field_name] = return_meta
              return return_data
            

            Usage:

            from django.contrib.auth.models import User
            get_model_metadata(User, meta_whitelist=['max_length'])
            

            Returns:

            {
              'username': {'max_length': 30},
              'first_name': {'max_length': 30},
              'last_name': {'max_length': 30},
              'is_active': {'max_length': None},
              'email': {'max_length': 75},
              'is_superuser': {'max_length': None},
              'is_staff': {'max_length': None},
              'last_login': {'max_length': None},
              'password': {'max_length': 128},
              u'id': {'max_length': None},
              'date_joined': {'max_length': None}
            }
            

            Improvements to this method could include a blacklist of field metadata, a whitelist/blacklist for fields, and maybe a boolean for hiding metadata whose value is None.
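Django aside, the core trick is just whitelist-filtering an object's __dict__; a minimal stand-in (a hypothetical Field class, not Django's) shows the shape of the result:

```python
class Field:
    """Hypothetical stand-in for a Django model field."""
    def __init__(self, name, max_length=None, null=False):
        self.name = name
        self.max_length = max_length
        self.null = null

def get_field_metadata(fields, meta_whitelist=()):
    # Same idea as get_model_metadata: keep only whitelisted __dict__ entries.
    return {
        f.name: {k: v for k, v in f.__dict__.items() if k in meta_whitelist}
        for f in fields
    }

fields = [Field('username', max_length=30), Field('email', max_length=75)]
print(get_field_metadata(fields, meta_whitelist=['max_length']))
```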

            qid & accept id: (29932225, 29932418) query: Obtain x'th largest item in a dictionary soup:

            soup wrap:

            Using the heap queue algorithm:

            import heapq
            y = {'a':55, 'b':33, 'c':67, 'd':12}
            print heapq.nlargest(n=3, iterable=y, key=y.get)[-1]
            # b
            

            This performs better for large dictionaries than sorting the entire dict each time. Specifically, with a dictionary of n elements where you're looking for the k largest ones, this runs in O(n log k) instead of O(n log n).
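For a quick sanity check, heapq.nlargest agrees with the sort-everything approach (Python 3 syntax here):

```python
import heapq

y = {'a': 55, 'b': 33, 'c': 67, 'd': 12}

# O(n log n): sort all keys by value, take the top 3
by_sorting = sorted(y, key=y.get, reverse=True)[:3]
# O(n log k): keep a bounded heap of only the 3 largest
by_heap = heapq.nlargest(3, y, key=y.get)

print(by_sorting, by_heap)
```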

            Also note that this gives you the three largest keys in order as a list; to get all of them, simply remove the [-1]:

            print heapq.nlargest(n=3, iterable=y, key=y.get)
            # ['c', 'a', 'b']
            
            qid & accept id: (29935056, 29935613) query: Pandas: How to extract rows of a dataframe matching Filter1 OR filter2 soup:

            soup wrap:

            You could do

            In [276]: df[(df['fold'] >= 2) | (df['fold'] <= -0.6)]
            Out[276]:
               label         Y88_N          diff       div      fold
            0      0  25273.626713  17348.581851  2.016404  2.016404
            1      1  29139.510491  -4208.868050  0.604304 -0.604304
            5      5  28996.634708  10934.944533  2.031293  2.031293
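Under the hood, | combines two element-wise boolean masks; the same idea in plain Python, using the fold values of the kept rows above (the dropped rows' values are hypothetical fillers):

```python
# fold values: indices 0, 1, 5 come from the example output above;
# the middle three are made-up values that should be filtered out.
fold = [2.016404, -0.604304, 0.5, 1.2, -0.1, 2.031293]

mask = [(f >= 2) or (f <= -0.6) for f in fold]
kept = [i for i, m in enumerate(mask) if m]
print(kept)
```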
            

            Or use query method like

            In [277]: df.query('fold >=2 | fold <=-0.6')
            Out[277]:
               label         Y88_N          diff       div      fold
            0      0  25273.626713  17348.581851  2.016404  2.016404
            1      1  29139.510491  -4208.868050  0.604304 -0.604304
            5      5  28996.634708  10934.944533  2.031293  2.031293
            

            Also, pd.eval() works well with expressions containing large arrays:

            In [278]: df[pd.eval('df.fold >=2 | df.fold <=-0.6')]
            Out[278]:
               label         Y88_N          diff       div      fold
            0      0  25273.626713  17348.581851  2.016404  2.016404
            1      1  29139.510491  -4208.868050  0.604304 -0.604304
            5      5  28996.634708  10934.944533  2.031293  2.031293
            
            qid & accept id: (29947844, 29947893) query: Opposite of set.intersection in python? soup:

            soup wrap:

            You are looking for the symmetric difference; all elements that appear only in set a or in set b, but not both:

            a.symmetric_difference(b)
            

            From the set.symmetric_difference() method documentation:

            Return a new set with elements in either the set or other but not both.

            You can use the ^ operator too, if both a and b are sets:

            a ^ b
            

            while set.symmetric_difference() takes any iterable for the other argument.

            The output is the equivalent of (a | b) - (a & b), the union of both sets minus the intersection of both sets.
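These identities are easy to verify directly:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

# symmetric difference == XOR operator == union minus intersection
assert a.symmetric_difference(b) == a ^ b == (a | b) - (a & b) == {1, 2, 5, 6}

# symmetric_difference() accepts any iterable; ^ requires both operands to be sets
assert a.symmetric_difference([3, 4, 5, 6]) == {1, 2, 5, 6}
```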

            qid & accept id: (29950856, 29952855) query: remove dictionary from list in pandas colum soup:

            soup wrap:

            Step 1: Convert the string 'list' column to actual lists:

            from ast import literal_eval 
            
            df['misc'] = [literal_eval(r) for r in df.misc] 
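literal_eval safely parses a string that looks like a Python literal, which is what each cell holds; for example (hypothetical cell value):

```python
from ast import literal_eval

cell = "[{'type': 'cars', 'value': 3}, {'type': 'pets', 'value': 1}]"
parsed = literal_eval(cell)  # a real list of dicts, not a string
print(type(parsed).__name__, parsed[0]['type'])
```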
            

            Step 2: Loop through each dictionary to get the 'values' (e.g. cars, pets, shoes, etc.). Add a column to the DataFrame for each unique value.

            sublists = [[d.get('type') for d in cell] for cell in df.misc]
            cols = list(set([item for sublist in sublists for item in sublist]))
            for c in cols:
                df[c] = 0
            

            Step 3: Create a dictionary which gets the value for each type (this assumes that there is not more than one type for a given list of dictionaries in the row). Then enumerate through these value counts and assign the result back to the DataFrame:

            value_counts = [{d.get('type'): d.get('value') for d in cell} for cell in df.misc]
            for n, row in enumerate(value_counts):
                if row:
                    items, values = zip(*row.items())
                    df.loc[df.index[n], items] = values
            
            del df['misc']
            
            >>> df
              name  age  cars  shoes  pets  siblings
            0  Jim   44     3     13     1         0
            1  Bob   25     0      0     1         3
            2  Sue   55     0      0     0         0
            
            qid & accept id: (29952373, 29952472) query: Parse list to other list soup:

            soup wrap:

            You could go through the list items, split each item after the first one, take the last two fields from the split, and append them as a tuple to a new list:

            l = ['GIS_FPC_PP,PERIMETER,MAT,LIGHTS,PARK,SPACES,LAT,LNG\n',
                 '8266.99157657,453.7255798,Paved,1,American Legion,20,40.0188044212,-75.0547647126\n',
                 '20054.5870679,928.20201772,Paved,1,Barnes Foundation Museum, ,39.9610355788,-75.1725011285\n']
            
            newList = []
            for i in range(0, len(l)):
                item = l[i]
                tempList = []
                if i != 0:
                    itemSplit = item.split(',')
                    tempList.append(itemSplit[-2].strip())
                    tempList.append(itemSplit[-1].strip())
                    newList.append(tuple(tempList))
            print newList
            

            Output

            [('40.0188044212', '-75.0547647126'), ('39.9610355788', '-75.1725011285')]
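Since the rows are comma-separated, the same extraction can also be written more compactly (a sketch, using the same data as above):

```python
l = ['GIS_FPC_PP,PERIMETER,MAT,LIGHTS,PARK,SPACES,LAT,LNG\n',
     '8266.99157657,453.7255798,Paved,1,American Legion,20,40.0188044212,-75.0547647126\n',
     '20054.5870679,928.20201772,Paved,1,Barnes Foundation Museum, ,39.9610355788,-75.1725011285\n']

# skip the header row, split each remaining row, keep the last two fields stripped
new_list = [tuple(field.strip() for field in row.split(',')[-2:]) for row in l[1:]]
print(new_list)
```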
            
            qid & accept id: (29968046, 29968151) query: Python incrementing a Dictionary value entry held within a list soup:

            soup wrap:

            You need to index into the list object too:

            for k, v in PointsOfInterest.iteritems():
                if k in mypkt.Text:
                    PointsOfInterest[k][1] = PointsOfInterest[k][1] + 1
            

            or shorter (since you already have v referencing the same value):

            for k, v in PointsOfInterest.iteritems():
                if k in mypkt.Text:
                    v[1] += 1
            

            The same would apply to appending items to a nested list in the value:

            for k, v in PointsOfInterest.iteritems():
                if k in mypkt.Text:
                    v[1].append(mypkt.Text)
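A runnable miniature of the increment pattern (hypothetical data; items() replaces iteritems() on Python 3):

```python
# each value is [description, hit_count]
points_of_interest = {'bridge': ['river crossing', 0], 'mill': ['old mill', 0]}
text = 'passing the bridge now'

for k, v in points_of_interest.items():
    if k in text:
        v[1] += 1  # mutates the list object held in the dict

print(points_of_interest['bridge'][1], points_of_interest['mill'][1])
```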
            
            qid & accept id: (29969430, 29985047) query: Apply a weighted average function to a dataframe without grouping it, as if it was a single group soup:

            soup wrap:

            Some example data:

            import numpy as np
            import pandas as pd
            
            my_target = 25
            df = pd.DataFrame({'column1': np.random.normal(25, 3, 20),
                               'weight_column': np.random.random_integers(1, 10, 20)})
            
            df
            Out[4]: 
                  column1  weight_column
            0   23.147356              6
            1   24.361162              5
            2   25.665186              4
            3   20.059039              1
            4   28.573390              5
            5   26.543743              1
            6   23.177928              2
            # etc.
            

            Okay, so in your post when you say "If I don't use groupby, pandas would apply this function to every row of the dataframe", that's not necessarily true. You should try to read up on the way operations on numpy arrays are "vectorized". So, like people have pointed out in the comments, your function works fine without having to do the groupby:

            mdft(df)
            Out[9]: 1.9429828309434094
            

            That said, you could have avoided writing the function in the first place because numpy can do weighted means for you:

            np.average(np.abs(my_target - df['column1']), weights=df['weight_column'])
            Out[8]: 1.9429828309434098
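np.average with weights computes sum(w*x)/sum(w); a pure-Python equivalent makes the formula explicit (hypothetical numbers):

```python
def weighted_mean(values, weights):
    # sum(w_i * x_i) / sum(w_i)
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

print(weighted_mean([2.0, 4.0], [1, 3]))  # (2*1 + 4*3) / 4
```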
            
            qid & accept id: (29974139, 29974292) query: How to automatically rerun a python program after it finishes? Supervisord? soup:

            soup wrap:

            I don't see why you couldn't use supervisord. The configuration is really simple and very flexible and it's not limited to python programs.

            For example, you can create the file /etc/supervisor/conf.d/myprog.conf:

            [program:myprog]
            command=/opt/myprog/bin/myprog --opt1 --opt2
            directory=/opt/myprog
            user=myuser
            

            Then reload supervisor's config:

            $ sudo supervisorctl reload
            

            and it's on. Isn't it simple enough?

            More about supervisord configuration: http://supervisord.org/subprocess.html
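If you only want the behavior and not the process management, the naive alternative to supervisord is a plain restart loop; a bounded sketch (the real thing would loop forever and likely sleep between restarts):

```python
import subprocess
import sys

def keep_running(cmd, max_restarts):
    """Rerun cmd each time it exits; bounded here so the sketch terminates."""
    runs = 0
    for _ in range(max_restarts):
        subprocess.run(cmd)
        runs += 1
    return runs

# rerun a trivial script three times
print(keep_running([sys.executable, '-c', 'pass'], max_restarts=3))
```

Unlike supervisord, this gives you no logging, no daemonization, and no supervisorctl-style control, which is why the config-file route above is usually the better choice.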

            qid & accept id: (29998052, 29998062) query: Deleting consonants from a string in Python soup:

            soup wrap:

            Correcting your code

            The line if char == vowels: is wrong; it has to be if char in vowels:, because you need to check whether that particular character is present in the list of vowels. Apart from that, you need print(char, end='') (in Python 3) to print the output as iiii all on one line.

            The final program will be like

            def eliminate_consonants(x):
                    vowels= ['a','e','i','o','u']
                    for char in x:
                        if char in vowels:
                            print(char,end = "")
            
            eliminate_consonants('mississippi')
            

            And the output will be

            iiii
            

            Other ways include

            • Using in a string

              def eliminate_consonants(x):
                  for char in x:
                      if char in 'aeiou':
                          print(char,end = "")
              

              As simple as it looks, the statement if char in 'aeiou' checks if char is present in the string aeiou.

            • A list comprehension

               ''.join([c for c in x if c in 'aeiou'])
              

              This list comprehension will return a list containing only the characters that are in aeiou

            • A generator expression

              ''.join(c for c in x if c in 'aeiou')
              

              This generator expression will return a generator that yields only the characters that are in aeiou

            • Regular Expressions

              You can use re.findall to discover only the vowels in your string. The code

              re.findall(r'[aeiou]',"mississippi")
              

              will return a list of the vowels found in the string, i.e. ['i', 'i', 'i', 'i']. So now we can join them back together with str.join:

              ''.join(re.findall(r'[aeiou]',"mississippi"))
              
            • str.translate and maketrans

              For this technique you will need to store a map which matches each of the non-vowels to None. For this you can use string.ascii_lowercase. The code to make the map is

              str.maketrans({i:None for i in string.ascii_lowercase if i not in "aeiou"})
              

              this will return the mapping. Do store it in a variable (here m for map)

              "mississippi".translate(m)
              

              This will remove all the non aeiou characters from the string.

            • Using dict.fromkeys

              You can use dict.fromkeys along with sys.maxunicode. But remember to import sys first!

              m = dict.fromkeys(i for i in range(sys.maxunicode+1) if chr(i) not in 'aeiou')
              

              and now use str.translate.

              'mississippi'.translate(m)
              
            • Using bytearray

              As mentioned by J.F.Sebastian in the comments below, you can create a bytearray of all non-vowel byte values by using

              non_vowels = bytearray(set(range(0x100)) - set(b'aeiou'))
              

              Using this, we can translate the word:

              'mississippi'.encode('ascii', 'ignore').translate(None, non_vowels)
              

              which will return b'iiii'. This can easily be converted to str by using decode i.e. b'iiii'.decode("ascii").

            • Using bytes

              bytes returns a bytes object and is the immutable version of bytearray. (It is Python 3 specific.)

              non_vowels = bytes(set(range(0x100)) - set(b'aeiou'))
              

              Using this, we can translate the word:

              'mississippi'.encode('ascii', 'ignore').translate(None, non_vowels)
              

              which will return b'iiii'. This can easily be converted to str by using decode i.e. b'iiii'.decode("ascii").
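All of the variants above can be sanity-checked side by side (Python 3):

```python
import re
import string

word = 'mississippi'

# list comprehension / generator expression
assert ''.join([c for c in word if c in 'aeiou']) == 'iiii'
assert ''.join(c for c in word if c in 'aeiou') == 'iiii'

# regular expressions
assert ''.join(re.findall(r'[aeiou]', word)) == 'iiii'

# str.translate with a maketrans map that deletes lowercase consonants
m = str.maketrans({c: None for c in string.ascii_lowercase if c not in 'aeiou'})
assert word.translate(m) == 'iiii'

# bytes.translate with a delete-table of every non-vowel byte value
non_vowels = bytes(set(range(0x100)) - set(b'aeiou'))
assert word.encode('ascii').translate(None, non_vowels) == b'iiii'

print('all variants agree')
```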


            Timing comparison

            Python 3

            python3 -m timeit -s "text = 'mississippi'*100; non_vowels = bytes(set(range(0x100)) - set(b'aeiou'))" "text.encode('ascii', 'ignore').translate(None, non_vowels).decode('ascii')"
            100000 loops, best of 3: 2.88 usec per loop
            python3 -m timeit -s "text = 'mississippi'*100; non_vowels = bytearray(set(range(0x100)) - set(b'aeiou'))" "text.encode('ascii', 'ignore').translate(None, non_vowels).decode('ascii')"
            100000 loops, best of 3: 3.06 usec per loop
            python3 -m timeit -s "text = 'mississippi'*100;d=dict.fromkeys(i for i in range(127) if chr(i) not in 'aeiou')" "text.translate(d)"
            10000 loops, best of 3: 71.3 usec per loop
            python3 -m timeit -s "import string; import sys; text='mississippi'*100; m = dict.fromkeys(i for i in range(sys.maxunicode+1) if chr(i) not in 'aeiou')" "text.translate(m)"
            10000 loops, best of 3: 71.6 usec per loop
            python3 -m timeit -s "text = 'mississippi'*100" "''.join(c for c in text if c in 'aeiou')"
            10000 loops, best of 3: 60.1 usec per loop
            python3 -m timeit -s "text = 'mississippi'*100" "''.join([c for c in text if c in 'aeiou'])"
            10000 loops, best of 3: 53.2 usec per loop
            python3 -m timeit -s "import re;text = 'mississippi'*100; p=re.compile(r'[aeiou]')" "''.join(p.findall(text))"
            10000 loops, best of 3: 57 usec per loop
            

            The timings in sorted order

            translate (bytes)    |  2.88
            translate (bytearray)|  3.06
            List Comprehension   | 53.2
            Regular expressions  | 57.0
            Generator exp        | 60.1
            dict.fromkeys        | 71.3
            translate (unicode)  | 71.6
            

            As you can see, the method using bytes is the fastest.


            Python 3.5

            python3.5 -m timeit -s "text = 'mississippi'*100; non_vowels = bytes(set(range(0x100)) - set(b'aeiou'))" "text.encode('ascii', 'ignore').translate(None, non_vowels).decode('ascii')"
            100000 loops, best of 3: 4.17 usec per loop
            python3.5 -m timeit -s "text = 'mississippi'*100; non_vowels = bytearray(set(range(0x100)) - set(b'aeiou'))" "text.encode('ascii', 'ignore').translate(None, non_vowels).decode('ascii')"
            100000 loops, best of 3: 4.21 usec per loop
            python3.5 -m timeit -s "text = 'mississippi'*100;d=dict.fromkeys(i for i in range(127) if chr(i) not in 'aeiou')" "text.translate(d)"
            100000 loops, best of 3: 2.39 usec per loop
            python3.5 -m timeit -s "import string; import sys; text='mississippi'*100; m = dict.fromkeys(i for i in range(sys.maxunicode+1) if chr(i) not in 'aeiou')" "text.translate(m)"
            100000 loops, best of 3: 2.33 usec per loop
            python3.5 -m timeit -s "text = 'mississippi'*100" "''.join(c for c in text if c in 'aeiou')"
            10000 loops, best of 3: 97.1 usec per loop
            python3.5 -m timeit -s "text = 'mississippi'*100" "''.join([c for c in text if c in 'aeiou'])"
            10000 loops, best of 3: 86.6 usec per loop
            python3.5 -m timeit -s "import re;text = 'mississippi'*100; p=re.compile(r'[aeiou]')" "''.join(p.findall(text))"
            10000 loops, best of 3: 74.3 usec per loop
            

            The timings in sorted order

            translate (unicode)  |  2.33
            dict.fromkeys        |  2.39
            translate (bytes)    |  4.17
            translate (bytearray)|  4.21
            Regular expressions  | 74.3
            List Comprehension   | 86.6
            Generator exp        | 97.1
            
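            The two fastest approaches above are easy to reproduce; this is a minimal self-contained Python 3 sketch of the bytes-translate and dict-based str.translate variants:

```python
# Two ways to keep only the vowels of an ASCII string (Python 3).
text = 'mississippi'

# bytes.translate with a delete table holding every non-vowel byte
non_vowels = bytes(set(range(0x100)) - set(b'aeiou'))
via_bytes = text.encode('ascii', 'ignore').translate(None, non_vowels).decode('ascii')

# str.translate with a dict mapping non-vowel code points to None (deletes them)
table = dict.fromkeys(i for i in range(128) if chr(i) not in 'aeiou')
via_str = text.translate(table)

print(via_bytes, via_str)   # iiii iiii
```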
            qid & accept id: (30002261, 30003231) query: How to draw stacked histogram in pandas soup:

            soup wrap:

            What about something like this

            import pandas as pd
            import matplotlib.pyplot as plt
            subset = pd.DataFrame({'fork': {0: True, 1: False, 2: False, 3: False, 4: False, 5: True, 6: False},
             'percentage_remains': {0: 20.0,
              1: 9.0909089999999999,
              2: 2.0,
              3: 0.0,
              4: 0.0,
              5: 33.333333000000003,
              6: 20.0}})
            

            Filter for fork == True via boolean indexing

            filter = subset["fork"] == True
            

            Then use matplotlib directly. Notice I'm passing a list: one element holds the true values and the other the false values

            plt.hist([subset["percentage_remains"][filter],
                      subset["percentage_remains"][~filter]],
                     stacked=True)
            plt.show()
            

            (Image: the resulting stacked histogram.)

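            As a quick sanity check of the boolean split (no plotting needed), this sketch shows what the two list elements passed to plt.hist contain:

```python
import pandas as pd

subset = pd.DataFrame({'fork': [True, False, False, False, False, True, False],
                       'percentage_remains': [20.0, 9.090909, 2.0, 0.0, 0.0,
                                              33.333333, 20.0]})

mask = subset['fork']   # already boolean
true_vals = subset['percentage_remains'][mask]
false_vals = subset['percentage_remains'][~mask]
print(list(true_vals))    # [20.0, 33.333333]
print(len(false_vals))    # 5
```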
            qid & accept id: (30053381, 30053906) query: Working with multiple columns from a data file soup:

            soup wrap:

            The function cumtrapz accepts an axis argument. For example, suppose you put your first column in x and the remaining columns in y, and they have these values:

            In [61]: x
            Out[61]: array([100, 110, 120, 130])
            
            In [62]: y
            Out[62]: 
            array([[ 1.1,  2.1,  2. ,  1.1,  1.1],
                   [ 2. ,  2.1,  1. ,  1.2,  2.1],
                   [ 1.2,  1. ,  1.1,  1. ,  1.2],
                   [ 2. ,  1.1,  1.2,  2. ,  1.2]])
            

            You can integrate each column of y with respect to x as follows:

            In [63]: cumtrapz(y, x=x, axis=0, initial=0)
            Out[63]: 
            array([[  0. ,   0. ,   0. ,   0. ,   0. ],
                   [ 15.5,  21. ,  15. ,  11.5,  16. ],
                   [ 31.5,  36.5,  25.5,  22.5,  32.5],
                   [ 47.5,  47. ,  37. ,  37.5,  44.5]])
            
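            For reference, cumtrapz lives in scipy.integrate (it was renamed cumulative_trapezoid in newer SciPy releases); a self-contained sketch of the example above, trimmed to two columns:

```python
import numpy as np

try:  # SciPy >= 1.6 prefers the new name
    from scipy.integrate import cumulative_trapezoid as cumtrapz
except ImportError:
    from scipy.integrate import cumtrapz

x = np.array([100, 110, 120, 130])
y = np.array([[1.1, 2.1],
              [2.0, 2.1],
              [1.2, 1.0],
              [2.0, 1.1]])

# Integrate each column of y with respect to x
out = cumtrapz(y, x=x, axis=0, initial=0)
print(out[1])   # first trapezoid per column: [15.5 21. ]
```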
            qid & accept id: (30068271, 30070362) query: Python get get average of neighbours in matrix with na value soup:

            soup wrap:

            Shot #1

            This assumes you are looking to get sliding windowed average values in an input array with a window of 3 x 3 and considering only the north-west-east-south neighborhood elements.

            For such a case, signal.convolve2d with an appropriate kernel could be used. At the end, you need to divide those summations by the number of ones in kernel, i.e. kernel.sum() as only those contributed to the summations. Here's the implementation -

            import numpy as np
            from scipy import signal
            
            # Inputs
            a = [[1,2,3],[3,4,5],[5,6,7],[4,8,9]]
            
            # Convert to numpy array
            arr = np.asarray(a,float)    
            
            # Define kernel for convolution                                         
            kernel = np.array([[0,1,0],
                               [1,0,1],
                               [0,1,0]]) 
            
            # Perform 2D convolution with input data and kernel 
            out = signal.convolve2d(arr, kernel, boundary='wrap', mode='same')/kernel.sum()
            

            Shot #2

            This makes the same assumptions as in shot #1, except that we are looking to find average values in a neighborhood of only zero elements with the intention to replace them with those average values.

            Approach #1: Here's one way to do it using a manual selective convolution approach -

            import numpy as np
            
            # Convert to numpy array
            arr = np.asarray(a,float)    
            
            # Pad around the input array to take care of boundary conditions
            arr_pad = np.lib.pad(arr, (1,1), 'wrap')
            
            R,C = np.where(arr==0)   # Row, column indices for zero elements in input array
            N = arr_pad.shape[1]     # Number of columns in the padded array
            
            offset = np.array([-N, -1, 1, N])
            idx = np.ravel_multi_index((R+1,C+1),arr_pad.shape)[:,None] + offset
            
            arr_out = arr.copy()
            arr_out[R,C] = arr_pad.ravel()[idx].sum(1)/4
            

            Sample input, output -

            In [587]: arr
            Out[587]: 
            array([[ 4.,  0.,  3.,  3.,  3.,  1.,  3.],
                   [ 2.,  4.,  0.,  0.,  4.,  2.,  1.],
                   [ 0.,  1.,  1.,  0.,  1.,  4.,  3.],
                   [ 0.,  3.,  0.,  2.,  3.,  0.,  1.]])
            
            In [588]: arr_out
            Out[588]: 
            array([[ 4.  ,  3.5 ,  3.  ,  3.  ,  3.  ,  1.  ,  3.  ],
                   [ 2.  ,  4.  ,  2.  ,  1.75,  4.  ,  2.  ,  1.  ],
                   [ 1.5 ,  1.  ,  1.  ,  1.  ,  1.  ,  4.  ,  3.  ],
                   [ 2.  ,  3.  ,  2.25,  2.  ,  3.  ,  2.25,  1.  ]])
            

            To take care of the boundary conditions, there are other options for padding. Look at numpy.pad for more info.

            Approach #2: This would be a modified version of convolution based approach listed earlier in Shot #1. This is same as that earlier approach, except that at the end, we selectively replace the zero elements with the convolution output. Here's the code -

            import numpy as np
            from scipy import signal
            
            # Inputs
            a = [[1,2,3],[3,4,5],[5,6,7],[4,8,9]]
            
            # Convert to numpy array
            arr = np.asarray(a,float)
            
            # Define kernel for convolution                                         
            kernel = np.array([[0,1,0],
                               [1,0,1],
                               [0,1,0]]) 
            
            # Perform 2D convolution with input data and kernel 
            conv_out = signal.convolve2d(arr, kernel, boundary='wrap', mode='same')/kernel.sum()
            
            # Initialize output array as a copy of input array
            arr_out = arr.copy()
            
            # Setup a mask of zero elements in input array and 
            # replace those in output array with the convolution output
            mask = arr==0
            arr_out[mask] = conv_out[mask]
            

            Remarks: Approach #1 would be the preferred way when you have fewer number of zero elements in input array, otherwise go with Approach #2.

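            A small hand-checkable case for the convolution-based Approach #2 (3x3 array with wrap-around; the values are illustrative):

```python
import numpy as np
from scipy import signal

arr = np.array([[4., 0., 3.],
                [2., 4., 0.],
                [0., 1., 1.]])

# North-west-east-south neighborhood kernel
kernel = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]])

conv_out = signal.convolve2d(arr, kernel, boundary='wrap', mode='same') / kernel.sum()

# Replace only the zero elements with their neighborhood averages
arr_out = arr.copy()
mask = arr == 0
arr_out[mask] = conv_out[mask]

# The zero at (0, 1) has neighbours 4 (left), 3 (right), 4 (below), 1 (above, wrapped)
print(arr_out[0, 1])   # 3.0
```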
            qid & accept id: (30071618, 30072109) query: Numpy loadtxt load every other column soup:

            soup wrap:

            You don't need to write out a giant tuple of even or odd numbers; you can have Python do that for you:

            data = numpy.loadtxt(..., usecols=xrange(1, numcols, 2))
            

            I've passed an xrange here, since the usecols parameter can be any sequence type, but even if you needed a tuple, you could just call tuple:

            data = numpy.loadtxt(..., usecols=tuple(xrange(1, numcols, 2)))
            
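            On Python 3 there is no xrange; range serves the same purpose. A self-contained sketch using an in-memory file:

```python
import io
import numpy as np

text = "1 2 3 4\n5 6 7 8\n"
numcols = 4

# Take columns 1 and 3 (every other column, starting at index 1)
data = np.loadtxt(io.StringIO(text), usecols=range(1, numcols, 2))
print(data)   # [[2. 4.]
              #  [6. 8.]]
```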
            qid & accept id: (30076583, 30129613) query: How to use swig with compiled dll and header file only soup:

            soup wrap:

            Yes, it is possible. SWIG only uses the headers to generate wrapper functions. Here's a simple SWIG file:

            %module mymod
            %{
            #include "myheader.h"
            %}
            
            %include "myheader.h"
            

            Then:

            swig -python -c++ mymod.i
            

            Then compile and link the generated code as a Python extension DLL. You will also need to link in the .lib for the wrapped DLL.

            qid & accept id: (30133281, 30133383) query: Slicing based on dates Pandas Dataframe soup:

            soup wrap:

            For a specific value you can do this:

            In [84]:
            
            idx = df[df['preciptotal'] > 1].index[0]
            df.iloc[idx-3: idx+4]
            Out[84]:
                    date  store_nbr  units  preciptotal
            0 2014-10-11          1      0         0.00
            1 2014-10-12          1      0         0.01
            2 2014-10-13          1      2         0.00
            3 2014-10-14          1      1         2.13
            4 2014-10-15          1      0         0.00
            5 2014-10-16          1      0         0.87
            6 2014-10-17          1      3         0.01
            

            For the more general case you can get an array of indices where the condition is met

            idx_vals = df[df['preciptotal'] > 1].index
            

            then you can generate slices or iterate over the array values:

            for idx in idx_vals:
                df.iloc[idx-3: idx+4]
            

            This assumes your index is a 0-based int64 index, as in your sample.
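            The general loop can be sketched like this; note clamping the lower bound so a match in the first three rows does not produce a negative slice start:

```python
import pandas as pd

df = pd.DataFrame({'units': [0, 0, 2, 1, 0, 0, 3],
                   'preciptotal': [0.0, 0.01, 0.0, 2.13, 0.0, 0.87, 0.01]})

idx_vals = df[df['preciptotal'] > 1].index
# Window of 3 rows before and 3 after each match (clamped at the start)
windows = [df.iloc[max(idx - 3, 0): idx + 4] for idx in idx_vals]
print(len(windows), len(windows[0]))   # 1 7
```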

            qid & accept id: (30156357, 30156378) query: Getting value of a class in selenium and python soup:
            soup wrap:

            print(i.get_attribute("values"))

            There is no values attribute; hence, you are getting None back. The attribute is called value.

            Aside from that, your first approach totally makes sense and should work.

            We can make it a bit simpler and use a "CSS selector" via find_elements_by_css_selector():

            medications = driver.find_elements_by_css_selector("input.medMedications")
            
            # count
            print len(medications)
            
            # values
            for medication in medications:
                print medication.get_attribute("value")
            

            Alternatively, you can check whether an id attribute contains Medications:

            medications = driver.find_elements_by_css_selector("input[id*=Medications]")
            

            or, in case of XPath:

            medications = driver.find_elements_by_xpath("//input[contains(@id, 'Medications')]")
            
            qid & accept id: (30170614, 30170966) query: Iteration Through tuple of dictionaries in Python soup:

            soup wrap:

            Just check each dict d for the key and then set Dict["2"] equal to d["2"].

            Dict = {'1': 'one', '2': 'three'}
            
            Tuple = ({'1': 'one', '5': 'five'}, {'4': 'four', '2': 'two'})
            
            for d in Tuple:
                if "2" in d:
                    Dict["2"] = d["2"]
            

            If multiple dicts in Tuple have the same key, the value will be taken from the last dict you encounter. If you want the first match, you should break in the if.

            Dict = {'1': 'one', '2': 'three'}
            
            Tuple = ({'1': 'one', '5': 'five'}, {'4': 'four', '2': 'two'})
            for d in Tuple:
                if "2" in d:
                    Dict["2"] = d["2"]
                    break # get first match
            

            If you want the last match, it would be better to start at the end of Tuple:

            for d in reversed(Tuple):
                if "2" in d:
                    Dict["2"] = d["2"]
                    break # last dict in Tuple that has the key
            
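            The first-match loop can also be written as a one-liner with next() and a generator expression; the None here is just a fallback for when no dict has the key:

```python
Tuple = ({'1': 'one', '5': 'five'}, {'4': 'four', '2': 'two'})

# Value of "2" from the first dict that has it, or None
value = next((d['2'] for d in Tuple if '2' in d), None)
print(value)   # two
```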
            qid & accept id: (30180241, 30180322) query: Numpy: get the column and row index of the minimum value of a 2D array soup:

            soup wrap:

            You may use np.where:

            In [9]: np.where(x == np.min(x))
            Out[9]: (array([2]), array([1]))
            

            Also as @senderle mentioned in comment, to get values in an array, you can use np.argwhere:

            In [21]: np.argwhere(x == np.min(x))
            Out[21]: array([[2, 1]])
            

            Updated:

            As the OP's timings show, and since it is now clear that argmin is what is wanted (no duplicated mins, etc.), one way that may slightly improve the OP's original approach is to use divmod:

            divmod(x.argmin(), x.shape[1])
            

            Time them and you will find extra bits of speed; not much, but still an improvement.

            %timeit find_min_idx(x)
            1000000 loops, best of 3: 1.1 µs per loop
            
            %timeit divmod(x.argmin(), x.shape[1])
            1000000 loops, best of 3: 1.04 µs per loop
            

            If you are really concerned about performance, you may take a look at cython.
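            For completeness, np.unravel_index performs the same flat-to-(row, col) conversion and generalizes to more dimensions; a small sketch with illustrative values:

```python
import numpy as np

x = np.array([[3, 2, 5],
              [1, 4, 6],
              [7, 0, 8]])

# Flat argmin converted to (row, col) by divmod over the row length
row, col = divmod(x.argmin(), x.shape[1])
print(row, col)   # 2 1

# np.unravel_index gives the same coordinates
assert (row, col) == np.unravel_index(x.argmin(), x.shape)
```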

            qid & accept id: (30185056, 30211905) query: Updating a table from another table with multiple columns in sqlalchemy soup:

            soup wrap:

            I don't think you can. Thus, this is not really an answer, but it is far too long for a comment.

            You can easily compose your query with 2 columns (I guess you already knew that):

            select_query = select([table2.c.col1, table2.c.col2]).where(table1.c.key == table2.c.key)
            

            and afterwards you can use the method with_only_columns(), see api:

            In[52]: print(table.update().values(col1 = select_query.with_only_columns([table2.c.col1]), col2 = select_query.with_only_columns([table2.c.col2])))
            UPDATE table SET a=(SELECT tweet.id 
            FROM tweet 
            WHERE tweet.id IS NOT NULL), b=(SELECT tweet.user_id 
            FROM tweet 
            WHERE tweet.id IS NOT NULL)
            

            But as you see from the update statement, you will effectively be doing two selects. (Sorry I did not adapt the output completely to your example, but I'm sure you get the idea).

            I'm not sure whether, as you say, MySQL will be smart enough to make it one query only. I guess so. Hope it helps anyway.

            qid & accept id: (30196224, 30196344) query: Extracting data from file with differing amounts of columns soup:

            soup wrap:

            Use $(NF-1) like so, where NF is the number of fields for that line:

            awk  '{print $(NF-1)}' /tmp/genes.txt
            A2M
            ACADM
            

            Your posted example has spaces for delimiters. You may need to change the field separator to tabs if your file is truly tab-delimited. Then it would be:

            awk -F $'\t' '{print $(NF-1)}' file_name
            

            If you want the number before that name:

            $ awk  '{print $(NF-2)}' /tmp/genes.txt
            9268558
            76229363
            
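            A quick way to try it with the two sample lines inlined (field values are illustrative):

```shell
printf '1 9268558 A2M chr12\n2 76229363 ACADM chr1\n' | awk '{print $(NF-1)}'
# A2M
# ACADM
```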
            qid & accept id: (30198481, 30207890) query: Pyyaml - Using different styles for keys and integers and strings soup:

            soup wrap:

            You can at least preserve the original flow/block style for the various elements with the normal yaml.dump() for some value of "normal".

            What you need is a loader that saves the flow/block style information while reading the data: subclass the normal types that carry the style (mappings/dicts resp. sequences/lists) so that they behave like the Python constructs normally returned by the loader, but with the style information attached. Then, on the way out, you provide yaml.dump with a custom dumper that takes this style information into account.
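            As a minimal sketch of that idea (this is not the actual ruamel.yaml implementation; the class name StyledMap and the flow_style attribute are invented here), you can subclass dict so a loaded mapping behaves like a normal dict but carries a style flag a dumper could inspect:

```python
# Hypothetical sketch: a dict subclass that behaves like the mapping the
# loader normally returns, but remembers whether it was written in flow
# or block style. (StyledMap/flow_style are made-up names.)
class StyledMap(dict):
    def __init__(self, *args, **kw):
        self.flow_style = kw.pop('flow_style', False)  # False -> block style
        dict.__init__(self, *args, **kw)

# a round-trip loader would create these while parsing:
m = StyledMap({'keepalivetimeout': 300}, flow_style=False)
m['keepalivetimeout'] = 400        # behaves like a normal dict
print(m.flow_style, m['keepalivetimeout'])
```

            A custom dumper can then check flow_style on each mapping when emitting, which is how the original flow/block layout survives a load/modify/dump cycle.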

            I use the normal yaml.dump in my enhanced version of PyYAML called ruamel.yaml, but provide a special dumper class, RoundTripDumper (and a RoundTripLoader for yaml.load), that preserves the flow/block style (and any comments you might have in the file):

            import ruamel.yaml as yaml
            
            infile = yaml.load(open('yamlfile'), Loader=yaml.RoundTripLoader)
            
            for key, value in infile['main'].items():
                if key == 'keepalivetimeout':
                    item = value['item']
                    item['keepalivetimeout'] = 400
            
            print yaml.dump(infile, Dumper=yaml.RoundTripDumper)
            

            gives you:

            main:
              directory:
                options:
                  directive: options
                  item:
                    options: Stuff OtherStuff MoreStuff
              directoryindex:
                item:
                  directoryindex: stuff.htm otherstuff.htm morestuff.html
              fileetag:
                item:
                  fileetag: Stuff
              keepalive:
                item:
                  keepalive: Stuff
              keepalivetimeout:
                item:
                  keepalivetimeout: 400
            

            If you cannot install ruamel.yaml, you can pull the relevant code out of my repository and include it in your own; AFAIK PyYAML has not been updated since I started working on this.

            I currently don't preserve the superfluous quotes on the scalars, but I do preserve the chomping information (for multiline scalars starting with '|'). That information is thrown out really early on in the input processing of the YAML file and would require multiple changes to be preserved.

            Since you seem to want different quotes for key and value string scalars, you can achieve the output you want by overriding process_scalar (part of the Emitter in emitter.py) to add the quotes depending on whether the string scalar is a key and whether it is an integer:

            import ruamel.yaml as yaml
            
            # the scalar emitter from emitter.py
            def process_scalar(self):
                if self.analysis is None:
                    self.analysis = self.analyze_scalar(self.event.value)
                if self.style is None:
                    self.style = self.choose_scalar_style()
                split = (not self.simple_key_context)
                # VVVVVVVVVVVVVVVVVVVV added
                try:
                    x = int(self.event.value)  # might need to expand this
                except ValueError:
                    # not an integer, so we have a string
                    if split:
                        self.style = "'"
                    else:
                        self.style = '"'
                # ^^^^^^^^^^^^^^^^^^^^
                # if self.analysis.multiline and split    \
                #         and (not self.style or self.style in '\'\"'):
                #     self.write_indent()
                if self.style == '"':
                    self.write_double_quoted(self.analysis.scalar, split)
                elif self.style == '\'':
                    self.write_single_quoted(self.analysis.scalar, split)
                elif self.style == '>':
                    self.write_folded(self.analysis.scalar)
                elif self.style == '|':
                    self.write_literal(self.analysis.scalar)
                else:
                    self.write_plain(self.analysis.scalar, split)
                self.analysis = None
                self.style = None
                if self.event.comment:
                    self.write_post_comment(self.event)
            
            
            infile = yaml.load(open('yamlfile'), Loader=yaml.RoundTripLoader)
            
            for key, value in infile['main'].items():
                if key == 'keepalivetimeout':
                    item = value['item']
                    item['keepalivetimeout'] = 400
            
            dd = yaml.RoundTripDumper
            dd.process_scalar = process_scalar
            
            print '---'
            print yaml.dump(infile, Dumper=dd)
            

            gives you:

            ---
            "main":
              "directory":
                "options":
                  "directive": 'options'
                  "item":
                    "options": 'Stuff OtherStuff MoreStuff'
              "directoryindex":
                "item":
                  "directoryindex": 'stuff.htm otherstuff.htm morestuff.html'
              "fileetag":
                "item":
                  "fileetag": 'Stuff'
              "keepalive":
                "item":
                  "keepalive": 'Stuff'
              "keepalivetimeout":
                "item":
                  "keepalivetimeout": 400
            

            which is quite close to what you asked for.

            qid & accept id: (30209723, 30209830) query: Convert column elements to column name in pandas soup:

            soup wrap:

            You can use read_csv and specify header=None and pass the column names as a list:

            In [124]:
            
            t="""time1,stockA,bid,1
             time2,stockA,ask,1.1
             time3,stockB,ask,2.1
             time4,stockB,bid,2.0"""
            df = pd.read_csv(io.StringIO(t), header=None, names=['time', 'stock', 'bid', 'ask'])
            df
            Out[124]:
                 time   stock  bid  ask
            0   time1  stockA  bid  1.0
            1   time2  stockA  ask  1.1
            2   time3  stockB  ask  2.1
            3   time4  stockB  bid  2.0
            

            You'll have to re-encode the bid column to 1 or 2:

            In [126]:
            
            df['bid'] = df['bid'].replace('bid', 1)
            df['bid'] = df['bid'].replace('ask', 2)
            df
            Out[126]:
                 time   stock  bid  ask
            0   time1  stockA    1  1.0
            1   time2  stockA    2  1.1
            2   time3  stockB    2  2.1
            3   time4  stockB    1  2.0
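            The two replace calls can also be collapsed into one with a dict, e.g. df['bid'].replace({'bid': 1, 'ask': 2}). As a pandas-free sketch of the same re-encoding (the row data here is made up to mirror the sample):

```python
# Map the 'bid'/'ask' markers to the codes 1/2; anything else is left as-is.
code_for = {'bid': 1, 'ask': 2}

rows = [('time1', 'stockA', 'bid', 1.0),
        ('time2', 'stockA', 'ask', 1.1)]
encoded = [(t, s, code_for.get(marker, marker), price)
           for t, s, marker, price in rows]
print(encoded)
```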
            

            EDIT

            Based on your updated sample data and desired output, the following works:

            In [29]:
            
            t="""time1,stockA,bid,1
             time2,stockA,ask,1.1
             time3,stockB,ask,2.1
             time4,stockB,bid,2.0
             time5,stockA,bid,1.1
             time6,stockA,ask,1.2"""
            df = pd.read_csv(io.StringIO(t), header=None, names=['time', 'stock', 'bid', 'ask'])
            df
            Out[29]:
                 time   stock  bid  ask
            0   time1  stockA  bid  1.0
            1   time2  stockA  ask  1.1
            2   time3  stockB  ask  2.1
            3   time4  stockB  bid  2.0
            4   time5  stockA  bid  1.1
            5   time6  stockA  ask  1.2
            In [30]:
            
            df.loc[df['bid'] == 'bid', 'bid'] = df['ask']
            df.loc[df['bid'] != 'ask', 'ask'] = ''
            df.loc[df['bid'] == 'ask','bid'] = ''
            df
            Out[30]:
                 time   stock  bid  ask
            0   time1  stockA    1     
            1   time2  stockA       1.1
            2   time3  stockB       2.1
            3   time4  stockB    2     
            4   time5  stockA  1.1     
            5   time6  stockA       1.2
            
            qid & accept id: (30214489, 30214658) query: Split file and turn it into dictionary in python soup:

            soup wrap:

            You can use collections.Counter(), which takes an iterable of words and returns a dict-like mapping recording the frequency of each word in the file.

            sample.txt:

            hello this file is good
            file is is good excellent
            

            And the code for reading and recording the frequency of words:

            import collections
            with open("sample.txt", "r") as datafile:
                lines = datafile.read()
                words = lines.split()
                words_hist = collections.Counter(words)
                print words_hist
            

            Output:

            Counter({'is': 3, 'good': 2, 'file': 2, 'this': 1, 'excellent': 1, 'hello': 1})
            

            As for your posted solution, it seems that you are reading the input file incorrectly, so I have edited your approach a bit:

            counts = dict()
            
            with open("sample.txt", "r") as datafile:
                x = datafile.read().split()
                for word in x:
                    counts[word] = counts.get(word,0) + 1
            print counts
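            To see that the two approaches agree, here is a small self-contained comparison (using an inline string in place of sample.txt):

```python
import collections

text = "hello this file is good\nfile is is good excellent"
words = text.split()

# approach 1: Counter does the tallying for you
words_hist = collections.Counter(words)

# approach 2: manual tally with dict.get
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

print(words_hist == counts)  # a Counter compares equal to a plain dict
```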
            
            qid & accept id: (30229104, 30229129) query: python - increase array size and initialize new elements to zero soup:

            soup wrap:

            In Python, if the input is a numpy array, you can use np.lib.pad to pad zeros around it -

            import numpy as np
            
            A = np.array([[1, 2 ],[2, 3]])   # Input
            A_new = np.lib.pad(A, ((0,1),(0,2)), 'constant', constant_values=(0)) # Output
            

            Sample run -

            In [7]: A  # Input: A numpy array
            Out[7]: 
            array([[1, 2],
                   [2, 3]])
            
            In [8]: np.lib.pad(A, ((0,1),(0,2)), 'constant', constant_values=(0))
            Out[8]: 
            array([[1, 2, 0, 0],
                   [2, 3, 0, 0],
                   [0, 0, 0, 0]])  # Zero padded numpy array
            

            If you don't want to do the math of how many zeros to pad, you can let the code do it for you given the output array size -

            In [29]: A
            Out[29]: 
            array([[1, 2],
                   [2, 3]])
            
            In [30]: new_shape = (3,4)
            
            In [31]: shape_diff = np.array(new_shape) - np.array(A.shape)
            
            In [32]: np.lib.pad(A, ((0,shape_diff[0]),(0,shape_diff[1])), 
                                          'constant', constant_values=(0))
            Out[32]: 
            array([[1, 2, 0, 0],
                   [2, 3, 0, 0],
                   [0, 0, 0, 0]])
            

            Or, you can start off with a zero initialized output array and then put back those input elements from A -

            In [38]: A
            Out[38]: 
            array([[1, 2],
                   [2, 3]])
            
            In [39]: A_new = np.zeros(new_shape,dtype = A.dtype)
            
            In [40]: A_new[0:A.shape[0],0:A.shape[1]] = A
            
            In [41]: A_new
            Out[41]: 
            array([[1, 2, 0, 0],
                   [2, 3, 0, 0],
                   [0, 0, 0, 0]])
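            For a quick self-contained check that the padding route and the zero-initialize-and-copy route agree (in recent NumPy, np.lib.pad is simply an alias of np.pad):

```python
import numpy as np

A = np.array([[1, 2], [2, 3]])
new_shape = (3, 4)

# zero-initialized output, then copy A into the top-left corner
A_new = np.zeros(new_shape, dtype=A.dtype)
A_new[:A.shape[0], :A.shape[1]] = A

# same result via padding, with the pad widths computed from the shapes
pad = [(0, n - s) for n, s in zip(new_shape, A.shape)]
A_pad = np.pad(A, pad, 'constant', constant_values=0)
print(np.array_equal(A_new, A_pad))  # True
```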
            

            In MATLAB, you can use padarray -

            A_new  = padarray(A,[1 2],'post')
            

            Sample run -

            >> A
            A =
                 1     2
                 2     3
            >> A_new = padarray(A,[1 2],'post')
            A_new =
                 1     2     0     0
                 2     3     0     0
                 0     0     0     0
            
            qid & accept id: (30242208, 30252646) query: XOR neural network backprop soup:

            soup wrap:

            OK, so, first, here's the amended code to make yours work.

            #! /usr/bin/python
            
            import numpy as np
            
            def sigmoid(x):
                return 1.0 / (1.0 + np.exp(-x))
            
            vec_sigmoid = np.vectorize(sigmoid)
            
            # Binesh - just cleaning it up, so you can easily change the number of hiddens.
            # Also, initializing with a heuristic from Yoshua Bengio.
            # In many places you were using matrix multiplication and elementwise multiplication
            # interchangably... You can't do that.. (So I explicitly changed everything to be
            # dot products and multiplies so it's clear.)
            input_sz = 2;
            hidden_sz = 3;
            output_sz = 1;
            theta1 = np.matrix(0.5 * np.sqrt(6.0 / (input_sz+hidden_sz)) * (np.random.rand(1+input_sz,hidden_sz)-0.5))
            theta2 = np.matrix(0.5 * np.sqrt(6.0 / (hidden_sz+output_sz)) * (np.random.rand(1+hidden_sz,output_sz)-0.5))
            
            def fit(x, y, theta1, theta2, learn_rate=.1):
                #forward pass
                layer1 = np.matrix(x, dtype='f')
                layer1 = np.c_[np.ones(1), layer1]
                # Binesh - for layer2 we need to add a bias term.
                layer2 = np.c_[np.ones(1), vec_sigmoid(layer1.dot(theta1))]
                layer3 = sigmoid(layer2.dot(theta2))
            
                #backprop
                delta3 = y - layer3
                # Binesh - In reality, this is the _negative_ derivative of the cross entropy function
                # wrt the _input_ to the final sigmoid function.
            
                delta2 = np.multiply(delta3.dot(theta2.T), np.multiply(layer2, (1-layer2)))
                # Binesh - We actually don't use the delta for the bias term. (What would be the point?
                # it has no inputs. Hence the line below.
                delta2 = delta2[:,1:]
            
                # But, delta's are just derivatives wrt the inputs to the sigmoid.
                # We don't add those to theta directly. We have to multiply these by
                # the preceding layer to get the theta2d's and theta1d's
                theta2d = np.dot(layer2.T, delta3)
                theta1d = np.dot(layer1.T, delta2)
            
                #update weights
                # Binesh - here you had delta3 and delta2... Those are not the
                # the derivatives wrt the theta's, they are the derivatives wrt
                # the inputs to the sigmoids.. (As I mention above)
                theta2 += learn_rate * theta2d #??
                theta1 += learn_rate * theta1d #??
            
            def train(X, Y):
                for _ in range(10000):
                    for i in range(4):
                        x = X[i]
                        y = Y[i]
                        fit(x, y, theta1, theta2)
            
            
            # Binesh - Here's a little test function to see that it actually works
            def test(X):
                for i in range(4):
                    layer1 = np.matrix(X[i],dtype='f')
                    layer1 = np.c_[np.ones(1), layer1]
                    layer2 = np.c_[np.ones(1), vec_sigmoid(layer1.dot(theta1))]
                    layer3 = sigmoid(layer2.dot(theta2))
                    print "%d xor %d = %.7f" % (layer1[0,1], layer1[0,2], layer3[0,0])
            
            X = [(0,0), (1,0), (0,1), (1,1)]
            Y = [0, 1, 1, 0]    
            train(X, Y)
            
            # Binesh - Alright, let's see!
            test(X)
            

            And, now for some explanation. Forgive the crude drawing. It was just easier to take a picture than draw something in gimp.

            Visual of WBC's xor neural network http://cablemodem.hex21.com/~binesh/WBC-XOR-nn-small.jpg

            So. First, we have our error function. We'll call this CE (for Cross Entropy). I'll try to use your variables where possible, tho I'm going to use L1, L2 and L3 instead of layer1, layer2 and layer3. (Sigh, I don't know how to do LaTeX here. It seems to work on the statistics Stack Exchange. Weird.)

            CE = -(Y log(L3) + (1-Y) log(1-L3))
            

            We need to take the derivative of this wrt L3, so that we can see how we can move L3 so as to reduce this value.

            dCE/dL3 = -((Y/L3) - (1-Y)/(1-L3))
                    = -((Y(1-L3) - (1-Y)L3) / (L3(1-L3)))
                    = -(((Y-Y*L3) - (L3-Y*L3)) / (L3(1-L3)))
                    = -((Y-Y*L3 + Y*L3 - L3) / (L3(1-L3)))
                    = -((Y-L3) / (L3(1-L3)))
                    = ((L3-Y) / (L3(1-L3)))
            

            Great, but, actually, we can't just alter L3 as we see fit. L3 is a function of Z3 (See my picture).

            L3      = sigmoid(Z3)
            dL3/dZ3 = L3(1-L3)
            

            I'm not deriving this (the derivative of the sigmoid) here, but it's actually not that hard to prove.

            But, anyway, that's the derivative of L3 wrt Z3, but we want the derivative of CE wrt Z3.

            dCE/dZ3 = (dCE/dL3) * (dL3/dZ3)
                    = ((L3-Y)/(L3(1-L3))) * (L3(1-L3)) # Hey, look at that. The denominator gets cancelled out and
                    = (L3-Y) # This is why in my comments I was saying what you are computing is the _negative_ derivative.
            

            We call the derivatives wrt Z's "deltas". So, in your code, this corresponds to delta3.
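            You can sanity-check dCE/dZ3 = L3 - Y numerically with a central finite difference (plain Python, values chosen arbitrarily):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def CE(y, l3):
    # cross entropy, as defined above
    return -(y * math.log(l3) + (1 - y) * math.log(1 - l3))

Y, Z3, h = 1.0, 0.3, 1e-6
L3 = sigmoid(Z3)

# derivative of CE wrt Z3, estimated numerically vs. the algebraic result
numeric = (CE(Y, sigmoid(Z3 + h)) - CE(Y, sigmoid(Z3 - h))) / (2 * h)
analytic = L3 - Y
print(abs(numeric - analytic) < 1e-6)  # True: the two derivatives agree
```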

            Great, but we can't just change Z3 as we like either. We need to compute its derivative wrt L2.

            But this is more complicated.

            Z3 = theta2(0) + theta2(1) * L2(1) + theta2(2) * L2(2) + theta2(3) * L2(3)
            

            So, we need to take partial derivatives wrt. L2(1), L2(2) and L2(3)

            dZ3/dL2(1) = theta2(1)
            dZ3/dL2(2) = theta2(2)
            dZ3/dL2(3) = theta2(3)
            

            Notice that the bias would effectively be

            dZ3/dBias  = theta2(0)
            

            but the bias never changes, it's always 1, so we can safely ignore it. But, our layer2 includes the bias, so we'll keep it for now.

            But, again, we want the derivative wrt Z2(0), Z2(1), Z2(2) (Looks like I drew that badly, unfortunately. Look at the graph, it'll be clearer with it, I think.)

            dL2(1)/dZ2(0) = L2(1) * (1-L2(1))
            dL2(2)/dZ2(1) = L2(2) * (1-L2(2))
            dL2(3)/dZ2(2) = L2(3) * (1-L2(3))
            

            What, now, is dCE/dZ2(0..2)?

            dCE/dZ2(0) = dCE/dZ3 * dZ3/dL2(1) * dL2(1)/dZ2(0)
                       = (L3-Y)  * theta2(1)  * L2(1) * (1-L2(1))
            
            dCE/dZ2(1) = dCE/dZ3 * dZ3/dL2(2) * dL2(2)/dZ2(1)
                       = (L3-Y)  * theta2(2)  * L2(2) * (1-L2(2))
            
            dCE/dZ2(2) = dCE/dZ3 * dZ3/dL2(3) * dL2(3)/dZ2(2)
                       = (L3-Y)  * theta2(3)  * L2(3) * (1-L2(3))
            

            But, really, we can express this as (delta3 * Transpose[theta2]) elementwise multiplied by (L2 * (1-L2)) (where L2 is the vector).

            These are our delta2 layer. I remove the first entry of it, because as I mention above, it corresponds to the delta of the bias (what I label L2(0) on my graph.)

            So. Now, we have derivatives wrt our Z's, but, really, what we can modify are only our thetas.

            Z3 = theta2(0) + theta2(1) * L2(1) + theta2(2) * L2(2) + theta2(3) * L2(3)
            dZ3/dtheta2(0) = 1
            dZ3/dtheta2(1) = L2(1)
            dZ3/dtheta2(2) = L2(2)
            dZ3/dtheta2(3) = L2(3)
            

            Once again, though, we want dCE/dtheta2(0), so that becomes

            dCE/dtheta2(0) = dCE/dZ3 * dZ3/dtheta2(0)
                           = (L3-Y) * 1
            dCE/dtheta2(1) = dCE/dZ3 * dZ3/dtheta2(1)
                           = (L3-Y) * L2(1)
            dCE/dtheta2(2) = dCE/dZ3 * dZ3/dtheta2(2)
                           = (L3-Y) * L2(2)
            dCE/dtheta2(3) = dCE/dZ3 * dZ3/dtheta2(3)
                           = (L3-Y) * L2(3)
            

            Well, this is just np.dot(layer2.T, delta3), and that's what I have in theta2d

            And, similarly:

            Z2(0) = theta1(0,0) + theta1(1,0) * L1(1) + theta1(2,0) * L1(2)
            dZ2(0)/dtheta1(0,0) = 1
            dZ2(0)/dtheta1(1,0) = L1(1)
            dZ2(0)/dtheta1(2,0) = L1(2)

            Z2(1) = theta1(0,1) + theta1(1,1) * L1(1) + theta1(2,1) * L1(2)
            dZ2(1)/dtheta1(0,1) = 1
            dZ2(1)/dtheta1(1,1) = L1(1)
            dZ2(1)/dtheta1(2,1) = L1(2)
            
            Z2(2) = theta1(0,2) + theta1(1,2) * L1(1) + theta1(2,2) * L1(2)
            dZ2(2)/dtheta1(0,2) = 1
            dZ2(2)/dtheta1(1,2) = L1(1)
            dZ2(2)/dtheta1(2,2) = L1(2)
            

            And we'd have to multiply by dCE/dZ2(0), dCE/dZ2(1) and dCE/dZ2(2) (one for each of the three groups up there). But, if you think about that, it then just becomes np.dot(layer1.T, delta2), and that's what I have in theta1d.

            Now, because you did Y-L3 in your code, you're adding to theta1 and theta2... But, here's the reasoning. What we just computed above is the derivative of CE wrt the weights. That means increasing the weights in the direction of the derivative will increase the CE. But we really want to decrease the CE, so we subtract (normally). Because in your code you're computing the negative derivative, it is right that you add.

            Does that make sense?
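
            The whole derivation above can be collected into a short runnable sketch. This is a minimal numpy illustration under assumed shapes (2 inputs, 3 hidden units, 1 output, with the bias as a leading column of ones), not your exact code: the deltas and gradient products are the (L3-Y), delta2 and np.dot(layer.T, delta) expressions derived above, applied with the usual positive-derivative (subtract) convention.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
theta1 = rng.normal(size=(3, 3))   # (bias + 2 inputs) -> 3 hidden units
theta2 = rng.normal(size=(4, 1))   # (bias + 3 hidden) -> 1 output

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

for _ in range(10000):
    # Forward pass: layer1 and layer2 carry a leading bias column of ones.
    layer1 = np.hstack([np.ones((4, 1)), X])
    layer2 = np.hstack([np.ones((4, 1)), sigmoid(layer1 @ theta1)])
    layer3 = sigmoid(layer2 @ theta2)

    # Backward pass: the deltas are dCE/dZ, exactly as derived above.
    delta3 = layer3 - Y                                   # dCE/dZ3 = L3 - Y
    delta2 = (delta3 @ theta2.T) * layer2 * (1 - layer2)  # dCE/dZ2, incl. bias slot
    delta2 = delta2[:, 1:]                                # drop the bias delta

    # dCE/dtheta = np.dot(layer.T, delta); subtract to descend.
    theta2 -= 0.5 * (layer2.T @ delta3)
    theta1 -= 0.5 * (layer1.T @ delta2)

cross_entropy = -(Y * np.log(layer3) + (1 - Y) * np.log(1 - layer3)).sum()
print(np.round(layer3.ravel(), 2), cross_entropy)
```

With the positive derivative, the updates subtract; with the negative derivative (Y-L3), they would add, which is the sign convention discussed above.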

            qid & accept id: (30268750, 30268762) query: how to sort a list of tuples with list[i][1] as key from biggest to smallest soup:

            soup wrap:

            The obvious way is to explicitly use the reverse parameter which exists precisely for that purpose:

            sorted(alpha_items, key=lambda x: x[1], reverse=True)
            

            If you're sorting by numbers, you can also just negate them:

            sorted(alpha_items, key=lambda x: -x[1])
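
            For instance, with a hypothetical alpha_items list, both forms give the same descending order:

```python
alpha_items = [('a', 1), ('b', 3), ('c', 2)]  # hypothetical sample data

print(sorted(alpha_items, key=lambda x: x[1], reverse=True))
# [('b', 3), ('c', 2), ('a', 1)]
print(sorted(alpha_items, key=lambda x: -x[1]))
# [('b', 3), ('c', 2), ('a', 1)]
```

            Note that reverse=True works for any orderable key (strings, dates, tuples), while the negation trick only applies to numbers.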
            
            qid & accept id: (30272538, 30281079) query: Python code for counting number of zero crossings in an array soup:

            soup wrap:

            This produces the same result:

            import numpy as np
            my_array = np.array([80.6, 120.8, -115.6, -76.1, 131.3, 105.1, 138.4, -81.3, -95.3,  
                                 89.2, -154.1, 121.4, -85.1, 96.8, 68.2])
            ((my_array[:-1] * my_array[1:]) < 0).sum()
            

            gives:

            8
            

            and seems to be the fastest solution:

            %timeit ((my_array[:-1] * my_array[1:]) < 0).sum()
            100000 loops, best of 3: 11.6 µs per loop
            

            Compared to the fastest so far:

            %timeit (np.diff(np.sign(my_array)) != 0).sum()
            10000 loops, best of 3: 22.2 µs per loop
            

            Also for larger arrays:

            big = np.random.randint(-10, 10, size=10000000)
            

            this:

            %timeit ((big[:-1] * big[1:]) < 0).sum()
            10 loops, best of 3: 62.1 ms per loop
            

            vs:

            %timeit (np.diff(np.sign(big)) != 0).sum()
            1 loops, best of 3: 97.6 ms per loop
            
            qid & accept id: (30296726, 30297450) query: python - parsing and sorting dates soup:

            soup wrap:

            For just 10 megs, I would definitely go with an in-memory sort. I would parse the HTML with Beautiful Soup and create an array of objects of the given class:

            class Chat:
                def __init__(self, user, date, text):
                    self.user = user
                    self.date = date
                    self.text = text
            

            And sort the array with:

            ut.sort(key=lambda x: x.date, reverse=True)
            

            But if the order is perfectly reversed in the original file and you do not want to use a lot of memory, you could read the file chat by chat and insert each chat at the beginning of your result file.
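
            As a minimal sketch of the in-memory approach (the Beautiful Soup parsing is elided; the chats and dates here are hypothetical):

```python
from datetime import datetime

class Chat:
    def __init__(self, user, date, text):
        self.user = user
        self.date = date
        self.text = text

# Hypothetical already-parsed chats.
ut = [
    Chat("alice", datetime(2015, 5, 18, 9, 30), "hi"),
    Chat("bob",   datetime(2015, 5, 18, 9, 45), "hello"),
    Chat("alice", datetime(2015, 5, 17, 22, 5), "late night"),
]

# Newest first, exactly as in the snippet above.
ut.sort(key=lambda x: x.date, reverse=True)
print([c.user for c in ut])  # ['bob', 'alice', 'alice']
```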

            qid & accept id: (30329252, 30329783) query: Calculate a point along a line segment one unit from a end of the seg soup:

            soup wrap:

            This is kind of hitting a nail with a sledge hammer, but if you're going to be running into geometry problems often, I'd either write or find a Point/Vector class like

            import math
            class Vector():
                def __init__(self, x=0.0, y=0.0, z=0.0):
                    self.x = x
                    self.y = y
                    self.z = z
            
                def __add__(self, other):
                    self.x += other.x
                    self.y += other.y
                    self.z += other.z
                    return self
            
                def __sub__(self, other):
                    self.x -= other.x
                    self.y -= other.y
                    self.z -= other.z
                    return self
            
                def dot(self, other):
                    return self.x*other.x + self.y*other.y + self.z*other.z
            
                def cross(self, other):
                    tempX = self.y*other.z - self.z*other.y
                    tempY = self.z*other.x - self.x*other.z
                    tempZ = self.x*other.y - self.y*other.x
                    return Vector(tempX, tempY, tempZ)
            
                def dist(self, other):
                    return math.sqrt((self.x-other.x)**2 + (self.y-other.y)**2 + (self.z-other.z)**2)
            
                def unitVector(self):
                    mag = self.dist(Vector())
                    if mag != 0.0:
                        return Vector(self.x * 1.0/mag, self.y * 1.0/mag, self.z * 1.0/mag)
                    else:
                        return Vector()
            
                def __repr__(self):
                    return str([self.x, self.y, self.z])
            

            Then you can do all kinds of stuff like find the vector by subtracting two points

            >>> a = Vector(4,5,0)
            >>> b = Vector(5,6,0)
            >>> b - a
            [1, 1, 0]
            

            Or adding an arbitrary unit vector to a point to find a new point (which is the answer to your original question)

            >>> a = Vector(4,5,0)
            >>> direction = Vector(10, 1, 0).unitVector()
            >>> a + direction
            [4.995037190209989, 5.099503719020999, 0.0]
            

            You can add more utilities, like allowing Vector/Scalar operations for scaling, etc.
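
            For the original question itself (a point a given distance from one end of a segment), the same idea also works without the class; a small hypothetical helper with plain 2D tuples:

```python
import math

def point_along(p_from, p_to, distance=1.0):
    """Return the point `distance` units from p_from toward p_to."""
    dx = p_to[0] - p_from[0]
    dy = p_to[1] - p_from[1]
    seg_len = math.hypot(dx, dy)  # length of the segment
    return (p_from[0] + dx / seg_len * distance,
            p_from[1] + dy / seg_len * distance)

print(point_along((0.0, 0.0), (3.0, 4.0)))  # (0.6, 0.8)
```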

            qid & accept id: (30338961, 30363313) query: 3x1 Matrix Multiplication with lists[UPDATED] soup:

            soup wrap:

            Answer assumes Python 2.7.x

            Data:

            key=[[16, 4, 11], [8, 6, 18], [15, 19, 15]]
            message=[[0], [12], [8], [6], [15], [2], [15], [13], [3], [21], [2], [20], [15], [18], [8]]
            

            One thing that seems to complicate things is that message is a list of lists, to make things easier it will need to be flattened at some point.

            I'll use an itertools recipe to get chunks of the message because I'm going to use a generator to flatten the message. There are other ways to flatten a list found in the answers to: How do you split a list into evenly sized chunks in Python?

            import itertools
            def grouper(iterable, n, fillvalue=None):
                "Collect data into fixed-length chunks or blocks"
                # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
                args = [iter(iterable)] * n
                return itertools.izip_longest(fillvalue=fillvalue, *args)
            # flatten the message [[1],[2]] -> [1,2]
            message = (item for thing in message for item in thing)
            

            zip is another useful function, it is similar to a transposition:

            >>> zip([1,2],[3,4])
            [(1, 3), (2, 4)]
            >>> 
            

            Now we'll do a matrix multiplication with the key and each, 3 item, chunk of the message.

            n = 3        # chunk size: the key is a 3x3 matrix
            result = []
            for group in grouper(message, n):
                # matrix multiplication
                for row in key:
                    sum_prod = 0
                    for a, b in zip(group, row):
                        sum_prod += a*b
                    #print(group, row, sum_prod, sum_prod % 26)
                    result.append(sum_prod)
                    #result.append(sum_prod % 26)
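
            For reference, under Python 3 the same recipe needs only itertools.zip_longest; a self-contained sketch of the whole pipeline (without the mod-26 step):

```python
import itertools

key = [[16, 4, 11], [8, 6, 18], [15, 19, 15]]
message = [[0], [12], [8], [6], [15], [2], [15], [13], [3],
           [21], [2], [20], [15], [18], [8]]

# Flatten [[0], [12], ...] -> [0, 12, ...], then take chunks of 3.
flat = [item for sub in message for item in sub]

result = []
for group in itertools.zip_longest(*[iter(flat)] * 3, fillvalue=0):
    for row in key:
        result.append(sum(a * b for a, b in zip(group, row)))

print(result[:3])  # [136, 216, 348]
```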
            
            qid & accept id: (30365563, 30366232) query: Join list of dict with a dict in python soup:

            soup wrap:

            Your code has the side-effect that it updates the same instance of each dictionary object, resulting in both list1 and nlist referring to the same dictionary objects. After execution of your code:

            >>> list1
            [{'ip': '12.12.12.12', 'code': '123', 'mask': '255.255.255.255'}, {'ip': '12.12.12.11', 'code': '345', 'mask': '255.255.255.255'}]
            >>> nlist
            [{'ip': '12.12.12.12', 'code': '123', 'mask': '255.255.255.255'}, {'ip': '12.12.12.11', 'code': '345', 'mask': '255.255.255.255'}]
            >>> [list1[i] is nlist[i] for i in range(len(list1))]
            [True, True]
            >>> list1[0]['newkey'] = 'value'
            >>> list1[0]
            {'ip': '12.12.12.12', 'code': '123', 'mask': '255.255.255.255', 'newkey': 'value'}
            >>> nlist[0]
            {'ip': '12.12.12.12', 'code': '123', 'mask': '255.255.255.255', 'newkey': 'value'}
            

            The above shows that the dicts are the same, and so any update is reflected in both lists.

            If you simply wanted to update all the dictionaries in list1, do this:

            dict2 = {"mask": "255.255.255.255"}
            for d in list1:
                d.update(dict2)
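
            If you instead want new, independent dictionaries (avoiding the shared-instance behaviour described above), build fresh dicts rather than updating in place; a minimal sketch:

```python
list1 = [{'ip': '12.12.12.12', 'code': '123'},
         {'ip': '12.12.12.11', 'code': '345'}]
dict2 = {"mask": "255.255.255.255"}

# dict(d, **dict2) creates a brand-new dict, so list1's entries are untouched.
nlist = [dict(d, **dict2) for d in list1]

print(nlist[0] is list1[0])   # False
print('mask' in list1[0])     # False
print(nlist[0]['mask'])       # 255.255.255.255
```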
            
            qid & accept id: (30397315, 30397427) query: index by comparision of two numpy arrays in python soup:

            soup wrap:

            If I understand what you're after you can just create arrays from the lists and compare directly, you can then get the count by calling sum:

            In [161]:
            
            a=[1,'aaa', 'bbb', 'vvv', 'www']
            b=[2,'qqq', 'bbb', 'ppp', 'www']
            A = np.array(a)
            B = np.array(b)
            sum(A==B)
            Out[161]:
            2
            

            When performing an equality comparison this will produce a boolean array:

            In [166]:
            
            A==B
            Out[166]:
            array([False, False,  True, False,  True], dtype=bool)
            

            When you call sum on this, the True values are cast to 1 and the False values are cast to 0, allowing you to count the True values.

            EDIT

            It will be more performant to just call .sum() on the np.array:

            In [173]:
            
            a=[1,'aaa', 'bbb', 'vvv', 'www']
            a *=100
            b=[2,'qqq', 'bbb', 'ppp', 'www']
            b *=100
            A = np.array(a)
            B = np.array(b)
            %timeit (A==B).sum()
            %timeit sum(A==B)
            The slowest run took 2784.03 times longer than the fastest. This could mean that an intermediate result is being cached 
            100000 loops, best of 3: 11.4 µs per loop
            1000 loops, best of 3: 1.34 ms per loop
            

            The top-level sum is significantly slower, which is to be expected.

            qid & accept id: (30405409, 30405552) query: Convert multichar %xx escapes to unicode soup:

            soup wrap:

            The problem is that what %C3%B1 means depends on the encoding of the string.

            As Unicode, it means Ã±. As Latin-1, it also means Ã±. As UTF-8, it means ñ.

            So, you need to unescape those characters before decoding from UTF-8.

            In other words, somewhere, you're doing the equivalent of:

            u = urllib.unquote(s.decode('utf-8'))
            

            Don't do that. You should be doing:

            u = urllib.unquote(s).decode('utf-8')
            

            If some framework you're using has already decoded the string before you get to see it, re-encode it, unquote it, and re-decode it:

            u = urllib.unquote(u.encode('utf-8')).decode('utf-8')
            

            But it would be better to not have the framework hand you charset-decoded but still quote-encoded strings in the first place.
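
            For comparison, in Python 3 this pitfall mostly disappears: urllib.parse.unquote operates on str and decodes the percent-escaped bytes as UTF-8 by default, so the unquote and decode happen in the right order for you:

```python
from urllib.parse import unquote

s = 'espa%C3%B1ol'
print(unquote(s))                       # español
print(unquote(s, encoding='latin-1'))   # espaÃ±ol - if the bytes were Latin-1 instead
```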

            qid & accept id: (30406324, 30408259) query: Gurobi, How to change a continuous variable to a binary variable soup:

            soup wrap:

            In the Gurobi Python API, you can simply set the vtype attribute on the variable. It is easy if you save a reference to the variable. In your case, if you create a variable

            x = m.addVar(lb=0, ub=1, vtype=GRB.CONTINUOUS)
            

            You can set its attribute

            x.vtype = GRB.BINARY
            

            You can see it work in this longer example.

            import gurobipy  as grb
            GRB = grb.GRB
            m = grb.Model()
            
            x = m.addVar(0.0, 1.0, vtype=GRB.CONTINUOUS)
            y = m.addVar(0.0, 1.0, vtype=GRB.CONTINUOUS)
            m.update()
            # add constraints so that y >= |x - 0.75|
            m.addConstr(y >= x-0.75)
            m.addConstr(y >= 0.75 - x)
            m.setObjective(y)
            m.update()
            m.optimize()
            print x.X
            # 0.75
            x.vtype=GRB.BINARY
            m.optimize()
            print x.X
            # 1.0
            

            In the first solve, x was continuous, so the optimal value for x was 0.75. In the second solve, x was binary, so the optimal value for x was 1.0.

            qid & accept id: (30411388, 30414849) query: More efficient solution? Dictionary as sparse vector soup:

            soup wrap:

            Perhaps pandas is what you're looking for:

            import numpy
            import pandas

            d1 = pandas.DataFrame(numpy.array([1, 4]), index=['a', 'b'], dtype="int32")
            d2 = pandas.DataFrame(numpy.array([2, 2]), index=['a', 'c'], dtype="int32")
            
            d1.add(d2, fill_value=0)
            

            result:

               0
            a  3
            b  4
            c  2
            
            qid & accept id: (30414578, 30417506) query: Is there a mercurial command which can generate a clone without largefiles? soup:

            soup wrap:

            Simply use

            hg lfconvert --to-normal  
            

            This will convert the repository in directory to a repository in directory with all large files turned back into normal files. Revision hashes will change, but otherwise, the revision history should remain intact.

            If you actually want to first strip all large files from the repository and lose all information associated with them (i.e. if your intent is to destroy the large files rather than keep them), first run:

            hg convert --filemap   
            

            where is the path to a file containing the single line:

            exclude .hglf
            

            and is the original repository and the target directory for the conversion.

            This conversion will exclude the .hglf directory, which contains all the "stand-in" files for large files. Note that such a conversion will also destroy all commits that only changed largefiles along with their commit messages (since they become empty commits).

            You can also use hg convert with an appropriate --filemap after hg lfconvert --to-normal to selectively delete only some large files.

            qid & accept id: (30443894, 30563505) query: Persist and fetch data in with block soup:

            soup wrap:

            This behavior is only supported in Python 3.5+, via asynchronous context managers (__aenter__/__aexit__), and async with, both of which were added in PEP 492:

            class TestRepository:
               # All your normal methods go here
            
               async def __aenter__(self):
                  # You can call coroutines here
                  await self.some_init()
            
               async def __aexit__(self, exc_type, exc, tb):
                  # You can call coroutines here
                  await self.do_persistence()
                  await self.fetch_data()
            
            
            async def do_work():
                test_repo = TestRepository()
            
                async with test_repo:
                    res = await test_repo.get_by_lim_off(
                            page_size=int(length),
                            offset=start,
                            customer_name=customer_name,
                            customer_phone=customer_phone,
                            return_type=return_type
                        )
            
            asyncio.get_event_loop().run_until_complete(do_work())
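
As a self-contained sketch of the same pattern (with hypothetical no-op coroutines standing in for the repository methods, and `asyncio.run` from Python 3.7+ in place of the manual event-loop call):

```python
import asyncio

class TestRepository:
    def __init__(self):
        self.events = []  # records the call order for illustration

    async def some_init(self):
        self.events.append("init")

    async def do_persistence(self):
        self.events.append("persist")

    async def fetch_data(self):
        self.events.append("fetch")

    async def __aenter__(self):
        await self.some_init()
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await self.do_persistence()
        await self.fetch_data()

async def do_work():
    repo = TestRepository()
    async with repo:
        repo.events.append("work")
    return repo.events

# asyncio.run (Python 3.7+) creates and closes the event loop for you
events = asyncio.run(do_work())
```

The `__aexit__` cleanup runs even if the body of the `async with` raises, which is exactly what the try/finally version below emulates.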
            

            Prior to 3.5, you have to use a try/finally block with explicit calls to the init/cleanup coroutines, unfortunately:

            @asyncio.coroutine
            def do_work():
                test_repo = TestRepository()
            
                yield from test_repo.some_init()
                try:
                    res = yield from test_repo.get_by_lim_off(
                            page_size=int(length),
                            offset=start,
                            customer_name=customer_name,
                            customer_phone=customer_phone,
                            return_type=return_type
                        )
                finally:
                    yield from test_repo.do_persistence()
                    yield from test_repo.fetch_data()
            
            qid & accept id: (30447975, 30448211) query: Finding if a number is a perfect square soup:

            soup wrap:

            For very large numbers it's better to avoid using floating point square roots altogether because you will run into too many precision issues and you can't even guarantee that you will be within 1 integer value of the correct answer. Fortunately Python natively supports integers of arbitrary size, so you can write an integer square root checking function, like this:

            def isSquare(x):
                if x == 1:
                    return True
                low = 0
                high = x // 2
                root = high
                while root * root != x:
                   root = (low + high) // 2
                   if low + 1 >= high:
                      return False
                   if root * root > x:
                      high = root
                   else:
                      low = root
                return True
            

            Then you can run through the integers from 0 to 100 like this:

            import math
            
            n = 0
            while n <= 100:
                x = math.factorial(n) + 1
                if isSquare(x):
                    print n
                n = n + 1
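
On Python 3.8+, math.isqrt computes the exact integer square root directly, so the bisection above can be replaced by a one-liner check (an alternative I'm adding, not part of the original answer):

```python
import math

def is_square(x):
    """True iff x is a perfect square, using exact integer arithmetic."""
    if x < 0:
        return False
    r = math.isqrt(x)  # floor of the true square root, computed exactly
    return r * r == x
```

Because math.isqrt works on arbitrary-precision integers, it avoids the floating-point pitfalls the answer warns about.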
            
            qid & accept id: (30464454, 30464847) query: How to apply group by on data frame with neglecting NaN values in Pandas? soup:

            soup wrap:

            I found the solution myself; in case anyone else gets stuck, here it is.

            df = pd.DataFrame(columns=['xAxis', 'yAxis1', 'yAxis2'])
            df['xAxis'] = pd.to_datetime(weather['Date'])
            df['yAxis1'] = weather_stn1['Tavg']
            df['yAxis2'] = weather_stn2['Tavg']
            
            plot_df = df.groupby(df['xAxis']).mean()
            
            print plot_df.reset_index()
            

            Now my output is as:

                     xAxis  yAxis1  yAxis2
            0   2009-05-01      53      55
            1   2009-05-02      55      55
            2   2009-05-03      57      58
            3   2009-05-04      57      60
            4   2009-05-05      60      62
            5   2009-05-06      63      66
            

            It was that simple!
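
The reason this works is that groupby(...).mean() skips NaN within each group by default; a minimal illustration with made-up numbers (not the weather data above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "xAxis": ["2009-05-01", "2009-05-01", "2009-05-02"],
    "yAxis1": [53.0, np.nan, 57.0],
})

# mean() ignores the NaN inside the 2009-05-01 group
out = df.groupby("xAxis").mean()
```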

            qid & accept id: (30475558, 30475725) query: Padding or truncating a Python list soup:

            soup wrap:

            Slicing using an index greater than the length of a list just returns the entire list.

            Multiplying a list by a negative value returns an empty list.

            That means the function can be written as:

            def trp(l, n):
                return l[:n] + [0]*(n-len(l))
            
            trp([], 4)
            [0, 0, 0, 0]
            
            trp([1,2,3,4], 4)
            [1, 2, 3, 4]
            
            trp([1,2,3,4,5], 4)
            [1, 2, 3, 4]
            
            trp([1,2,3], 4)
            [1, 2, 3, 0]
            

            In [1]: a = [1,2,3]
            
            In [2]: a[:4]
            Out[2]: [1, 2, 3]
            
            In [3]: [0]*0
            Out[3]: []
            
            In [4]: [0]*-1
            Out[4]: []
            
            qid & accept id: (30506746, 30506807) query: Use regex backreferences to create array soup:

            soup wrap:

            You can use the re.findall() function within a list comprehension:

            import re
            [re.findall(r'^\t(.*)\t.*: (.*)$',i) for i in my_list]
            

            For example:

            >>> my_list=["\tLocation\tNext Available Appointment: Date\n","\tLocation2\tNext Available Appointment: Date2\n"]
            >>> [re.findall(r'^\t(.*)\t.*: (.*)$',i) for i in my_list]
            [[('Location', 'Date')], [('Location2', 'Date2')]]
            

            You can also use re.search() with the groups() method:

            >>> [re.search(r'^\t(.*)\t.*: (.*)$',i).groups() for i in my_list]
            [('Location', 'Date'), ('Location2', 'Date2')]
            

            Note that the advantage of re.search here is that you'll get a list of tuples instead of a list of lists of tuples (as with findall()).
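
One caveat with the re.search version (my addition): if a line does not match, re.search returns None and calling .groups() on it raises AttributeError, so a guard is needed when the input may contain non-matching lines:

```python
import re

pattern = re.compile(r'^\t(.*)\t.*: (.*)$')
my_list = ["\tLocation\tNext Available Appointment: Date\n", "no match here\n"]

# Keep only the lines that actually matched
results = [m.groups() for m in map(pattern.search, my_list) if m is not None]
```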

            qid & accept id: (30507442, 30508070) query: Pandas: add dataframes to dataframe - match on index and column value soup:

            soup wrap:

            If your DataFrames look like this:

            import datetime as DT
            import numpy as np
            import pandas as pd
            
            df1 = pd.DataFrame({'id':[1,2,1,2], 'value1':[13,14,15,16]}, index=pd.DatetimeIndex(['2015-5-1', '2015-5-1', '2015-5-2', '2015-5-2']))
            df2 = pd.DataFrame({'id':[1,1], 'value2':[4,5]}, index=pd.DatetimeIndex(['2015-5-1', '2015-5-2']))
            df3 = pd.DataFrame({'id':[2,2], 'value2':[7,8]}, index=pd.DatetimeIndex(['2015-5-1', '2015-5-2']))
            

            you could concatenate all the DataFrames:

            df = pd.concat([df1,df2,df3])
            #             id  value1  value2
            # 2015-05-01   1      13     NaN
            # 2015-05-01   2      14     NaN
            # 2015-05-02   1      15     NaN
            # 2015-05-02   2      16     NaN
            # 2015-05-01   1     NaN       4
            # 2015-05-02   1     NaN       5
            # 2015-05-01   2     NaN       7
            # 2015-05-02   2     NaN       8
            

            Since the result is being aligned on both the date and the id, it's natural to set id as an index. Then if we stack the DataFrame we get this Series:

            series = df.set_index(['id'], append=True).stack()
            #             id        
            # 2015-05-01  1   value1    13
            #             2   value1    14
            # 2015-05-02  1   value1    15
            #             2   value1    16
            # 2015-05-01  1   value2     4
            # 2015-05-02  1   value2     5
            # 2015-05-01  2   value2     7
            # 2015-05-02  2   value2     8
            # dtype: float64
            

            Now if we turn around and unstack the Series, the values are aligned based on the remaining index -- the date and the id:

            result = series.unstack()
            

            yields

                           value1  value2
                       id                
            2015-05-01 1       13       4
                       2       14       7
            2015-05-02 1       15       5
                       2       16       8
            

            Note that unstack() requires that the remaining index is unique. That means there must be no duplicate (date, id) entries. If there are duplicate entries, then it's not clear what the desired output should be. One way to address the issue would be to group by the date and id and aggregate the values. Another option would be to pick one value and drop the others.
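
For the duplicate-entry case, a sketch of the group-and-aggregate option (my addition; mean is an arbitrary choice of aggregate, and the data here is made up):

```python
import pandas as pd

# Two rows share the same (date, id) pair
df = pd.DataFrame(
    {"id": [1, 1, 2], "value2": [4.0, 6.0, 7.0]},
    index=pd.DatetimeIndex(["2015-05-01", "2015-05-01", "2015-05-01"]),
)

# Collapse duplicates so the (date, id) index becomes unique again
dedup = df.groupby([df.index, "id"]).mean()
```

After this, set_index/stack/unstack proceed as above without complaint.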

            qid & accept id: (30515888, 30516337) query: Align LaTeX math text in matplotlib text box soup:

            soup wrap:

            You can use the eqnarray environment as described here.

            import matplotlib.pyplot as plt
            import matplotlib.gridspec as gridspec
            import matplotlib.offsetbox as offsetbox
            from matplotlib import rc
            
            rc('text', usetex=True)
            
            # Figure top-level container. Weird size is because
            # this is part of a larger code.
            fig = plt.figure(figsize=(30, 25))
            gs = gridspec.GridSpec(10, 12)
            ax_t = plt.subplot(gs[4:6, 10:12])
            
            # Some mock values.
            cp_r = [0.001, 8.3, 0.18, 15.2, 5000, 0.3]
            cp_e = [0.0005, 0.2, 0.11, 0.3, 200, 0.1]
            
            # Remove axis from frame.
            ax_t.axis('off')
            
            # Text lines.
            text1 = r'\begin{eqnarray*} '
            text2 = r'y &=& ' + str(cp_r[0]) + '\pm ' + str(cp_e[0]) + '\\\\'
            text3 = r'\log(ret) &=& ' + str(cp_r[1]) + '\pm ' + str(cp_e[1]) + '\\\\'
            text4 = r'A_{{(B-C)}} &=& ' + str(cp_r[2]) + '\pm ' + str(cp_e[2]) + '\\\\'
            text5 = r'(n-N)_o &=& ' + str(cp_r[3]) + '\pm ' + str(cp_e[3]) + '\\\\'
            text6 = r'K_{{\odot}} &=& ' + str(cp_r[4]) + '\pm ' + str(cp_e[4]) + '\\\\'
            text7 = r'd_{{frac}} &=& ' + str(cp_r[5]) + '\pm ' + str(cp_e[5])
            text8 = r'\end{eqnarray*}'
            text = text1 + text2 + text3 + text4 + text5 + text6 + text7 + text8
            
            # Draw text box.
            ob = offsetbox.AnchoredText(text, pad=1, loc=6, prop=dict(size=13))
            ob.patch.set(alpha=0.85)
            ax_t.add_artist(ob)
            
            plt.savefig('out.png', dpi=300)
            

            Alternative solution using align environment:

            import matplotlib.pyplot as plt
            import matplotlib.gridspec as gridspec
            import matplotlib.offsetbox as offsetbox
            custom_preamble = {
                "text.usetex": True,
                "text.latex.preamble": [
                    r"\usepackage{amsmath}", # for the align environment
                    ],
                }
            plt.rcParams.update(custom_preamble)
            
            # Figure top-level container. Weird size is because
            # this is part of a larger code.
            fig = plt.figure(figsize=(30, 25))
            gs = gridspec.GridSpec(10, 12)
            ax_t = plt.subplot(gs[4:6, 10:12])
            
            # Some mock values.
            cp_r = [0.001, 8.3, 0.18, 15.2, 5000, 0.3]
            cp_e = [0.0005, 0.2, 0.11, 0.3, 200, 0.1]
            
            # Remove axis from frame.
            ax_t.axis('off')
            
            # Text lines.
            text1 = r'\begin{align*} '
            text2 = r'y &= ' + str(cp_r[0]) + '\pm ' + str(cp_e[0]) + '\\\\'
            text3 = r'\log(ret) &= ' + str(cp_r[1]) + '\pm ' + str(cp_e[1]) + '\\\\'
            text4 = r'A_{{(B-C)}} &= ' + str(cp_r[2]) + '\pm ' + str(cp_e[2]) + '\\\\'
            text5 = r'(n-N)_o &= ' + str(cp_r[3]) + '\pm ' + str(cp_e[3]) + '\\\\'
            text6 = r'K_{{\odot}} &= ' + str(cp_r[4]) + '\pm ' + str(cp_e[4]) + '\\\\'
            text7 = r'd_{{frac}} &= ' + str(cp_r[5]) + '\pm ' + str(cp_e[5])
            text8 = r'\end{align*}'
            text = text1 + text2 + text3 + text4 + text5 + text6 + text7 + text8
            
            # Draw text box.
            ob = offsetbox.AnchoredText(text, pad=1, loc=6, prop=dict(size=13))
            ob.patch.set(alpha=0.85)
            ax_t.add_artist(ob)
            
            plt.savefig('out.png', dpi=300)
            
            qid & accept id: (30528673, 30529743) query: Merge CSVs using Python (or Bash) soup:

            soup wrap:

            In Python, you can use the pandas module to load each CSV into a DataFrame, merge the DataFrames, and then save the merged DataFrame to a new CSV file.

            For example:

            import pandas as pd
            df1 = pd.DataFrame.from_csv("file1.csv", sep=",")
            df2 = pd.DataFrame.from_csv("file2.csv", sep=",")
            final_df = df1.reset_index().merge(df2.reset_index(), how="outer").set_index('ID')
            
            final_df.to_csv("result.csv", sep=",")
            

            which would produce

            ID,Name,ContactNo,Designation
            53,Vikas,9874563210.0, 
            23,MyShore,,Software Engineer 
            

            You would have to play with the sep argument to adapt it to your files' format.
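
Note that pd.DataFrame.from_csv has since been removed from pandas (deprecated in 0.21, removed in 1.0); pd.read_csv with index_col is the current equivalent. A sketch using in-memory CSV text in place of the question's files:

```python
import io
import pandas as pd

# Stand-ins for file1.csv / file2.csv from the question
file1 = io.StringIO("ID,Name,ContactNo\n53,Vikas,9874563210\n")
file2 = io.StringIO("ID,Name,Designation\n23,MyShore,Software Engineer\n")

df1 = pd.read_csv(file1, index_col="ID")
df2 = pd.read_csv(file2, index_col="ID")
final_df = df1.reset_index().merge(df2.reset_index(), how="outer").set_index("ID")
```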

            qid & accept id: (30559807, 30559909) query: Python - regex to match url with mongo object id soup:

            soup wrap:

            This will do (a Mongo ObjectId is 24 hex characters); the r prefix makes it a raw string, so the backslashes don't need to be doubled:

            r'\/api\/v1\/users\/([a-f\d]{24})\/submissions'
            

            Python console:

            >>> re.findall(r'\/api\/v1\/users\/([a-f\d]{24})\/submissions','/api/v1/users/556b352f87d4693546d31185/submissions')
            ['556b352f87d4693546d31185']
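
Incidentally, forward slashes have no special meaning in Python regular expressions, so they need no escaping at all; a variation (my addition) using a precompiled pattern and re.search with a guard for non-matching input:

```python
import re

OBJECT_ID_RE = re.compile(r'/api/v1/users/([a-f\d]{24})/submissions')

m = OBJECT_ID_RE.search('/api/v1/users/556b352f87d4693546d31185/submissions')
object_id = m.group(1) if m else None
```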
            
            qid & accept id: (30578068, 30599392) query: Pygame draw anti-aliased thick line soup:

            soup wrap:

            After much trial and error, this turned out to be the best way to do it:

            1) First, we define the center of the shape given the X0_{x,y} start and X1_{x,y} end points of the line.

            center_L1 = (X0 + X1) / 2.
            

            2) Then find the slope (angle) of the line.

            import math
            
            length = 10  # Line size
            thickness = 2
            angle = math.atan2(X0[1] - X1[1], X0[0] - X1[0])
            

            3) Using the slope and the shape parameters you can calculate the following coordinates of the box ends.

            UL = (center_L1[0] + (length / 2.) * math.cos(angle) - (thickness / 2.) * math.sin(angle),
                  center_L1[1] + (thickness / 2.) * math.cos(angle) + (length / 2.) * math.sin(angle))
            UR = (center_L1[0] - (length / 2.) * math.cos(angle) - (thickness / 2.) * math.sin(angle),
                  center_L1[1] + (thickness / 2.) * math.cos(angle) - (length / 2.) * math.sin(angle))
            BL = (center_L1[0] + (length / 2.) * math.cos(angle) + (thickness / 2.) * math.sin(angle),
                  center_L1[1] - (thickness / 2.) * math.cos(angle) + (length / 2.) * math.sin(angle))
            BR = (center_L1[0] - (length / 2.) * math.cos(angle) + (thickness / 2.) * math.sin(angle),
                  center_L1[1] - (thickness / 2.) * math.cos(angle) - (length / 2.) * math.sin(angle))
            

            4) Using the computed coordinates we draw an anti-aliased polygon (thanks to @martineau) and then fill it as suggested on the gfxdraw website.

            pygame.gfxdraw.aapolygon(window, (UL, UR, BR, BL), color_L1)
            pygame.gfxdraw.filled_polygon(window, (UL, UR, BR, BL), color_L1)
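
Steps 1-3 are pure geometry, so they can be collected into a helper that needs no pygame at all (a sketch; here the length is derived from the endpoints rather than hard-coded as above):

```python
import math

def thick_line_corners(X0, X1, thickness):
    """Corner points (UL, UR, BL, BR) of the rectangle representing
    the thick line from X0 to X1 -- pure geometry, no pygame needed."""
    cx, cy = (X0[0] + X1[0]) / 2.0, (X0[1] + X1[1]) / 2.0   # step 1: center
    length = math.hypot(X1[0] - X0[0], X1[1] - X0[1])
    angle = math.atan2(X0[1] - X1[1], X0[0] - X1[0])         # step 2: slope
    c, s = math.cos(angle), math.sin(angle)
    hl, ht = length / 2.0, thickness / 2.0
    # step 3: the four box corners
    UL = (cx + hl * c - ht * s, cy + ht * c + hl * s)
    UR = (cx - hl * c - ht * s, cy + ht * c - hl * s)
    BL = (cx + hl * c + ht * s, cy - ht * c + hl * s)
    BR = (cx - hl * c + ht * s, cy - ht * c - hl * s)
    return UL, UR, BL, BR
```

The returned corners can then be passed to aapolygon/filled_polygon exactly as in step 4.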
            
            qid & accept id: (30602088, 31006577) query: Calling C++ class functions from Ruby/Python soup:

            soup wrap:

            You can use Cython or Boost.Python to call native code from Python. Since you are using C++, I'd recommend looking into Boost.Python, which offers a very natural way of wrapping C++ classes for Python.

            As an example (close to what you provided), consider the following class definitions:

            #include <vector>
            
            class Bar
            {
            private:
                int value;
            
            public:
                Bar() : value(42){ }
            
                //Functions to expose to Python:
                int getValue() const { return value; }
                void setValue(int newValue) { value = newValue; }
            };
            
            class Foo
            {
            private:
                //Integer Vector:
                std::vector<int> fooVector;
                Bar bar;
            
            public:
                //Functions to expose to Python:
                void pushBack(const int& newInt) { fooVector.push_back(newInt); }
                int getInt(const int& element) { return fooVector.at(element); }
                Bar& getBar() { return bar; }
            };
            
            double compute() { return 18.3; }
            

            This can be wrapped for Python using Boost.Python:

            #include <boost/python.hpp>
            BOOST_PYTHON_MODULE(MyLibrary) {
                using namespace boost::python;
            
                class_<Foo>("Foo", init<>())
                    .def("pushBack", &Foo::pushBack, (arg("newInt")))
                    .def("getInt", &Foo::getInt, (arg("element")))
                    .def("getBar", &Foo::getBar, return_value_policy<reference_existing_object>())
                ;
            
                class_<Bar>("Bar", init<>())
                    .def("getValue", &Bar::getValue)
                    .def("setValue", &Bar::setValue, (arg("newValue")))
                ;
            
                def("compute", compute);
            }
            

            This code can be compiled into a shared library MyLibrary.pyd and used like this:

            import MyLibrary
            
            foo = MyLibrary.Foo()
            foo.pushBack(10);
            foo.pushBack(20);
            foo.pushBack(30);
            print(foo.getInt(0)) # 10
            print(foo.getInt(1)) # 20
            print(foo.getInt(2)) # 30
            
            bar = foo.getBar()
            print(bar.getValue()) # 42
            bar.setValue(17)
            print(foo.getBar().getValue()) #17
            
            print(MyLibrary.compute()) # 18.3
            
            qid & accept id: (30620595, 30620952) query: Python: obtain multidimensional matrix as results from a function soup:

            soup wrap:

            Provided your function bla can accept arrays instead of scalars, you could use meshgrid to prepare the inputs so that bla(A, B) returns the desired output:

            import numpy as np
            def bla(a, b):
                f = a + b
                return f
            
            A, B = np.meshgrid([0.2,0.4], [2,4], sparse=True)
            bla(A, B)
            

            yields

            array([[ 2.2,  2.4],
                   [ 4.2,  4.4]])
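The same grid evaluation also works with plain broadcasting, without meshgrid, by reshaping one input to a column (a sketch using the question's values):

```python
import numpy as np

def bla(a, b):
    return a + b

a_vals = np.array([0.2, 0.4])
b_vals = np.array([2, 4])

# a row vector plus a column vector broadcasts to the full 2-D grid
result = bla(a_vals[np.newaxis, :], b_vals[:, np.newaxis])
print(result)
```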
            
            qid & accept id: (30641097, 30641597) query: how to decrement and increment loop range 'i' variable in the execution of loop in python soup:

            soup wrap:

            So several things:

            • Use a while loop.
            • Initialize i = 0
            • Increment i += 1 when none of your conditions match
            • Comments in Python are written with # comment, not // comment.

            Example:

            final_result = 0
            a = '3 4  4 5 6'
            i = 0
            while i < len(a):
                print('iteration')
                print('i is = ')
                print(i)
                if a[i] == ' ' and a[i + 1] != ' ':
                    if i - 1 == 0:
                        final_result = int(a[i - 1]) + int(a[i + 1])
                        i += 2  # here goes the increment
                        print('1a- m here')
                        print(final_result)
                        print('i is = ')
                        print(i)
                    else:
                        final_result = final_result + int(a[i + 1])
                        i += 2  # here goes the increment
                        print('1b- m here')
                        print(final_result)
                elif a[i] == ' ' and a[i + 1] == ' ':
                    if i - 1 == 0:
                        final_result = int(a[i - 1]) - int(a[i + 1])
                        i += 3  # here goes the increment
                        print('2a- m here')
                        print(final_result)
                    else:
                        final_result = final_result - int(a[i + 2])
                        i += 3  # here goes the increment
                        print('2b- m here')
                        print(final_result)
                        print('i is = ')
                        print(i)
                else:
                    i += 1
            print(final_result)
            

            Output:

            $ python3.4 foo.py
            iteration
            i is = 
            0
            iteration
            i is = 
            1
            1a- m here
            7
            i is = 
            3
            iteration
            i is = 
            3
            2b- m here
            3
            i is = 
            6
            iteration
            i is = 
            6
            1b- m here
            8
            iteration
            i is = 
            8
            1b- m here
            14
            14
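For this particular input format (one space means add, two spaces mean subtract), a tokenizing sketch avoids the manual index bookkeeping entirely; the input string is the one from the question:

```python
import re

a = '3 4  4 5 6'  # input string from the question

# split but keep the separators, so each one tells us add vs. subtract
parts = re.split(r'(  | )', a)
result = int(parts[0])
for sep, num in zip(parts[1::2], parts[2::2]):
    result = result - int(num) if sep == '  ' else result + int(num)
print(result)  # 14, matching the final output above
```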
            
            qid & accept id: (30667382, 30668543) query: modify range in every loop of the range soup:

            soup wrap:

            Not sure if this is exactly what you want, but it's a start :)

            from itertools import combinations
            
            # Assume input is a list of strings called input_list
            input_list = ['OG_1: A|1 A|3 B|1 C|2','OG_2: A|4 B|6','OG_3: C|8 B|9 A|10']
            
            # Create a dict to store relationships and a set to store species
            rels = {}
            species = set()
            
            # Populate the dict
            for item in input_list:
                params = item.split(': ')
                og = params[0]
                raw_species = params[1].split()
                s = [rs.split('|')[0] for rs in raw_species]
                rels[og] = s
            
                for item in s:
                    species.add(item)
            
            # Get the possible combinations of species:
            combos = [c for limit in range(1, len(species) + 1) for c in combinations(species, limit)]
            
            def combo_in_og(combo, og):
                for item in combo:
                    if item not in rels[og]:
                        return False
                return True
            
            # Loop over the combinations and print
            for combo in combos:
                valid_ogs = []
                for og in rels:
                    if combo_in_og(combo, og):
                        valid_ogs.append(og)
                print('(species) ' + ','.join(combo) + ' (are in groups) ' + ', '.join(valid_ogs))
            

            Produces:

            (species) C (are in groups) OG_1, OG_3
            (species) A (are in groups) OG_1, OG_2, OG_3
            (species) B (are in groups) OG_1, OG_2, OG_3
            (species) C,A (are in groups) OG_1, OG_3
            (species) C,B (are in groups) OG_1, OG_3
            (species) A,B (are in groups) OG_1, OG_2, OG_3
            (species) C,A,B (are in groups) OG_1, OG_3
            

            Just a warning: what you're trying to do will start to take forever with large enough numbers of inputs, as its complexity is 2^N. You can't get around it (that's what the problem demands), but it's there.
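To see that 2^N growth concretely, the number of non-empty subsets of N species is 2**N - 1 (a small check with hypothetical species names):

```python
from itertools import combinations

species = {'A', 'B', 'C'}  # hypothetical species set
combos = [c for limit in range(1, len(species) + 1)
          for c in combinations(sorted(species), limit)]
print(len(combos))  # 7 == 2**3 - 1
```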

            qid & accept id: (30683301, 30683508) query: How many times is a particular row present? soup:

            soup wrap:

            This is pretty easy once you figure out what an identical row means. I simply use the hash of the stringified values. If you have an alternate definition, that would work as well.

            In [37]: df = DataFrame({'A' : [1,1,1,2,3,3], 'B' : [2,2,2,2,3,3]})
            
            In [38]: df
            Out[38]: 
               A  B
            0  1  2
            1  1  2
            2  1  2
            3  2  2
            4  3  3
            5  3  3
            

            Compute a hash for each row. Identical 'rows' yield identical hashes

            In [39]: hashed = df.apply(lambda x: hash(str(x.values)), axis=1)
            
            In [40]: hashed
            Out[40]: 
            0    4112993419872972622
            1    4112993419872972622
            2    4112993419872972622
            3    7113020419917972579
            4    6113011419891972603
            5    6113011419891972603
            dtype: int64
            

            Map the value counts back to the original indexes. You can pass take_last=False to .drop_duplicates() if you want the first unique row (rather than the last)

            In [41]: hashed.drop_duplicates().map(hashed.value_counts())
            Out[41]: 
            0    3
            3    1
            4    2
            dtype: int64
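If you'd rather avoid hashing, grouping on all columns gives the same multiplicities directly (a sketch; groupby is standard pandas, though the ordering of the result may differ from the hash-based version):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 3, 3], 'B': [2, 2, 2, 2, 3, 3]})

# each group key is one distinct row; the group size is its count
counts = df.groupby(list(df.columns)).size()
print(counts)
```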
            
            qid & accept id: (30683325, 30683375) query: How to execute and save result of an OS command to a file soup:

            soup wrap:

            Use subprocess.check_call redirecting stdout to a file object:

            from subprocess import check_call, STDOUT, CalledProcessError
            
            with open("out.txt","w") as f:
                try:
                    check_call(['ls', '-l'], stdout=f, stderr=STDOUT)
                except CalledProcessError as e:
                    print(e)
            

            Whatever you want to do when the command returns a non-zero exit status should be handled in the except. If you want one file for stdout and another for stderr, open two files:

            from subprocess import check_call, STDOUT, CalledProcessError, call
            
            with open("stdout.txt","w") as f, open("stderr.txt","w") as f2:
                try:
                    check_call(['ls', '-l'], stdout=f, stderr=f2)
                except CalledProcessError as e:
                    print(e)
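On Python 3.5+, subprocess.run offers the same redirection with a return code instead of an exception (a sketch; pass check=True if you prefer the raising behaviour):

```python
import subprocess

# 'ls -l' is the example command from the answer above
with open("out.txt", "w") as f:
    result = subprocess.run(["ls", "-l"], stdout=f, stderr=subprocess.STDOUT)

if result.returncode != 0:
    print("command failed with exit status", result.returncode)
```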
            
            qid & accept id: (30685363, 30686394) query: python csv to dictionary columnwise soup:

            soup wrap:

            You need to parse the first row, create the columns, and then progress to the rest of the rows.

            For example:

            columns = []
            with open(file,'rU') as f: 
                reader = csv.reader(f)
                for row in reader:
                    if columns:
                        for i, value in enumerate(row):
                            columns[i].append(value)
                    else:
                        # first row
                        columns = [[value] for value in row]
            # you now have a column-major 2D array of your file.
            as_dict = {c[0] : c[1:] for c in columns}
            print(as_dict)
            

            output:

            {
                ' numbers': [' 1', ' 2', ' 3', ' 4'], 
                ' colors ': [' blue', ' red', ' green', ' yellow'],
                'strings': ['string1', 'string2', 'string3', 'string4']
            }
            

            (some weird spaces, which were in your input "file". Remove spaces before/after commas, or use value.strip() if they're in your real input.)
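Alternatively, csv.DictReader keys each row by the header for you, and skipinitialspace handles the spaces after commas; this sketch uses a hypothetical in-memory file shaped like the question's input:

```python
import csv
import io

# hypothetical data shaped like the question's file
text = "strings, numbers, colors\nstring1, 1, blue\nstring2, 2, red\n"

as_dict = {}
reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
for row in reader:
    for key, value in row.items():
        as_dict.setdefault(key, []).append(value)
print(as_dict)
```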

            qid & accept id: (30701329, 30701370) query: How to count occurrences of specific element for arrays in a list? soup:

            soup wrap:

            One way is to use Counter

            In [3]: from collections import Counter
            

            Gives frequencies of all numbers

            In [4]: [Counter(x) for x in a]
            Out[4]: [Counter({2: 3, 1: 1}), Counter({1: 1, 3: 1})]
            

            To get count for only 2

            In [5]: [Counter(x)[2] for x in a]
            Out[5]: [3, 0]
            

            Alternatively, you could use the np.bincount method to count the number of occurrences of each value in an array of non-negative ints.

            In [6]: [np.bincount(x) for x in a]
            Out[6]: [array([0, 1, 3], dtype=int64), array([0, 1, 0, 1], dtype=int64)]
            

            Extract counts for number 2

            In [7]: [np.bincount(x)[2] for x in a]
            Out[7]: [3, 0]
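If the elements are NumPy arrays, a boolean comparison plus sum also counts one specific value, without building full frequency tables (the input a is assumed from the question's outputs):

```python
import numpy as np

a = [np.array([2, 1, 2, 2]), np.array([1, 3])]  # assumed input from the question
print([int((x == 2).sum()) for x in a])  # [3, 0]
```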
            
            qid & accept id: (30727894, 30728974) query: Using selenium at hosted app? soup:

            soup wrap:

            If you are tied to Firefox or any other browser "with a head", the common approach is to start a "virtual display" with the help of PyVirtualDisplay, which is a wrapper around Xvfb, Xephyr and Xvnc; see this answer for example working code.


            Another option would be to use a "headless" browser, such as PhantomJS. In this case, the change is usually very simple, replacing:

            firefox = webdriver.Firefox()
            

            with:

            driver = webdriver.PhantomJS()
            

            Assuming you have PhantomJS installed.

            Demo:

            >>> from selenium import webdriver
            >>> driver = webdriver.PhantomJS()
            >>> driver.get("http://www.hltv.org/match/2296366-gplay-gamers2-acer-predator-masters-powered-by-intel")
            >>> driver.title
            u'HLTV.org - Hot Match: GPlay vs Gamers2'
            

            The third option (my favorite) would be to use a remote selenium server, either your own or one provided by third-party services like BrowserStack or Sauce Labs. Example code:

            from selenium import webdriver
            from selenium.webdriver.common.keys import Keys
            from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
            
            desired_cap = {'os': 'Windows', 'os_version': 'xp', 'browser': 'IE', 'browser_version': '7.0' }
            
            driver = webdriver.Remote(
                command_executor='http://username:key@hub.browserstack.com:80/wd/hub',
                desired_capabilities=desired_cap)
            
            driver.get("http://www.google.com")
            if not "Google" in driver.title:
                raise Exception("Unable to load google page!")
            elem = driver.find_element_by_name("q")
            elem.send_keys("BrowerStack")
            elem.submit()
            print driver.title
            driver.quit()
            

            In case of BrowserStack or Sauce Labs you have an enormous amount of browsers and operating systems to choose from. Note that these are not free services and you would need a username and a key for this code to work.

            qid & accept id: (30740326, 30740575) query: I dont know how to add Proxy to my Phantomjs script soup:

            soup wrap:

            You can easily join the service_args to get a string:

            saStr = " ".join(service_args)
            

            and put that before the script:

            params = CASPER +' '+ saStr + ' ' + SCRIPT
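With hypothetical values for CASPER, SCRIPT and service_args, the joining looks like this:

```python
# hypothetical values; substitute your real binary, script and proxy flags
CASPER = "casperjs"
SCRIPT = "myscript.js"
service_args = ["--proxy=127.0.0.1:9999", "--proxy-type=http"]

saStr = " ".join(service_args)
params = CASPER + ' ' + saStr + ' ' + SCRIPT
print(params)  # casperjs --proxy=127.0.0.1:9999 --proxy-type=http myscript.js
```

If you end up launching the command with subprocess, passing the argument list directly (e.g. [CASPER] + service_args + [SCRIPT]) avoids shell-quoting issues.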
            
            qid & accept id: (30777540, 30787355) query: Finding a vector that is approximately equally distant from all vectors in a set soup:

            soup wrap:

            I agree that in general this is a pretty tough optimization problem, especially at the scale you're describing. Each objective function evaluation requires O(nm + n^2) work for n points of dimension m -- O(nm) to compute distances from each point to the new point and O(n^2) to compute the objective given the distances. This is pretty scary when m=300 and n=3M. Thus even one function evaluation is probably intractable, not to mention solving the full optimization problem.

            One approach that has been mentioned in the other answer is to take the centroid of the points, which can be computed efficiently -- O(nm). A downside of this approach is that it could do terribly at the proposed objective. For instance, consider a situation in 1-dimensional space with 3 million points with value 1 and 1 point with value 0. By inspection, the optimal solution is v=0.5 with objective value 0 (it's equidistant from every point), but the centroid will select v=1 (well, a tiny bit smaller than that) with objective value 3 million.

            An approach that I think will do better than the centroid is to optimize each dimension separately (ignoring the existence of the other dimensions). While the objective function is still expensive to compute in this case, a bit of algebra shows that the derivative of the objective is quite easy to compute. It is the sum over all pairs (i, j) where i < v and j > v of the value 4*((v-i)+(v-j)). Remember we're optimizing a single dimension so the points i and j are 1-dimensional, as is v. For each dimension we therefore can sort the data (O(n lg n)) and then compute the derivative for a value v in O(n) time using a binary search and basic algebra. We can then use scipy.optimize.newton to find the zero of the derivative, which will be the optimal value for that dimension. Iterating over all dimensions, we'll have an approximate solution to our problem.

            First consider the proposed approach versus the centroid method in a simple setting, with 1-dimensional data points {0, 3, 3}:

            import bisect
            import scipy.optimize
            
            def fulldist(x, data):
                dists = [sum([(x[i]-d[i])*(x[i]-d[i]) for i in range(len(x))])**0.5 for d in data]
                obj = 0.0
                for i in range(len(data)-1):
                    for j in range(i+1, len(data)):
                        obj += (dists[i]-dists[j]) * (dists[i]-dists[j])
                return obj
            
            def f1p(x, d):
                lownum = bisect.bisect_left(d, x)
                highnum = len(d) - lownum
                lowsum = highnum * (x*lownum - sum([d[i] for i in range(lownum)]))
                highsum = lownum * (x*highnum - sum([d[i] for i in range(lownum, len(d))]))
                return 4.0 * (lowsum + highsum)
            
            data = [(0.0,), (3.0,), (3.0,)]
            opt = []
            centroid = []
            for d in range(len(data[0])):
                thisdim = [x[d] for x in data]
                meanval = sum(thisdim) / len(thisdim)
                centroid.append(meanval)
                thisdim.sort()
                opt.append(scipy.optimize.newton(f1p, meanval, args=(thisdim,)))
            print "Proposed", opt, "objective", fulldist(opt, data)
            # Proposed [1.5] objective 0.0
            print "Centroid", centroid, "objective", fulldist(centroid, data)
            # Centroid [2.0] objective 2.0
            

            The proposed approach finds the exact optimal solution, while the centroid method misses by a bit.
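As a side note, the per-dimension derivative can also be vectorized with NumPy (a sketch equivalent to f1p above; np.searchsorted plays the role of bisect_left):

```python
import numpy as np

def f1p_np(v, d):
    # d: sorted 1-D NumPy array; returns the objective's derivative at v
    lownum = int(np.searchsorted(d, v))  # points strictly below v
    highnum = len(d) - lownum            # points at or above v
    lowsum = highnum * (v * lownum - d[:lownum].sum())
    highsum = lownum * (v * highnum - d[lownum:].sum())
    return 4.0 * (lowsum + highsum)

d = np.array([0.0, 3.0, 3.0])
print(f1p_np(1.5, d))  # 0.0 at the optimum found above
```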

            Consider a slightly larger example with 1000 points of dimension 300, with each point drawn from a gaussian mixture. Each point's value is normally distributed with mean 0 and variance 1 with probability 0.1 and normally distributed with mean 100 and variance 1 with probability 0.9:

            data = []
            for n in range(1000):
                d = []
                for m in range(300):
                    if random.random() <= 0.1:
                        d.append(random.normalvariate(0.0, 1.0))
                    else:
                        d.append(random.normalvariate(100.0, 1.0))
                data.append(d)
            

            The resulting objective values were 1.1e6 for the proposed approach and 1.6e9 for the centroid approach, meaning the proposed approach decreased the objective by more than 99.9%. Obviously the differences in the objective value are heavily affected by the distribution of the points.

            Finally, to test the scaling (removing the final objective value calculations, since they're in general intractable), I get the following scaling with m=300: 0.9 seconds for 1,000 points, 7.1 seconds for 10,000 points, and 122.3 seconds for 100,000 points. Therefore I expect this should take about 1-2 hours for your full dataset with 3 million points.

            qid & accept id: (30782867, 30783541) query: python create empty object of arbitrary type? soup:

            is there a way to create an empty object such that the += operator would behave simply like a regular assignment = regardless of the type on the r.h.s?

            Sure. Just write a class and define your __add__ method to return the RHS unmodified.

            class DummyItem:
                def __add__(self, other):
                    return other
            
            s = DummyItem()
            s += 23
            print s
            

            Result:

            23
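The snippet above is Python 2 (`print s`). A minimal Python 3 sketch of the same idea:

```python
class DummyItem:
    def __add__(self, other):
        # s += other falls back to s = s.__add__(other),
        # so the first += simply replaces the dummy with the RHS
        return other

s = DummyItem()
s += 23
print(s)  # 23
```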
            
            qid & accept id: (30784217, 31390024) query: Get the big-endian byte sequence of integer in Python soup:


            After searching for the best way to tackle this problem, using pyjwkest seems a better option than creating my own function.

            pip install pyjwkest
            

            Then we use the long_to_base64 function for this:

            >>> from jwkest import long_to_base64
            >>> long_to_base64(65537)
            'AQAB'
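If you'd rather avoid the dependency, the same big-endian byte sequence plus URL-safe base64 encoding can be sketched with the standard library alone (`long_to_base64_stdlib` is a name I made up for this illustration):

```python
import base64

def long_to_base64_stdlib(n):
    # big-endian byte sequence of the integer (at least one byte for n == 0)
    data = n.to_bytes((n.bit_length() + 7) // 8 or 1, 'big')
    # URL-safe base64 without padding, as JWK expects
    return base64.urlsafe_b64encode(data).rstrip(b'=').decode('ascii')

print(long_to_base64_stdlib(65537))  # AQAB
```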
            
            qid & accept id: (30793198, 30793218) query: Best way to Convert pairs of base 10 integers to ascii characters in python soup:


            Add a bit of maths.

            • / - Integer division (Python 2; in Python 3 use //)
            • % - Modulus operator

            Code

            >>> num = 5270
            >>> pairs = [chr(num/100),chr(num%100)]
            >>> pairs
            ['4', 'F']
            

            And to match desired output

            >>> ''.join(pairs)
            '4F'
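The session above is Python 2, where / on ints is integer division. In Python 3 use // instead (a minimal sketch):

```python
num = 5270
pairs = [chr(num // 100), chr(num % 100)]  # 52 -> '4', 70 -> 'F'
print(''.join(pairs))  # 4F
```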
            
            qid & accept id: (30795450, 30802086) query: how to search values in a file and replace soup:


            I've got another solution for you. Instead of reading the whole file into memory, it reads line by line, checks whether each line contains one of the elements of list_to_search, and modifies it if so:

            list_to_search =['TRC_BTM', 'TRC_HCI', 'TRC_L2CAP']
            myDict = {'TRC_BTM': '6', 'TRC_HCI': '6', 'TRC_L2CAP': '6'}
            
            filename ='file.conf'
            
            with open(filename, 'rb+') as f:
            
                while True:         
                    line = f.readline()
                    if not line: break          
                    for key in list_to_search:
                        if key in line:
                            f.seek(-len(line),1)
                            f.write(key + '=' + myDict[key] + '\n')
                            f.flush()
            

            EDIT: In response to your comment below:

            with open(filename, 'rb+') as f:
            
                while True:         
                    line = f.readline()
                    if not line: break        
                    if '=2' in line:
                            f.seek(-len(line),1)
                            f.write(line.split('=2')[0]+'=6')
                            f.flush()
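Note that the in-place seek/write above is only safe when the replacement line is exactly as long as the original; otherwise it corrupts the following text. A safer variation (my sketch, pure string processing under the same key/value assumptions) builds the new content first:

```python
list_to_search = ['TRC_BTM', 'TRC_HCI', 'TRC_L2CAP']
my_dict = {'TRC_BTM': '6', 'TRC_HCI': '6', 'TRC_L2CAP': '6'}

def rewrite(text):
    out = []
    for line in text.splitlines():
        for key in list_to_search:
            if key in line:
                line = key + '=' + my_dict[key]  # replace the whole line
                break
        out.append(line)
    return '\n'.join(out)

print(rewrite('TRC_BTM=2\nUNRELATED=1'))
# TRC_BTM=6
# UNRELATED=1
```

Write the returned string back to the file in one go (`open(filename, 'w')`) instead of patching bytes in place.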
            
            qid & accept id: (30801180, 30801245) query: Find k smallest pairs in two lists soup:


            You can use heapq.nsmallest with sum as its key function:

            >>> import heapq
            >>> heapq.nsmallest(3,c,key=sum)
            [(1, 2), (1, 4), (3, 2)]
            

            Or, as @jonrsharpe said in a comment, you can use sorted:

            sorted(c, key=sum)[:k]
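A self-contained run, with c assumed to be the list of candidate pairs (here built as a cross product of two lists, which is my guess at the OP's setup):

```python
import heapq
from itertools import product

a, b = [1, 3, 7], [2, 4]
c = list(product(a, b))  # all candidate pairs
k = 3
# nsmallest is stable, so pairs tied on sum keep their original order
print(heapq.nsmallest(k, c, key=sum))  # [(1, 2), (1, 4), (3, 2)]
```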
            
            qid & accept id: (30837683, 30837721) query: chunk of data into fixed lengths chunks and then add a space and again add them all as a string soup:


            You can simply do

            x="a85b080040010000"
            print re.sub(r"(.{2})",r"\1 ",x)
            

            or

            x="a85b080040010000"
            
            print " ".join([i for i in re.split(r"(.{2})",x) if i])
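Both snippets are Python 2, and note that the re.sub version leaves a trailing space. A slicing alternative that avoids both issues (my sketch):

```python
x = "a85b080040010000"
# take the string two characters at a time and join with spaces
print(' '.join(x[i:i + 2] for i in range(0, len(x), 2)))
# a8 5b 08 00 40 01 00 00
```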
            
            qid & accept id: (30904903, 30905128) query: writing csv output python soup:


            Something like this should work for your question:

            import csv
            import calendar
            from collections import defaultdict
            
            months = [calendar.month_name[i] for i in range(0, 13)]
            totals = defaultdict(int)
            
            with open("data.csv", "r") as inf, open("data-out.csv", "w") as ouf:
                reader = csv.DictReader(inf)
                writer = csv.DictWriter(ouf, ['Name'] + months[5:9])
                writer.writeheader()
                for row in reader:
                    m1 = months[int(row['Date1'].split('/')[0])]
                    p2 = int(row['Price2'])
                    totals[m1] += p2
            
                    m2 = months[int(row['Date2'].split('/')[0])]
                    p1 = int(row['Price1'])
                    totals[m2] += p1
            
                    writer.writerow({'Name': row['Name'], m1: p2, m2: p1})
            
                totals['Name'] = 'Total'
                writer.writerow(totals)
            

            with open("data-out.csv", "r") as f:
                print(f.read())
            
            Name,May,June,July,August
            ABC,7500,1000,,
            DEF,500,,3000,
            GHI,,3500,,5000
            Total,8000,4500,3000,5000
            

            If your Date#'s span the entire year you can change:

            writer = csv.DictWriter(ouf, ['Name'] + months[5:9])
            

            to

            writer = csv.DictWriter(ouf, ['Name'] + months[1:])
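For reference, calendar.month_name[0] is the empty string, so the slices above line up with calendar month numbers (assuming the default English locale):

```python
import calendar

months = [calendar.month_name[i] for i in range(0, 13)]
print(months[5:9])  # ['May', 'June', 'July', 'August']
print(months[1:3])  # ['January', 'February']
```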
            
            qid & accept id: (30909414, 30910339) query: Loops to minimize function of arrays in python soup:


            There is already a handy formula for least squares fitting.

            I came up with two different ways to solve your problem.


            For the first one, consider the matrix K:

            L = len(X)
            K = np.identity(L) - np.ones((L, L)) / L
            

            In your case, A and B are defined as:

            A = K.dot(np.array([Y, Z]).transpose())
            B = K.dot(np.array([X]).transpose())
            

            Apply the formula to find C that minimizes the error A * C - B:

            C = np.linalg.inv(np.transpose(A).dot(A))
            C = C.dot(np.transpose(A)).dot(B)
            

            Then the result is:

            a, b = C.reshape(2)
            

            Also, note that numpy already provides linalg.lstsq that does the exact same thing:

            a, b = np.linalg.lstsq(A, B)[0].reshape(2)
            

            A simpler way is to define A as:

            A = np.array([Y, Z, [1]*len(X)]).transpose()
            

            Then solve it against X to get the coefficients and the mean:

            a, b, mean = np.linalg.lstsq(A, X)[0]
            

            If you need a proof of this result, have a look at this post.


            Example:

            >>> import numpy as np
            >>> X = [5, 7, 9, 5]
            >>> Y = [2, 0, 4, 1]
            >>> Z = [7, 2, 4, 6]
            >>> A = np.array([Y, Z, [1] * len(X)]).transpose()
            >>> a, b, mean = np.linalg.lstsq(A, X)[0]
            >>> print(a, b, mean)
            0.860082304527 -0.736625514403 8.49382716049
            
            qid & accept id: (30925637, 30925821) query: Getting a pdf from scipy.stats in a generic way soup:


            To evaluate the pdf at abscissas, you would pass abscissas as the first argument to pdf. To specify the parameters, use the * operator to unpack the param tuple and pass those values to distr.pdf:

            pdf = distr.pdf(abscissas, *param)
            

            For example,

            import numpy as np
            import scipy.stats as stats
            
            distrNameList = ['beta', 'expon', 'gamma']
            sample = stats.norm(0, 1).rvs(1000)
            abscissas = np.linspace(0,1, 10)
            for distrName in distrNameList:
                distr = getattr(stats.distributions, distrName)
                param = distr.fit(sample)
                pdf = distr.pdf(abscissas, *param)
                print(pdf)
            
            qid & accept id: (30936020, 30936049) query: replace multiple occurrences of any special character by one in python soup:


            You can use \W to match any non-word character:

            line = re.sub(r'\W+', '.', line)
            

            If you want to collapse repeats of the same special character into a single occurrence of that character, use:

            line = re.sub(r'(\W)(?=\1)', '', line)
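A quick demonstration of the difference between the two substitutions (the sample string is my own):

```python
import re

line = "hello!!! world??  ok"
# collapse every run of non-word characters to a single dot
print(re.sub(r'\W+', '.', line))        # hello.world.ok
# drop each non-word character that is followed by the same character,
# keeping one copy of the original
print(re.sub(r'(\W)(?=\1)', '', line))  # hello! world? ok
```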
            
            qid & accept id: (30949202, 30956282) query: Spark DataFrame TimestampType - how to get Year, Month, Day values from field? soup:


            You can use a simple map as with any other RDD:

            elevDF = sqlContext.createDataFrame(sc.parallelize([
                    Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=1, value=638.55),
                    Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=2, value=638.55),
                    Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=3, value=638.55),
                    Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=4, value=638.55),
                    Row(date=datetime.datetime(1984, 1, 1, 0, 0), hour=5, value=638.55)]))
            
            (elevDF
             .map(lambda (date, hour, value): (date.year, date.month, date.day))
             .collect())
            

            and the result is:

            [(1984, 1, 1), (1984, 1, 1), (1984, 1, 1), (1984, 1, 1), (1984, 1, 1)]
            

            Btw: datetime.datetime stores the hour anyway, so keeping it separately seems to be a waste of memory.

            Since Spark 1.5 you can use a number of date processing functions

            import datetime
            from pyspark.sql.functions import year, month, dayofmonth
            
            elevDF = sc.parallelize([
                (datetime.datetime(1984, 1, 1, 0, 0), 1, 638.55),
                (datetime.datetime(1984, 1, 1, 0, 0), 2, 638.55),
                (datetime.datetime(1984, 1, 1, 0, 0), 3, 638.55),
                (datetime.datetime(1984, 1, 1, 0, 0), 4, 638.55),
                (datetime.datetime(1984, 1, 1, 0, 0), 5, 638.55)
            ]).toDF(["date", "hour", "value"])
            
            elevDF.select(year("date").alias('year'), month("date").alias('month'), dayofmonth("date").alias('day')).show()
            # +----+-----+---+
            # |year|month|day|
            # +----+-----+---+
            # |1984|    1|  1|
            # |1984|    1|  1|
            # |1984|    1|  1|
            # |1984|    1|  1|
            # |1984|    1|  1|
            # +----+-----+---+
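As an aside, the map lambda in the first snippet relies on tuple parameter unpacking (`lambda (date, hour, value): ...`), which was removed in Python 3. The per-row extraction itself is plain Python; a Spark-free sketch of the same computation:

```python
import datetime

rows = [(datetime.datetime(1984, 1, 1, 0, 0), h, 638.55) for h in range(1, 6)]
# unpack each row and pull the date fields out of the datetime
print([(d.year, d.month, d.day) for d, hour, value in rows])
# [(1984, 1, 1), (1984, 1, 1), (1984, 1, 1), (1984, 1, 1), (1984, 1, 1)]
```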
            
            qid & accept id: (30955735, 30955827) query: How to access a button's parent in Tkinter without writing class? soup:


            You can use but.master to access the parent of the but object.

            To get the container widget of a widget that's handling a callback, you can do:

            def callback(evt):
                handling_widget = evt.widget
                parent_of_handling_widget = handling_widget.master
                # or evt.widget.master
                parent_of_handling_widget.destroy()
            

            That said, I'm not exactly sure why you're trying to avoid using a custom class. It's a natural solution to your problem.

            import tkinter
            from tkinter import ttk
            
            class MyButton(ttk.Button):
            
                def __init__(self, *args, **kwargs):
                    super().__init__(*args, **kwargs)
                    self.configure(command=self.callback)
            
                def callback(self):
                    self.master.destroy()
            
            tk = tkinter.Tk()
            b = MyButton(tk, text="close window!")
            b.pack()  # or whatever geometry manager you're using
            
            # we're done!
            
            qid & accept id: (30960440, 30960561) query: pandas count true values in multi-index frame soup:


            Notice that if you unstack the id index level of df then you get:

            In [35]: df.unstack(['id'])
            Out[35]: 
                   val             
            id       1      2     3
            year                   
            2001  True  False  True
            2002  True   True  True
            

            And we can think of the values above as a boolean array, arr:

            arr = df.unstack(['id']).values
            # array([[ True, False,  True],
            #        [ True,  True,  True]], dtype=bool)
            

            Imagine taking all the rows of the array except the last one:

            In [44]: arr[:-1]
            Out[44]: array([[ True, False,  True]], dtype=bool)
            

            and comparing it to all the rows of the array except the first one:

            In [45]: arr[1:]
            Out[45]: array([[ True,  True,  True]], dtype=bool)
            

            We want to count in how many locations they are equal and also equal to True:

            In [41]: ((arr[:-1] == arr[1:]) & (arr[:-1] == True)).sum()
            Out[41]: 2
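The same count works directly in NumPy (note that for a boolean array, arr[:-1] == True is just arr[:-1]):

```python
import numpy as np

arr = np.array([[True, False, True],
                [True, True,  True]])
# positions that are True in one year (row) and still True in the next
print(int(((arr[:-1] == arr[1:]) & arr[:-1]).sum()))  # 2
```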
            
            qid & accept id: (30977603, 30997654) query: Parse logs containing python tracebacks using logstash soup:


            Well, I found a solution. The approach I followed is to treat a line starting with '[' as the beginning of a new log message and append all other lines to the end of the previous message. Then the grok filter can be applied and the traceback can be parsed. Note that I have to apply two grok filters:

            1. For when there is a traceback with GREEDYDATA to get the traceback.

            2. For when there is no traceback, GREEDYDATA parsing fails and I'll have to remove the _grokparsefailure tag and then again apply grok without GREEDYDATA. This is done with the help of if block.

            The final logstash filter looks something like this:

            filter {
            
                multiline {
                    pattern => "^[^\[]"
                    what => "previous"
                }
            
            
            
                grok {
                    match => [
                        "message", "\[pid\: %{NUMBER:process_id:int}\|app: 0\|req: %{NUMBER}/%{NUMBER}\] %{IPORHOST:clientip} \(\) \{%{NUMBER:vars:int} vars in %{NUMBER:bytes:int} bytes\} \[%{GREEDYDATA:timestamp}\] %{WORD:method} /%{GREEDYDATA:referrer} \=\> generated %{NUMBER:generated_bytes:int} bytes in %{NUMBER} msecs \(HTTP/%{NUMBER} %{NUMBER:status_code:int}\) %{NUMBER:headers:int} headers in %{NUMBER:header_bytes:int} bytes \(%{NUMBER:switches:int} switches on core %{NUMBER:core:int}\)%{GREEDYDATA:traceback}"
                    ]
                }
            
                if "_grokparsefailure" in [tags] {
                    grok {
                        match => [
                        "message", "\[pid\: %{NUMBER:process_id:int}\|app: 0\|req: %{NUMBER}/%{NUMBER}\] %{IPORHOST:clientip} \(\) \{%{NUMBER:vars:int} vars in %{NUMBER:bytes:int} bytes\} \[%{GREEDYDATA:timestamp}\] %{WORD:method} /%{GREEDYDATA:referrer} \=\> generated %{NUMBER:generated_bytes:int} bytes in %{NUMBER} msecs \(HTTP/%{NUMBER} %{NUMBER:status_code:int}\) %{NUMBER:headers:int} headers in %{NUMBER:header_bytes:int} bytes \(%{NUMBER:switches:int} switches on core %{NUMBER:core:int}\)"
                            ]
                        remove_tag => ["_grokparsefailure"]
                    }
                }
            
                else {
                    mutate {
                        convert => {"traceback" => "string"}
                    }
                }
            
                date {
                    match => ["timestamp", "dd/MM/yyyy:HH:mm:ss Z"]
                    locale => en
                }
                geoip {
                    source => "clientip"
                }
                useragent {
                    source => "agent"
                    target => "Useragent"
                }
            }
            

            Alternatively, if you don't want to use the if block to check a second grok pattern and remove the _grokparsefailure tag, you can make the first grok filter check for both message types by including multiple message patterns in its match array. It can be done like this:

                    grok {
                        match => [
                        "message", "\[pid\: %{NUMBER:process_id:int}\|app: 0\|req: %{NUMBER}/%{NUMBER}\] %{IPORHOST:clientip} \(\) \{%{NUMBER:vars:int} vars in %{NUMBER:bytes:int} bytes\} \[%{GREEDYDATA:timestamp}\] %{WORD:method} /%{GREEDYDATA:referrer} \=\> generated %{NUMBER:generated_bytes:int} bytes in %{NUMBER} msecs \(HTTP/%{NUMBER} %{NUMBER:status_code:int}\) %{NUMBER:headers:int} headers in %{NUMBER:header_bytes:int} bytes \(%{NUMBER:switches:int} switches on core %{NUMBER:core:int}\)",
                        "message", "\[pid\: %{NUMBER:process_id:int}\|app: 0\|req: %{NUMBER}/%{NUMBER}\] %{IPORHOST:clientip} \(\) \{%{NUMBER:vars:int} vars in %{NUMBER:bytes:int} bytes\} \[%{GREEDYDATA:timestamp}\] %{WORD:method} /%{GREEDYDATA:referrer} \=\> generated %{NUMBER:generated_bytes:int} bytes in %{NUMBER} msecs \(HTTP/%{NUMBER} %{NUMBER:status_code:int}\) %{NUMBER:headers:int} headers in %{NUMBER:header_bytes:int} bytes \(%{NUMBER:switches:int} switches on core %{NUMBER:core:int}\)%{GREEDYDATA:traceback}"
                            ]
                    }
            

            And there is a third approach as well (possibly the most elegant one). It looks something like this:

            grok {
                match => [
                    "message", "\[pid\: %{NUMBER:process_id:int}\|app: 0\|req: %{NUMBER}/%{NUMBER}\] %{IPORHOST:clientip} \(\) \{%{NUMBER:vars:int} vars in %{NUMBER:bytes:int} bytes\} \[%{GREEDYDATA:timestamp}\] %{WORD:method} /%{GREEDYDATA:referrer} \=\> generated %{NUMBER:generated_bytes:int} bytes in %{NUMBER} msecs \(HTTP/%{NUMBER} %{NUMBER:status_code:int}\) %{NUMBER:headers:int} headers in %{NUMBER:header_bytes:int} bytes \(%{NUMBER:switches:int} switches on core %{NUMBER:core:int}\)(%{GREEDYDATA:traceback})?"
                ]
            }
            

            Note that in this method, a field whose presence is optional has to be enclosed in ()? — here, (%{GREEDYDATA:traceback})?

            Thus, if the field is present, grok parses it; otherwise, it is skipped.
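            The same optional-group behaviour can be sketched in Python's re module (an illustration of the ()? idea, not Logstash itself; the pattern and sample strings below are made up):

```python
import re

# A trailing optional group, analogous to (%{GREEDYDATA:traceback})? in grok:
# if the extra text is present it is captured, otherwise the group is None.
pattern = re.compile(r"(?P<status>\d{3})(?P<traceback>.+)?")

with_tb = pattern.match("500 Traceback (most recent call last): ...")
without_tb = pattern.match("200")

print(with_tb.group("traceback"))     # the captured tail
print(without_tb.group("traceback"))  # None: the group was skipped
```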

            qid & accept id: (30992225, 31100761) query: Extract links for certain section only from blogspot using BeautifulSoup soup:

            soup wrap:

            If you don't necessarily need to use BeautifulSoup, I think it would be easier to do something like this:

            import feedparser
            
            url = feedparser.parse('http://ellywonderland.blogspot.com/feeds/posts/default?alt=rss')
            for x in url.entries:
                print str(x.link)
            

            Output:

            http://ellywonderland.blogspot.com/2011/03/my-vintage-pre-wedding.html
            http://ellywonderland.blogspot.com/2011/02/pre-wedding-vintage.html
            http://ellywonderland.blogspot.com/2010/12/tissue-paper-flower-crepe-paper.html
            http://ellywonderland.blogspot.com/2010/12/menguap-menurut-islam.html
            http://ellywonderland.blogspot.com/2010/12/weddings-idea.html
            http://ellywonderland.blogspot.com/2010/12/kawin.html
            http://ellywonderland.blogspot.com/2010/11/vitamin-c-collagen.html
            http://ellywonderland.blogspot.com/2010/11/port-dickson.html
            http://ellywonderland.blogspot.com/2010/11/ellys-world.html
            

            feedparser can parse the RSS feed of the blogspot page and can return the data you want, in this case the href for the post titles.

            qid & accept id: (31009455, 31009614) query: lxml etree find closest element before soup:

            soup wrap:

            Use the preceding axis:

            The preceding axis indicates all the nodes that precede the context node in the document except any ancestor, attribute and namespace nodes.

            for el in elems:
                try:
                    print el.xpath("preceding::c[@attr1]")[-1].get("attr1")
                except IndexError:
                    print "No preceding 'c' element."
            

            Demo:

            >>> from lxml import etree
            >>> 
            >>> data = """
            ... 
            ...     
            ...         
            ...     
            ... 
            ...     
            ...     
            ...         
            ...     
            ...     
            ...     
            ... 
            ... """
            >>> xmltree = etree.fromstring(data)
            >>> elems = xmltree.xpath('//d')
            >>> 
            >>> for el in elems:
            ...     try:
            ...         print el.xpath("preceding::c[@attr1]")[-1].get("attr1")
            ...     except IndexError:
            ...         print "No preceding 'c' element."
            ... 
            No preceding 'c' element.
            important
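            Since the sample XML in the transcript above lost its markup during extraction, here is a self-contained sketch of the same technique on a made-up document (element and attribute names are illustrative only):

```python
from lxml import etree

# Made-up document: each <d> looks backwards for the closest
# preceding <c> that carries attr1.
data = """<root>
  <d/>
  <c attr1="important"/>
  <b/>
  <d/>
</root>"""

xmltree = etree.fromstring(data)
results = []
for el in xmltree.xpath("//d"):
    matches = el.xpath("preceding::c[@attr1]")
    results.append(matches[-1].get("attr1") if matches else "No preceding 'c' element.")
print(results)  # ["No preceding 'c' element.", 'important']
```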
            
            qid & accept id: (31011179, 31011255) query: Converting JSON to HTML table in Python soup:

            soup wrap:

            Try the following:

            import json
            from json2html import json2html

            infoFromJson = json.loads(jsonfile)
            print json2html.convert(json = infoFromJson)
            

            The result from json2html.convert is a string.

            If you don't have the module installed:

            $ pip install json2html
            

            More examples can be found in the json2html project documentation.

            qid & accept id: (31011565, 31011623) query: Get a value from a dictionary without linking to the memory location soup:

            soup wrap:

            Dictionaries don't store copies of the objects' values, just references to the objects.

            You need to assign a copy of the list to another variable name; you can use [:] to create a shallow copy:

            >>> D = {"A":[1,2,3]}
            >>> C = D["A"][:]
            >>> C.append(4)
            >>> D["A"]
            [1, 2, 3]
            

            Or use the copy module:

            >>> import copy
            >>> C = copy.copy(D["A"])
            >>> C.append(4)
            >>> D["A"]
            [1, 2, 3]
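            Note that both [:] and copy.copy are shallow: if the list contains mutable elements such as nested lists, those inner objects are still shared. copy.deepcopy breaks that link as well — a small sketch:

```python
import copy

D = {"A": [[1, 2], [3]]}

shallow = D["A"][:]      # new outer list, but the inner lists are shared
shallow[0].append(99)    # mutates the shared inner list
print(D["A"])            # [[1, 2, 99], [3]] -- original affected

deep = copy.deepcopy(D["A"])  # copies the inner lists as well
deep[0].append(100)
print(D["A"])            # [[1, 2, 99], [3]] -- original untouched this time
```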
            
            qid & accept id: (31014848, 31015480) query: Reorder Columns by String Variable soup:

            soup wrap:

            Here is a simple, clean solution written in Python. Replace input.csv and output.csv with the names of your CSV files.

            import csv 
            
            labels = [
                "Reading Comprehension", "Sentence Skills", "Arithmetic",
                "College Level Math", "Elementary Algebra"
            ]
            
            with open('output.csv', 'wb') as outfile, \
                 open('input.csv', 'rb') as infile:
                writer = csv.writer(outfile)
                reader = csv.reader(infile) 
            
                for row in reader: 
                    head = row[:5]
                    tail = []
                    for label in labels:
                        tail.append(next((i for i in row[5:] if i.startswith(label)), ""))
                    writer.writerow(head + tail)
            

            Here is another shorter solution, which uses piping:

            #!/usr/bin/python    
            from sys import stdin, stdout
            
            labels = [
                "Reading Comprehension", "Sentence Skills", "Arithmetic",
                "College Level Math", "Elementary Algebra"
            ]
            
            for line in stdin: 
                values = line.strip().split(',')
                stdout.write(','.join(values[:5]))
                for label in labels:
                    stdout.write(',')
                    stdout.write(next((i for i in values[5:] if i.startswith(label)), ''))
                stdout.write('\n')
            stdout.flush()
            

            If you save this code in a file, for example called reorder, and make this file executable, you can reformat your CSV file like this:

            $ cat input.csv | ./reorder
            

            The reformatted csv content is then written to the standard output.
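            The core of both scripts is the next(...) expression: scan the trailing columns for the first cell that starts with the label, defaulting to an empty string when the label is absent. A small stand-alone sketch (the sample row and labels below are illustrative):

```python
labels = ["Reading Comprehension", "Arithmetic"]
row = ["id1", "x", "y", "z", "w", "Arithmetic=55", "Reading Comprehension=88"]

# For each label, pick the first trailing cell that starts with it,
# or "" when the label is missing -- this reorders the tail to match labels.
tail = [next((cell for cell in row[5:] if cell.startswith(label)), "")
        for label in labels]
print(tail)  # ['Reading Comprehension=88', 'Arithmetic=55']
```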

            qid & accept id: (31021283, 31021698) query: to get max number after concatenation in list soup:

            soup wrap:

            The simplest, but not very efficient, way is:

            import itertools
            
            nums = [4, 94, 9, 14, 1]
            max_num = 0
            max_nums = None
            for p in itertools.permutations(map(str, nums)):
                num = int(''.join(p))
                if num > max_num:
                    max_num = num
                    max_nums = p
            print map(int, max_nums)
            print max_num
            

            Output:

            [9, 94, 4, 14, 1]
            9944141
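            A more efficient alternative (not from the answer above, but a standard technique for this problem) sorts the numbers with a pairwise comparator — put a before b when a + b beats b + a as strings — instead of trying all permutations; a sketch using functools.cmp_to_key:

```python
from functools import cmp_to_key

def largest_concatenation(nums):
    # Order the strings so that placing a before b yields the bigger join:
    # cmp returns -1 (a first) exactly when a + b > b + a.
    strs = sorted(map(str, nums),
                  key=cmp_to_key(lambda a, b: (a + b < b + a) - (a + b > b + a)))
    return int("".join(strs))

print(largest_concatenation([4, 94, 9, 14, 1]))  # 9944141
```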
            
            qid & accept id: (31029467, 31029498) query: minimize memory consumption when dealing with python list assignment soup:

            soup wrap:

            You can make an iterator from b; then, in a list comprehension, whenever the current element of a is None, call next(b_iter) to grab the next item from b.

            b_iter = iter(b)
            a = [next(b_iter) if i is None else i for i in a]
            

            As an example

            >>> a = [None, 0, None, None, 0, None, None, None, 0, None, None, None, None, None, 0]
            >>> b = [7, 1, 4, 8, 2, 1, 1, 1, 1, 6, 1]
            >>> b_iter = iter(b)
            >>> [next(b_iter) if i is None else i for i in a]
            [7, 0, 1, 4, 0, 8, 2, 1, 0, 1, 1, 1, 6, 1, 0]
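            If b can run out before all the Nones in a are filled, next(b_iter) raises an exception; the two-argument form next(iterator, default) substitutes a fallback value instead — a small sketch:

```python
a = [None, 0, None, None]
b = [7]           # too short: only one replacement available
b_iter = iter(b)

# next(b_iter, default) yields the fallback once b is exhausted.
filled = [next(b_iter, -1) if i is None else i for i in a]
print(filled)  # [7, 0, -1, -1]
```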
            
            qid & accept id: (31076841, 31076941) query: How can I wait until an element gains or loses a class? soup:

            soup wrap:

            There is no built-in way to achieve it. You would need to write a custom Expected Condition:

            from selenium.common.exceptions import StaleElementReferenceException
            from selenium.webdriver.support import expected_conditions as EC
            
            class wait_for_class(object):
                def __init__(self, locator, class_name):
                    self.locator = locator
                    self.class_name = class_name
            
                def __call__(self, driver):
                    try:
                        element_class = EC._find_element(driver, self.locator).get_attribute('class')
                        return element_class and self.class_name in element_class
                    except StaleElementReferenceException:
                        return False
            

            Usage:

            from selenium.webdriver.common.by import By
            from selenium.webdriver.support.ui import WebDriverWait

            wait = WebDriverWait(driver, 10)
            wait.until(wait_for_class((By.ID, 'select-1'), "ui-state-error"))
            

            Expected conditions are basically callables, which means you could just write a function instead, but I like to follow the way they are implemented internally in python-selenium: as classes.

            qid & accept id: (31078921, 31079593) query: Find selected features by RandomizedLogisticRegression soup:

            soup wrap:

            You should use the get_support function:

            from sklearn.datasets import load_iris
            from sklearn.linear_model import RandomizedLogisticRegression
            
            iris = load_iris()
            X, y = iris.data, iris.target
            
            clf = RandomizedLogisticRegression()
            clf.fit(X,y)
            print clf.get_support()
            
            #prints [False  True  True  True]
            

            Alternatively, you can get the indices of the support features:

            print clf.get_support(indices=True)
            #prints [1 2 3]
            
            qid & accept id: (31105362, 31105674) query: Using DataFrame to get matrix of identifiers soup:

            soup wrap:

            You can use the get_dummies function to your advantage here:

            users = data.set_index('date')['user_id']
            visits = pd.get_dummies(users)
            

            This gives us a dataframe which uses "one-hot" encoding to denote whether a user visited on the date:

                        a1  a15  a3  a4  a5  a8
            date                               
            2011-01-02   0    0   0   0   0   1
            2011-01-05   1    0   0   0   0   0
            2011-01-05   1    0   0   0   0   0
            2011-01-12   0    0   0   1   0   0
            2011-01-12   0    0   1   0   0   0
            2011-01-12   1    0   0   0   0   0
            2011-01-12   0    1   0   0   0   0
            2011-01-19   0    1   0   0   0   0
            2011-01-19   1    0   0   0   0   0
            2011-01-19   0    0   0   0   1   0
            

            But the dates are repeated. We therefore group by the date index and aggregate, asking if the user visited on any of the entries for that date:

            visits.groupby(visits.index).any().astype(int)
            

            which gives:

                        a1  a15  a3  a4  a5  a8
            date                               
            2011-01-02   0    0   0   0   0   1
            2011-01-05   1    0   0   0   0   0
            2011-01-12   1    1   1   1   0   0
            2011-01-19   1    1   0   0   1   0
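            The two steps can be sketched end to end with a tiny made-up frame (column names match the answer; the values are illustrative):

```python
import pandas as pd

data = pd.DataFrame({
    "date": ["2011-01-02", "2011-01-05", "2011-01-05", "2011-01-12"],
    "user_id": ["a8", "a1", "a1", "a3"],
})

users = data.set_index("date")["user_id"]
visits = pd.get_dummies(users)                           # one row per visit
result = visits.groupby(visits.index).any().astype(int)  # one row per date
print(result)
```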
            
            qid & accept id: (31124914, 31125046) query: Python extract info from a local html file soup:

            soup wrap:

            Locate the text element containing PATTERN:, find its font parent, and get the next font sibling element:

            soup = BeautifulSoup(data)
            
            for elm in soup.find_all(text="PATTERN:"):
                print elm.find_parent("font").find_next_sibling("font").get_text(strip=True)
            

            Demo:

            >>> from bs4 import BeautifulSoup
            >>>
            >>> data = """..."""   # the question's sample HTML; its markup was lost in extraction
            >>> soup = BeautifulSoup(data)
            >>> for elm in soup.find_all(text="PATTERN:"):
            ...     print elm.find_parent("font").find_next_sibling("font").get_text(strip=True)
            ... 
            {1,} {2,3,} {4,5,}
            {9,10,} {11,} {12,13,}

            Note that, since I have lxml installed, BeautifulSoup uses it as the underlying parser. I've tried with html.parser as well and it worked for me; html5lib does not work like the previous two. In any case, specify the parser explicitly:

            soup = BeautifulSoup(data, "lxml")
            

            or:

            soup = BeautifulSoup(data, "html.parser")
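            Since the HTML in the demo transcript above lost its tags during extraction, here is a self-contained sketch of the same traversal on a made-up fragment:

```python
from bs4 import BeautifulSoup

# Minimal stand-in for the report: the label and the value live in
# sibling <font> elements.
data = """<div>
  <font><b>PATTERN:</b></font>
  <font>{1,} {2,3,} {4,5,}</font>
</div>"""

soup = BeautifulSoup(data, "html.parser")
patterns = [elm.find_parent("font").find_next_sibling("font").get_text(strip=True)
            for elm in soup.find_all(text="PATTERN:")]
print(patterns)  # ['{1,} {2,3,} {4,5,}']
```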
            
            qid & accept id: (31137183, 31460375) query: GMail API - Get last message of a thread soup:

            soup wrap:

            The Gmail API now supports the field internalDate.

            internalDate - The internal message creation timestamp (epoch ms), which determines ordering in the inbox.

            Getting the latest message in a thread is no harder than a users.threads.get request, asking for the id and internalDate of the individual messages and figuring out which was created last.

            fields = messages(id,internalDate)
            
            GET https://www.googleapis.com/gmail/v1/users/me/threads/14e92e929dcc2df2?fields=messages(id%2CinternalDate)&access_token={YOUR_API_KEY}
            

            Response:

            {
             "messages": [
              {
               "id": "14e92e929dcc2df2",
               "internalDate": "1436983830000" 
              },
              {
               "id": "14e92e94a2645355",
               "internalDate": "1436983839000"
              },
              {
               "id": "14e92e95cfa0651d",
               "internalDate": "1436983844000"
              },
              {
               "id": "14e92e9934505214",
               "internalDate": "1436983857000" // <-- This is it!
              }
             ]
            }
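            Once the response is decoded, "figuring out which was created last" is a one-liner over the message list — a sketch using the response shown above (with the illustrative // comment removed, since it is not valid JSON):

```python
import json

# The thread response from above, minus the comment.
response = json.loads("""{
 "messages": [
  {"id": "14e92e929dcc2df2", "internalDate": "1436983830000"},
  {"id": "14e92e94a2645355", "internalDate": "1436983839000"},
  {"id": "14e92e95cfa0651d", "internalDate": "1436983844000"},
  {"id": "14e92e9934505214", "internalDate": "1436983857000"}
 ]
}""")

# internalDate is a string of epoch milliseconds, so compare as int.
latest = max(response["messages"], key=lambda m: int(m["internalDate"]))
print(latest["id"])  # 14e92e9934505214
```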
            
            qid & accept id: (31137766, 31140614) query: Add fields and correct indentation for json file (using python or ruby) soup:

            soup wrap:

            I think you can write up a template file specifying what the records should look like, with field names and empty strings as their values.

            Considering the structure you want the data to be in, the format file will look like:

            {
              "id":"",
              "name":"", 
              "phone":"",
              "email":"", 
              "website":"", 
              "location": {
                "latitude":"", 
                "longitude":"", 
                "address": {
                  "line1":"", 
                  "line2":"", 
                  "line3":"", 
                  "postcode":"",
                  "city":"", 
                  "country":""
                 }
              }
            } 
            

            And then utilize it in the code like this:

            require 'json'
            
            format = JSON.parse File.read('format.json')
            records = JSON.parse File.read('input.json')
            
            def convert(record, format)
              ret = {}
              format.each do |key, value|
                ret[key] = record[key] ? record[key] : convert(record, format[key])
              end
              ret
            end
            
            records.map! {|record| convert(record, format) }
            
            File.open('output.json', 'w') do |file|
              file << JSON.generate(records)
            end
            

            It will convert the data to the format given in the format file. This solution works for any format, as long as it is just a matter of grouping the original fields under a new field or fields. You can simply change the format in the format file and the data will be converted to the new format without any change to the code.
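            The same format-driven idea ports directly to Python. A minimal sketch (the `convert` name and the sample data are assumptions, mirroring the Ruby version above):

```python
def convert(record, fmt):
    """Rebuild a flat record into the nested shape described by fmt."""
    out = {}
    for key, value in fmt.items():
        # take the field directly if the record has it,
        # otherwise recurse into the nested format section
        out[key] = record[key] if key in record else convert(record, value)
    return out

fmt = {"id": "", "website": "", "location": {"address": {"line1": ""}}}
record = {"id": 1, "line1": "line1", "website": "site"}
nested = convert(record, fmt)
# nested == {"id": 1, "website": "site",
#            "location": {"address": {"line1": "line1"}}}
```

            As in the Ruby version, a leaf key missing from the record is an error; the sketch assumes every leaf in the format exists in each record.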

            UPDATE

            Here is the code to convert the data back to the regular CSV list:

            data = {
              :id => 1,
              :location => {
                :address => {
                  :line1 => 'line1'
                }
              },
              :website => 'site'
            }
            
            def deconvert(record)
              ret = {}
              record.each do |key, value|
                if value.is_a? Hash
                  ret.merge!( deconvert(value) )
                else
                   ret.merge!(key => value)
                end
              end
              ret
            end
            
            puts deconvert data
            # => {:id=>1, :line1=>"line1", :website=>"site"} 
            
            qid & accept id: (31146021, 31322383) query: Save app data in Weather App soup:

            soup wrap:

            I figured it out by myself. This answer is for future reference. What I did was save all the data I got from the API (which was already in JSON format) into a JSON file.

            To write to the file weather.json:

            import json
            from urllib import urlopen
            
            url = urlopen('http://api.openweathermap.org/data/2.5/forecast/daily?q={}&mode=json&units={}'.format(getname,temp_type)).read()
            #where getname is the name of city.
            #and temp_type is either C(Celsius) or F(Fahrenheit)
            result = json.loads(url)
            out_file = open("weather.json","w")
            json.dump(result, out_file, indent=4)
            #indent = 4, just to make it easy to read.
            out_file.close()
            

            And to read from the file weather.json:

            in_file = open("weather.json", "r")
            result = json.load(in_file)
            in_file.close()
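            In Python 3 the same save/load cycle is usually written with context managers, which close the files automatically (a sketch with a stand-in result dict; note that in Python 3 urlopen lives in urllib.request):

```python
import json

# stand-in for the parsed API response
result = {"city": "London", "temp": 21}

# write: the with-block closes the file even on error
with open("weather.json", "w") as out_file:
    json.dump(result, out_file, indent=4)  # indent=4 keeps the file readable

# read it back
with open("weather.json", "r") as in_file:
    loaded = json.load(in_file)

# loaded == result
```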
            

            For the icons I used the requests module and saved each icon with a unique name. Every time the user did a new search or refreshed the application, the file was automatically updated and new icons were downloaded to replace the existing ones.

            import requests
            conditions_image1 = "http://openweathermap.org/img/w/{}.png".format(result['list'][1]['weather'][0]['icon'])
            #or whatever be the name of your image
            response1 = requests.get(conditions_image1)
            if response1.status_code == 200:
                f = open("./icons/wc1.png", 'wb')
                f.write(response1.content)
                f.close()
            

            Also, as I am using Kivy, I would like to mention that you need to add json to the buildozer.spec file (as you might have tried it on your PC first):

            source.include_exts = py,png,jpg,kv,atlas,json 
            
            qid & accept id: (31149123, 31149188) query: Get the indicies of a dataframe to use on a list soup:

            soup wrap:

            You could express this as a groupby/agg operation:

            import pandas as pd
            a = [['Lazy', 'Brown', 'Fox'], ['Jumps', 'Over'], ['Big', 'Blue', 'Sea']]
            df = pd.DataFrame({'Name':list('ABC'), 'Group':[1,1,2]})
            df['a'] = a
            print(df.groupby(['Group'])['a'].sum())
            

            yields

            Group
            1    [Lazy, Brown, Fox, Jumps, Over]
            2                   [Big, Blue, Sea]
            Name: a, dtype: object
            

            Aggregation by summing works because the sum of two lists is a concatenated list:

            In [322]: ['Lazy', 'Brown', 'Fox'] + ['Jumps', 'Over']
            Out[322]: ['Lazy', 'Brown', 'Fox', 'Jumps', 'Over']
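            One caveat worth noting: summing many lists with + is quadratic, because each addition copies the accumulated list. For large groups, an agg with itertools.chain (a sketch, not from the original answer) concatenates in linear time:

```python
from itertools import chain

import pandas as pd

a = [['Lazy', 'Brown', 'Fox'], ['Jumps', 'Over'], ['Big', 'Blue', 'Sea']]
df = pd.DataFrame({'Name': list('ABC'), 'Group': [1, 1, 2]})
df['a'] = a

# concatenate each group's lists without repeated copying
out = df.groupby('Group')['a'].agg(lambda s: list(chain.from_iterable(s)))
# out.loc[1] == ['Lazy', 'Brown', 'Fox', 'Jumps', 'Over']
# out.loc[2] == ['Big', 'Blue', 'Sea']
```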
            
            qid & accept id: (31162560, 31163085) query: Flask route rule as function args soup:

            soup wrap:

            I'm not sure if you can accept multiple args the way you'd like to.

            One way to do this is to define multiple routes.

            @app.route('/test/<command>/')
            @app.route('/test/<command>/<arg1>/')
            @app.route('/test/<command>/<arg1>/<arg2>/')
            def test(command=None, arg1=None, arg2=None):
                a = [arg1, arg2]
                # Remove any args that are None
                args = [arg for arg in a if arg is not None]
                if command == "say":
                    return ' '.join(args)
                else:
                    return "Unknown Command"
            

            http://127.0.0.1/test/say/hello/ should return hello

            http://127.0.0.1/test/say/hello/there should return hello there

            Another way to do this is to use path:

            @app.route('/test/<command>/<path:path>')
            def test(command, path):
                args = path.split('/')
                return " ".join(args)
            

            If you use this and go to http://127.0.0.1/test/say/hello/there, then path will be set to the value hello/there. This is why we split it.

            qid & accept id: (31193012, 31193146) query: Django Scheduled Deletion soup:

            soup wrap:

            Just create a celery task to delete the model. Use a post-save signal handler to trigger the celery deletion task (with a delay of 24 hours) for the model (when created is True).


            from celery import shared_task
            
            @shared_task
            def delete_model(model_pk):
                try:
                    MyModel.objects.get(pk=model_pk).delete()
                except MyModel.DoesNotExist:
                    pass
            

            from django.dispatch import receiver
            from django.db.models.signals import post_save
            from datetime import datetime, timedelta
            
            @receiver(post_save, sender=MyModel)
            def model_expiration(sender, instance, created, **kwargs):
                if created:
                    delete_model.apply_async(
                        args=(instance.pk,),
                        eta=datetime.utcnow() + timedelta(hours=24)
                    )
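            The eta above is simply now plus 24 hours; celery's countdown argument expresses the same delay in seconds. A small sketch factoring the delay out (deletion_eta is a hypothetical helper, not part of the original answer):

```python
from datetime import datetime, timedelta

DELAY = timedelta(hours=24)

def deletion_eta(now=None):
    # the moment at which the delete_model task should fire
    return (now or datetime.utcnow()) + DELAY

# the two equivalent celery spellings (not executed here):
# delete_model.apply_async(args=(instance.pk,), eta=deletion_eta())
# delete_model.apply_async(args=(instance.pk,), countdown=DELAY.total_seconds())
```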
            
            qid & accept id: (31193239, 31193306) query: Printing row and columns in reverse soup:

            soup wrap:

            This is easily achieved with a zip.

            for row in zip(*contents):
                print(row)
            

            This prints:

            (0, 0, 4, 0, 0, 4, 4, 0)
            (3, 3, 2, 1, 4, 6, 3, 0)
            (1, 4, 3, 2, 6, 9, 5, 0)
            (1, 5, 2, 4, 9, 11, 6, 0)
            (6, 11, 3, 0, 11, 14, 3, 0)
            (3, 14, 4, 0, 14, 18, 4, 0)
            (7, 21, 2, 0, 21, 23, 2, 3)
            (5, 26, 4, 0, 26, 30, 4, 3)
            (2, 28, 5, 2, 30, 35, 7, 0)
            (4, 32, 3, 3, 35, 38, 6, 0)
            (1, 33, 4, 5, 38, 42, 9, 0)
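            One thing to keep in mind: zip stops at the shortest row, so ragged data silently loses values, while itertools.zip_longest pads instead. A small sketch with made-up ragged rows:

```python
from itertools import zip_longest

contents = [[0, 3, 1], [0, 3, 4], [4, 2]]  # last row is short (assumption)

truncated = list(zip(*contents))                    # stops at the shortest row
padded = list(zip_longest(*contents, fillvalue=0))  # pads the short row with 0

# truncated == [(0, 0, 4), (3, 3, 2)]
# padded    == [(0, 0, 4), (3, 3, 2), (1, 4, 0)]
```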
            
            qid & accept id: (31202918, 31203151) query: Django/jQuery: handling template inheritence and JS files loading soup:

            soup wrap:

            Since your script uses jQuery, you can simply use the $(document).ready() and $(window).load() functions of jQuery to bind a function on the event that DOM is ready and all window contents have been loaded, respectively.

            If you do not use jQuery, take a look at related questions to understand how to imitate the above behaviour with pure JS.

            EDIT 1: The inclusion order matters. You have to include the jQuery scripts before any scripts that require jQuery are executed.

            EDIT 2: You can organize your templates better by keeping the scripts separately from the main content, either with a second template:

            base.html

            <html>
            <head>
            ...
            </head>
            <body>
                {% include "content.html" %}
                {% include "js.html" %}
            </body>
            </html>
            

            js.html

            <script src="..."></script>
            <script src="..."></script>
            <script src="..."></script>
            

            (in this case you render base.html)

            Or with blocks (recommended):

            base.html

            <html>
            <head>
            ...
            </head>
            <body>
                {% block content %}{% endblock %}
                {% block scripts %}{% endblock %}
            </body>
            </html>

            content.html

            {% extends 'base.html' %}
            {% block content %}
                ...
            {% endblock %}
            {% block scripts %}
                <script src="..."></script>
                <script src="..."></script>
                <script src="..."></script>
            {% endblock %}
            

            (in this case you render content.html)

            qid & accept id: (31230972, 31233151) query: Selenium Steam community market listings python soup:

            soup wrap:

            From what I understand, you are working with this page.

            To get the list of prices, iterate over the results contained in the div elements with the market_listing_row class and get the text of the elements with the market_listing_their_price class:

            for result in driver.find_elements_by_css_selector("div.market_listing_row"):
                price = result.find_element_by_css_selector("div.market_listing_their_price")
                print price.text.strip()
            

            This would print price results like these:

            Starting at: $0.63
            Starting at: $0.27
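            Once the text is scraped, the numeric part can be pulled out with a small helper (parse_price is a hypothetical addition, not part of the original answer):

```python
import re

def parse_price(text):
    """Extract the dollar amount from a listing string, e.g. 'Starting at: $0.63'."""
    m = re.search(r'\$([\d.]+)', text)
    return float(m.group(1)) if m else None

# parse_price("Starting at: $0.63") == 0.63
# parse_price("no price here") is None
```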
            
            qid & accept id: (31247678, 31250371) query: Text file to csv with glob. Need to change delimiter depending on section of file being read soup:

            soup wrap:

            I cannot speak to the time efficiency of this method, but it might just get what you want done. The basic idea is to create a list to contain the lines of each text file, and then output the list to your new csv file. You save a 'delimiter' variable and then change it by checking each line as you go through the text files.

            For example: I created two text files on my Desktop. They read as follows:

            delimiter_test_1.txt

            test=delimiter=here

            does-it-work

            I'm:Not:Sure

            delimiter_test_2.txt

            This:File:Uses:Colons

            Pretty:Much:The:Whole:Time

            does-it-work

            If-Written-Correctly-yes

            I then ran this script on them:

            import csv
            import glob
            import os
            
            directory = raw_input("INPUT Folder for Log Dump Files:")
            output = raw_input("OUTPUT Folder for .csv files:")
            
            txt_files = os.path.join(directory, '*.txt')
            
            delimiter = ':'
            for txt_file in glob.glob(txt_files):
                SavingList = []
            
                with open(txt_file, 'r') as text:
                        for line in text:
                            if line == 'test=delimiter=here\n':
                                delimiter = '='
                            elif line == 'does-it-work\n':
                                delimiter = '-'
                            elif line == "I'm:Not:Sure\n":
                                delimiter = ':'
            
                            SavingList.append(line.split(delimiter))
            
                with open('%s.csv' % os.path.join(output, os.path.basename(txt_file).split('.')[0]), 'wb') as output_file:
                        writer = csv.writer(output_file)
                        for m in xrange(len(SavingList)):
                            writer.writerow(SavingList[m])
            

            And got two csv files with the text split based on the desired delimiter. Depending on how many different lines you have for changing the delimiter you could set up a dictionary of said lines. Then your check becomes:

            if line in my_dictionary:
                delimiter = my_dictionary[line]
            

            for example.
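            A minimal Python 3 sketch of that dictionary-driven approach (the marker lines come from the test files above; split_line is a hypothetical helper):

```python
DELIMITERS = {
    'test=delimiter=here\n': '=',
    'does-it-work\n': '-',
    "I'm:Not:Sure\n": ':',
}

def split_line(line, current):
    """Switch delimiter when a marker line appears, then split the line."""
    current = DELIMITERS.get(line, current)
    return current, line.rstrip('\n').split(current)

d, fields = split_line('does-it-work\n', ':')
# d == '-', fields == ['does', 'it', 'work']
d, fields = split_line('a-b-c\n', d)
# fields == ['a', 'b', 'c']
```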

            qid & accept id: (31335460, 31335740) query: Get length of subset pandas DataFrame soup:

            soup wrap:

            You can do a groupby by the first column and then calculate the length of each group (using your example data, but with column names):

            In [8]: df = pd.DataFrame([['one', 2, 3],
               ...:  ['one', 3, 4],
               ...:  ['two', 4, 6]], columns=['A', 'B', 'C'])
            
            In [10]: df.groupby('A')['B'].transform(lambda x: len(x))
            Out[10]:
            0    2
            1    2
            2    1
            Name: B, dtype: int64
            

            Adding it to the dataframe:

            In [17]: df['len'] = df.groupby('A')['B'].transform(lambda x: len(x))
            
            In [18]: df
            Out[18]:
                 A  B  C  len
            0  one  2  3    2
            1  one  3  4    2
            2  two  4  6    1
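            transform also accepts the aggregation name 'size' directly, which avoids the Python-level lambda (same result; a minor alternative):

```python
import pandas as pd

df = pd.DataFrame([['one', 2, 3], ['one', 3, 4], ['two', 4, 6]],
                  columns=['A', 'B', 'C'])

# 'size' is computed by pandas itself, no per-group Python call
df['len'] = df.groupby('A')['B'].transform('size')
# df['len'] is [2, 2, 1]
```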
            
            qid & accept id: (31347715, 31347891) query: How to use scipy to optimize the position of n points? soup:

            soup wrap:

            This seems to me like a straightforward case for scipy.optimize.

            If your function is differentiable (i.e. if you can define the Jacobian and/or Hessian matrices) I would recommend using a gradient-based method, since those converge faster. Also it sounds like you have an unconstrained minimization problem, unless you have constraints on the valid values of each point (or some other constraints).

            So basically if you have your objective function

            def fitness(points):
                # calculates fitness value
            

            Then you can do something like

            from scipy.optimize import minimize
            
            x0 = [] # fill with your initial guesses
            new_points = minimize(fitness, x0, method='Nelder-Mead')  # or whatever algorithm
            

            Then new_points will be an OptimizeResult object; the optimized points themselves are in new_points.x.

            For completeness, the full signature of minimize is

            scipy.optimize.minimize(fun, x0, args=(), method=None, jac=None, hess=None, hessp=None, bounds=None, constraints=(), tol=None, callback=None, options=None)
            

            You can see that the two arguments following method are jac and hess which is where you may pass functions that can calculate the Jacobian and Hessian of your objective function, respectively. As I mentioned in the comments, if you are unable to calculate these (due to not having an equation describing your objective function or the objective function being mathematically non-differentiable) you can use gradient-free algorithms to perform the optimization.
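            To illustrate passing jac, here is a sketch with a simple quadratic objective (the fitness function is an assumption for demonstration; its minimum sits at x = 3):

```python
import numpy as np
from scipy.optimize import minimize

def fitness(x):
    return np.sum((x - 3.0) ** 2)   # differentiable, minimum at x == 3

def jac(x):
    return 2.0 * (x - 3.0)          # analytic gradient of fitness

# BFGS uses the supplied gradient instead of finite differences
res = minimize(fitness, x0=np.zeros(4), jac=jac, method='BFGS')
# res.x is close to [3, 3, 3, 3]
```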

            qid & accept id: (31349527, 31349563) query: Nested List of Lists to Single List of tuples soup:
            soup wrap:
            >>> exampleList = [['A', 'B', 'C', 'D'], [1, 2, 3, 4], [10, 20, 30, 40]]
            >>> list(zip(*exampleList))
            [('A', 1, 10), ('B', 2, 20), ('C', 3, 30), ('D', 4, 40)]
            

            Edit:

            If you want your output to be a list of lists, instead of a list of tuples,

            [list(i) for i in zip(*exampleList)]
            

            should do the trick

            qid & accept id: (31349898, 31350526) query: How to print JSON with keys in numeric order (i.e. as if the string keys were integers) soup:

            You can use a dict comprehension trick:

            \n
            import json\n\nd = dict({'2':'two', '11':'eleven'})\njson.dumps({int(x):d[x] for x in d.keys()}, sort_keys=True)\n
            \n

            Output:

            \n
            '{"2": "two", "11": "eleven"}'\n
            \n soup wrap:

            You can use a dict comprehension trick:

            import json
            
            d = {'2': 'two', '11': 'eleven'}
            json.dumps({int(k): v for k, v in d.items()}, sort_keys=True)
            

            Output:

            '{"2": "two", "11": "eleven"}'
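            For contrast, sorting the string keys directly is lexicographic ('11' sorts before '2'); converting to int first makes the sort numeric. A short sketch of both:

```python
import json

d = {'2': 'two', '11': 'eleven'}

lex = json.dumps(d, sort_keys=True)
# lex == '{"11": "eleven", "2": "two"}'   (string keys sort lexicographically)

num = json.dumps({int(k): v for k, v in d.items()}, sort_keys=True)
# num == '{"2": "two", "11": "eleven"}'   (int keys sort numerically)
```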
            
            qid & accept id: (31364390, 31364649) query: Python Sorting Regular Expression soup:

            I think what you need is:

            \n
            import re\n\nplayer_string = "player a 34 45 56 player b 38 93 75 playerc 39 29 18 playerd 38 98"\n\npattern = re.compile(r"([\w ]*?)\s+(\d+)\s+(\d+)\s+(\d+)")\nmatches = pattern.findall(player_string)\nd = {}\nfor m in matches :\n    print m\n    d[m[0].strip()] = m[1:]\n\nprint d\n
            \n

            After the last player "playerd" you only have 2 numbers, not 3 as regex expects.

            \n

            Output:

            \n
            {'playerc': ('39', '29', '18'), 'player b': ('38', '93', '75'), 'player a': ('34', '45', '56')}\n
            \n soup wrap:

            I think what you need is:

            import re
            
            player_string = "player a 34 45 56 player b 38 93 75 playerc 39 29 18 playerd 38 98"
            
            pattern = re.compile(r"([\w ]*?)\s+(\d+)\s+(\d+)\s+(\d+)")
            matches = pattern.findall(player_string)
            d = {}
            for m in matches:
                print m
                d[m[0].strip()] = m[1:]
            
            print d
            

            After the last player, "playerd", there are only 2 numbers, not the 3 the regex expects, so that player is not matched.

            Output:

            {'playerc': ('39', '29', '18'), 'player b': ('38', '93', '75'), 'player a': ('34', '45', '56')}
            
            qid & accept id: (31367608, 31367793) query: Get item with value from tuple in python soup:

            You can search for a particular tuple in the results list by iterating over the list and checking the value of the second item of each tuple (which is your key):

            \n
            results = [('object%d' % i, '111.111.5.%d' % i) for i in range(1,8)]\n\nkey = '111.111.5.4'\nresult = None\nfor t in results:\n    if t[1] == key:\n        result = t\n\nprint result\n
            \n

            Output:

            \n
            \n('object4', '111.111.5.4')\n
            \n

            This demonstrates accessing an item in a tuple with a zero-based index (1 in this case means the second element). Your code will be more readable if you unpack the tuples in the for loop:

            \n
            for obj, value in results:\n    if value == key:\n        result = (obj, value)\n
            \n

            Your results might be more generally useful if you convert them to a dictionary:

            \n
            >>> results_dict = {v:k for k,v in results}\n>>> print results_dict['111.111.5.6']\nobject6\n>>> print results_dict['111.111.5.1']\nobject1\n>>> print results_dict['blah']\nTraceback (most recent call last):\n  File "", line 1, in \nKeyError: 'blah'\n>>> print results_dict.get('111.111.5.5')\nobject5\n>>> print results_dict.get('123456')\nNone\n
            \n

            Using dict.get() is close to the syntax that you requested in your question.

            \n soup wrap:

            You can search for a particular tuple in the results list by iterating over the list and checking the value of the second item of each tuple (which is your key):

            results = [('object%d' % i, '111.111.5.%d' % i) for i in range(1,8)]
            
            key = '111.111.5.4'
            result = None
            for t in results:
                if t[1] == key:
                    result = t
            
            print result
            

            Output:

            ('object4', '111.111.5.4')
            

            This demonstrates accessing an item in a tuple with a zero-based index (1 in this case means the second element). Your code will be more readable if you unpack the tuples in the for loop:

            for obj, value in results:
                if value == key:
                    result = (obj, value)
            

            Your results might be more generally useful if you convert them to a dictionary:

            >>> results_dict = {v:k for k,v in results}
            >>> print results_dict['111.111.5.6']
            object6
            >>> print results_dict['111.111.5.1']
            object1
            >>> print results_dict['blah']
            Traceback (most recent call last):
              File "", line 1, in 
            KeyError: 'blah'
            >>> print results_dict.get('111.111.5.5')
            object5
            >>> print results_dict.get('123456')
            None
            

            Using dict.get() is close to the syntax that you requested in your question.
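
The linear search above can also be written with the built-in next() and a generator expression, which stops at the first match instead of scanning the whole list. A small sketch reusing the names from the answer:

```python
# Find the first tuple whose second item matches the key, or None if absent.
results = [('object%d' % i, '111.111.5.%d' % i) for i in range(1, 8)]
key = '111.111.5.4'

result = next((t for t in results if t[1] == key), None)
# result == ('object4', '111.111.5.4')
```

The second argument to next() plays the same role as dict.get()'s default: no KeyError-style failure, just None when nothing matches.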

            qid & accept id: (31370701, 31372653) query: Joinable PriorityQueue in python's asyncio soup:

            Because of the way JoinableQueue and PriorityQueue are implemented, you can get a JoinablePriorityQueue by subclassing both via multiple inheritance, as long as you list JoinableQueue first.

            \n

            The reason this works is that PriorityQueue is very simply implemented:

            \n
            class PriorityQueue(Queue):\n    """A subclass of Queue; retrieves entries in priority order (lowest first).\n\n    Entries are typically tuples of the form: (priority number, data).\n    """\n\n    def _init(self, maxsize):\n        self._queue = []\n\n    def _put(self, item, heappush=heapq.heappush):\n        heappush(self._queue, item)\n\n    def _get(self, heappop=heapq.heappop):\n        return heappop(self._queue)\n
            \n

            While JoinableQueue is more complicated, the only method both it and PriorityQueue implement is _put, and crucially, JoinableQueue calls super()._put(...) in its own put implementation, which means it will cooperate with PriorityQueue properly.

            \n

            Here's an example demonstrating that it works:

            \n
            from asyncio import PriorityQueue, JoinableQueue\nimport asyncio\nimport random\n\nclass JoinablePriorityQueue(JoinableQueue, PriorityQueue):\n    pass\n\n\n@asyncio.coroutine\ndef consume(q):\n    while True:\n        a = yield from q.get()\n        print("got a {}".format(a))\n        if a[1] is None:\n            q.task_done()\n            return\n        asyncio.sleep(1)\n        q.task_done()\n\n@asyncio.coroutine\ndef produce(q):\n    for i in range(10):\n        yield from q.put((random.randint(0,10), i))\n    yield from q.put((100, None)) # Will be last\n    asyncio.async(consume(q))\n    print("waiting...")\n    yield from q.join()\n    print("waited")\n\nloop = asyncio.get_event_loop()\nq = JoinablePriorityQueue()\nloop.run_until_complete(produce(q))\n
            \n

            Output:

            \n
            waiting...\ngot a (1, 2)\ngot a (2, 1)\ngot a (4, 4)\ngot a (5, 0)\ngot a (6, 8)\ngot a (6, 9)\ngot a (8, 3)\ngot a (9, 5)\ngot a (9, 7)\ngot a (10, 6)\ngot a (100, None)\nwaited\n
            \n soup wrap:

            Because of the way JoinableQueue and PriorityQueue are implemented, you can get a JoinablePriorityQueue by subclassing both via multiple inheritance, as long as you list JoinableQueue first.

            The reason this works is that PriorityQueue is very simply implemented:

            class PriorityQueue(Queue):
                """A subclass of Queue; retrieves entries in priority order (lowest first).
            
                Entries are typically tuples of the form: (priority number, data).
                """
            
                def _init(self, maxsize):
                    self._queue = []
            
                def _put(self, item, heappush=heapq.heappush):
                    heappush(self._queue, item)
            
                def _get(self, heappop=heapq.heappop):
                    return heappop(self._queue)
            

            While JoinableQueue is more complicated, the only method both it and PriorityQueue implement is _put, and crucially, JoinableQueue calls super()._put(...) in its own put implementation, which means it will cooperate with PriorityQueue properly.

            Here's an example demonstrating that it works:

            from asyncio import PriorityQueue, JoinableQueue
            import asyncio
            import random
            
            class JoinablePriorityQueue(JoinableQueue, PriorityQueue):
                pass
            
            
            @asyncio.coroutine
            def consume(q):
                while True:
                    a = yield from q.get()
                    print("got a {}".format(a))
                    if a[1] is None:
                        q.task_done()
                        return
                    asyncio.sleep(1)
                    q.task_done()
            
            @asyncio.coroutine
            def produce(q):
                for i in range(10):
                    yield from q.put((random.randint(0,10), i))
                yield from q.put((100, None)) # Will be last
                asyncio.async(consume(q))
                print("waiting...")
                yield from q.join()
                print("waited")
            
            loop = asyncio.get_event_loop()
            q = JoinablePriorityQueue()
            loop.run_until_complete(produce(q))
            

            Output:

            waiting...
            got a (1, 2)
            got a (2, 1)
            got a (4, 4)
            got a (5, 0)
            got a (6, 8)
            got a (6, 9)
            got a (8, 3)
            got a (9, 5)
            got a (9, 7)
            got a (10, 6)
            got a (100, None)
            waited
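
Note for later Python versions: JoinableQueue was merged into asyncio.Queue in Python 3.4.4 and subsequently removed, so on a modern interpreter asyncio.PriorityQueue already supports join() and task_done() directly. A minimal sketch under that assumption (the None payload as a stop signal is my convention, mirroring the example above):

```python
import asyncio

async def consume(q, seen):
    # Drain the queue in priority order; a None payload is the stop signal.
    while True:
        priority, item = await q.get()
        seen.append((priority, item))
        q.task_done()
        if item is None:
            return

async def main():
    q = asyncio.PriorityQueue()
    for entry in [(5, 'b'), (1, 'a'), (9, None)]:
        q.put_nowait(entry)
    seen = []
    consumer = asyncio.create_task(consume(q, seen))
    await q.join()  # works because asyncio.Queue itself is joinable now
    await consumer
    return seen

order = asyncio.run(main())
# order == [(1, 'a'), (5, 'b'), (9, None)]
```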
            
            qid & accept id: (31387880, 31388005) query: Finding minimum and maximum value for each row, excluding NaN values soup:

            To ignore the NaN values, use nanmin and the analogous nanmax:

            \n
            np.nanmin(wind_speed, axis=0)\nnp.nanmax(wind_speed, axis=0)\n
            \n

            This will ignore the NaN values as desired.

            \n

            Example:

            \n
            In [93]:\nwind_speed = np.array([234,np.NaN,343, np.NaN])\nwind_speed\n\nOut[93]:\narray([ 234.,   nan,  343.,   nan])\n\nIn [94]:\nprint(np.nanmin(wind_speed, axis=0), np.nanmax(wind_speed, axis=0))\n234.0 343.0\n
            \n soup wrap:

            To ignore the NaN values, use nanmin and the analogous nanmax:

            np.nanmin(wind_speed, axis=0)
            np.nanmax(wind_speed, axis=0)
            

            This will ignore the NaN values as desired.

            Example:

            In [93]:
            wind_speed = np.array([234,np.NaN,343, np.NaN])
            wind_speed
            
            Out[93]:
            array([ 234.,   nan,  343.,   nan])
            
            In [94]:
            print(np.nanmin(wind_speed, axis=0), np.nanmax(wind_speed, axis=0))
            234.0 343.0
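
The question asks for per-row values; for a 2-D array that means axis=1 rather than axis=0. A small sketch (the array contents here are made up for illustration):

```python
import numpy as np

wind_speed = np.array([[1.0, np.nan, 3.0],
                       [np.nan, 5.0, 2.0]])

row_min = np.nanmin(wind_speed, axis=1)  # minimum of each row, NaNs ignored
row_max = np.nanmax(wind_speed, axis=1)  # maximum of each row, NaNs ignored
# row_min -> [1. 2.], row_max -> [3. 5.]
```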
            
            qid & accept id: (31390194, 31390250) query: Iterate through each value of list in order, starting at random value soup:
            >>> start = randint(0, len(numbers))\n>>> start\n1\n
            \n

            You can use list slicing then iterate over that

            \n
            >>> numbers[start:] + numbers[:start]\n[1, 2, 3, 4, 5, 6, 7, 8, 9, 0]\n
            \n

            You can also use the modulus % operator in a list comprehension

            \n
            >>> [numbers[i%len(numbers)] for i in range(start, start + len(numbers))]\n[1, 2, 3, 4, 5, 6, 7, 8, 9, 0]\n
            \n soup wrap:
            >>> start = randint(0, len(numbers))
            >>> start
            1
            

            You can use list slicing, then iterate over that:

            >>> numbers[start:] + numbers[:start]
            [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
            

            You can also use the modulus % operator in a list comprehension:

            >>> [numbers[i%len(numbers)] for i in range(start, start + len(numbers))]
            [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
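
The slicing idea wraps up nicely in a reusable helper; the function name below is mine, not from the question:

```python
from itertools import chain

def from_start(seq, start):
    # Visit every element exactly once, beginning at index `start`
    # and wrapping around to the front of the sequence.
    return list(chain(seq[start:], seq[:start]))

numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# from_start(numbers, 3) -> [3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
```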
            
            qid & accept id: (31392017, 31392900) query: How to separate upper and lower case letters in a string soup:

            Are you looking to get two strings, one with all the uppercase letters and another with all the lowercase letters? Below is a function that will return two strings, the upper then the lowercase:

            \n
            def split_upper_lower(input):\n    upper = ''.join([x for x in input if x.isupper()])\n    lower = ''.join([x for x in input if x.islower()])\n\n    return upper, lower\n
            \n

            You can then call it with the following:

            \n
            upper, lower = split_upper_lower('AbBZxYp')\n
            \n

            which gives you two variables, upper and lower. Use them as necessary.

            \n soup wrap:

            Are you looking to get two strings, one with all the uppercase letters and another with all the lowercase letters? Below is a function that will return two strings, first the uppercase and then the lowercase:

            def split_upper_lower(input):
                upper = ''.join([x for x in input if x.isupper()])
                lower = ''.join([x for x in input if x.islower()])
            
                return upper, lower
            

            You can then call it with the following:

            upper, lower = split_upper_lower('AbBZxYp')
            

            which gives you two variables, upper and lower. Use them as necessary.
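
The two list comprehensions scan the input twice; if that matters, a single pass works just as well. This is an optional variant, not part of the original answer:

```python
def split_upper_lower_once(s):
    # One pass over the string: route each character into the matching bucket.
    upper_chars, lower_chars = [], []
    for ch in s:
        if ch.isupper():
            upper_chars.append(ch)
        elif ch.islower():
            lower_chars.append(ch)
    return ''.join(upper_chars), ''.join(lower_chars)

upper, lower = split_upper_lower_once('AbBZxYp')
# upper == 'ABZY', lower == 'bxp'
```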

            qid & accept id: (31393692, 31399746) query: How do you @rpc _returns polymorphic types in spyne? soup:

            Deleting my previous answer as you apparently need polymorphism, not multiple return types.

            \n

            So, there are two ways of doing polymorphism in Spyne: The Python way and the Spyne way.

            \n

            Let:

            \n
            class A(ComplexModel):\n    i = Integer\n\nclass B(A):\n    s = Unicode\n\nclass C(A):\n    d = DateTime\n
            \n

            The Python way uses duck typing to return values.

            \n

            Let's define a generic class:

            \n
            class GenericA(ComplexModel):\n    i = Integer\n    s = Unicode\n    d = DateTime\n
            \n

            and use it as return value of our sample service:

            \n
            class SomeService(ServiceBase):\n    @rpc(Unicode(values=['A', 'B', 'C']), _returns=GenericA)\n    def get_some_a(self, type_name):\n        # (...)\n
            \n

            This way, you get your data, but it's tagged as a GenericA object. If you don't care about this, you can create a class that has all types from all objects (assuming attributes with same names have the same type) and just be done with it. This is easy, stable and works today.

            \n

            If that's not enough for your needs, you have to do the polymorphism the Spyne way. To do that, first set your return type to the base class:

            \n
            class SomeService(ServiceBase):\n    @rpc(Unicode(values=['A', 'B', 'C']), _returns=A)\n    def get_some_a(self, type_name):\n        # (...)\n
            \n

            and tag your output protocol to be polymorphic:

            \n
            application = Application([SomeService], 'tns',\n    in_protocol=Soap11(validator='lxml'),\n    out_protocol=Soap11(polymorphic=True)\n)\n
            \n

            This requires at least Spyne-2.12.

            \n

            Working example: https://github.com/arskom/spyne/blob/a1b3593f3754a9c8a6787c29ff50f591db89fd49/examples/xml/polymorphism.py

            \n soup wrap:

            Deleting my previous answer as you apparently need polymorphism, not multiple return types.

            So, there are two ways of doing polymorphism in Spyne: The Python way and the Spyne way.

            Let:

            class A(ComplexModel):
                i = Integer
            
            class B(A):
                s = Unicode
            
            class C(A):
                d = DateTime
            

            The Python way uses duck typing to return values.

            Let's define a generic class:

            class GenericA(ComplexModel):
                i = Integer
                s = Unicode
                d = DateTime
            

            and use it as return value of our sample service:

            class SomeService(ServiceBase):
                @rpc(Unicode(values=['A', 'B', 'C']), _returns=GenericA)
                def get_some_a(self, type_name):
                    # (...)
            

            This way, you get your data, but it's tagged as a GenericA object. If you don't care about this, you can create a class that has all types from all objects (assuming attributes with same names have the same type) and just be done with it. This is easy, stable and works today.

            If that's not enough for your needs, you have to do the polymorphism the Spyne way. To do that, first set your return type to the base class:

            class SomeService(ServiceBase):
                @rpc(Unicode(values=['A', 'B', 'C']), _returns=A)
                def get_some_a(self, type_name):
                    # (...)
            

            and tag your output protocol to be polymorphic:

            application = Application([SomeService], 'tns',
                in_protocol=Soap11(validator='lxml'),
                out_protocol=Soap11(polymorphic=True)
            )
            

            This requires at least Spyne-2.12.

            Working example: https://github.com/arskom/spyne/blob/a1b3593f3754a9c8a6787c29ff50f591db89fd49/examples/xml/polymorphism.py

            qid & accept id: (31394463, 31395025) query: Python (Maya) pass flags as variables soup:

            The standard way to pass flags to a Maya command is to use Python's built-in **kwargs syntax:

            \n
            mesh_options = {'type':'mesh', 'long':True } \nmeshes = cmds.ls(**mesh_options)\n
            \n

            is equivalent to

            \n
            cmds.ls(long=True, type='mesh') \n
            \n

            In your case you want something like

            \n
            opts = {"ln":"AttrNameTextField","k":True}\nif typenum == 1:\n   opts["at"] = "enum"\n   opts["en"] = "Off:On"\nelse:\n   opts["at"] = "float"\n   opts["min"] = 0\n   opts["max"] = 1\ncmds.addAttr(selectedObject, **opts)\n
            \n soup wrap:

            The standard way to pass flags to a Maya command is to use Python's built-in **kwargs syntax:

            mesh_options = {'type':'mesh', 'long':True } 
            meshes = cmds.ls(**mesh_options)
            

            is equivalent to

            cmds.ls(long=True, type='mesh') 
            

            In your case you want something like

            opts = {"ln":"AttrNameTextField","k":True}
            if typenum == 1:
               opts["at"] = "enum"
               opts["en"] = "Off:On"
            else:
               opts["at"] = "float"
               opts["min"] = 0
               opts["max"] = 1
            cmds.addAttr(selectedObject, **opts)
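
The **opts mechanism is plain Python, so it can be checked without Maya at all; the function below is a hypothetical stand-in for cmds.addAttr:

```python
def fake_add_attr(obj, **flags):
    # Stand-in for cmds.addAttr: just record which flags arrived.
    return (obj, sorted(flags.items()))

opts = {"ln": "AttrNameTextField", "k": True, "at": "enum", "en": "Off:On"}
unpacked = fake_add_attr("pCube1", **opts)
explicit = fake_add_attr("pCube1", ln="AttrNameTextField", k=True, at="enum", en="Off:On")
# unpacked == explicit: the dict expands to exactly the same keyword arguments
```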
            
            qid & accept id: (31404238, 31404405) query: A list as a key for PySpark's reduceByKey soup:

            Try this:

            \n
            rdd.map(lambda (k, v): (tuple(k), v)).groupByKey()\n
            \n

            Since Python lists are mutable, they cannot be hashed (they don't provide a __hash__ method):

            \n
            >>> a_list = [1, 2, 3]\n>>> a_list.__hash__ is None\nTrue\n>>> hash(a_list)\nTraceback (most recent call last):\n  File "", line 1, in \nTypeError: unhashable type: 'list'\n
            \n

            Tuples, on the other hand, are immutable and provide a __hash__ implementation:

            \n
            >>> a_tuple = (1, 2, 3)\n>>> a_tuple.__hash__ is None\nFalse\n>>> hash(a_tuple)\n2528502973977326415\n
            \n

            hence can be used as a key. Similarly if you want to use unique values as a key you should use frozenset:

            \n
            rdd.map(lambda (k, v): (frozenset(k), v)).groupByKey().collect()\n
            \n

            instead of set.

            \n
            # This will fail with TypeError: unhashable type: 'set'\nrdd.map(lambda (k, v): (set(k), v)).groupByKey().collect()\n
            \n soup wrap:

            Try this:

            rdd.map(lambda (k, v): (tuple(k), v)).groupByKey()
            

            Since Python lists are mutable, they cannot be hashed (they don't provide a __hash__ method):

            >>> a_list = [1, 2, 3]
            >>> a_list.__hash__ is None
            True
            >>> hash(a_list)
            Traceback (most recent call last):
              File "", line 1, in 
            TypeError: unhashable type: 'list'
            

            Tuples, on the other hand, are immutable and provide a __hash__ implementation:

            >>> a_tuple = (1, 2, 3)
            >>> a_tuple.__hash__ is None
            False
            >>> hash(a_tuple)
            2528502973977326415
            

            hence can be used as a key. Similarly, if you want to use unique values as a key, you should use frozenset:

            rdd.map(lambda (k, v): (frozenset(k), v)).groupByKey().collect()
            

            instead of set.

            # This will fail with TypeError: unhashable type: 'set'
            rdd.map(lambda (k, v): (set(k), v)).groupByKey().collect()
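
The same rule can be demonstrated without Spark: grouping by a list key in a plain dict fails for exactly the same reason, and converting to a tuple fixes it. A minimal sketch:

```python
from collections import defaultdict

pairs = [([1, 2], 'a'), ([1, 2], 'b'), ([3], 'c')]

grouped = defaultdict(list)
for key, value in pairs:
    grouped[tuple(key)].append(value)  # list -> tuple, which is hashable

# grouped == {(1, 2): ['a', 'b'], (3,): ['c']}
```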
            
            qid & accept id: (31404492, 31404733) query: Python: Fastest way of parsing first column of large table in array soup:

            A few suggestions:

            \n
              \n
            • Rather than creating a list that you then turn into a set, just work with a set directly:

              \n
              sam1_identifiers = set()\nfor line in reader1:\n    sam1_identifiers.add(line[0])\n
              \n

              This is probably more memory efficient, because you have a single set rather than a list and a set. That might make it a bit faster.

              \n

              Note also that I've changed the variable name – list is the name of a Python builtin function, so you shouldn't use it for your own variables.

            • \n
            • Since you want to find the identifiers which are only in sam1, rather than the nested if/for statements, just compare and throw away any identifiers found in sam2 that are in the set of IDs in sam1.

              \n
              sam2_identifiers = set()\nfor line in reader2:\n    sam2_identifiers.add(line[0])\n\nprint sam1 - sam2\n
              \n

              or even

              \n
              sam2_identifiers = set()\nfor line in reader2:\n    sam1_identifiers.discard(line[0])\n\nprint sam1_identifiers\n
              \n

              I suspect that's faster than the nested loops.

            • \n
            • Perhaps I've missed something, but don't you look through every column for each line of sam2? Isn't it sufficient just to look at line[0] for the identifier, as with sam1?

            • \n
            \n soup wrap:

            A few suggestions:

            • Rather than creating a list that you then turn into a set, just work with a set directly:

              sam1_identifiers = set()
              for line in reader1:
                  sam1_identifiers.add(line[0])
              

              This is probably more memory efficient, because you have a single set rather than a list and a set. That might make it a bit faster.

              Note also that I've changed the variable name – list is the name of a Python builtin function, so you shouldn't use it for your own variables.

            • Since you want to find the identifiers which are only in sam1, rather than the nested if/for statements, just compare and throw away any identifiers found in sam2 that are in the set of IDs in sam1.

              sam2_identifiers = set()
              for line in reader2:
                  sam2_identifiers.add(line[0])
              
              print sam1 - sam2
              

              or even

              sam2_identifiers = set()
              for line in reader2:
                  sam1_identifiers.discard(line[0])
              
              print sam1_identifiers
              

              I suspect that's faster than the nested loops.

            • Perhaps I've missed something, but don't you look through every column for each line of sam2? Isn't it sufficient just to look at line[0] for the identifier, as with sam1?
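
Putting the suggestions together, the whole comparison reduces to a single set difference. The reader objects are replaced here with in-memory rows purely for illustration:

```python
# Each row's first column is the identifier (stand-ins for the csv readers).
reader1 = [['id1', 'x'], ['id2', 'y'], ['id3', 'z']]
reader2 = [['id2', 'q'], ['id4', 'r']]

sam1_identifiers = {line[0] for line in reader1}
sam2_identifiers = {line[0] for line in reader2}

only_in_sam1 = sam1_identifiers - sam2_identifiers
# only_in_sam1 == {'id1', 'id3'}
```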

            qid & accept id: (31416465, 31427014) query: django filter to calculate hours within range soup:

            I can think of at least two approaches to your problem.

            \n

            A (rather convoluted) query:

            \n
            month_start = datetime(year, month, 1, 0, 0, 0, 0, tz);\nnext_month = (month % 12) + 1\nnext_month_start = datetime(year, next_month, 1, 0, 0, 0, 0, tz)\n\nmodels.InOut.objects.filter(\n    (\n        Q(in_dt__gte=month_start) and Q(in_dt__lt=next_month_start))\n        | (Q(out_dt__gte=month_start) and Q(out_dt__lt=next_month_start)\n    )\n ).annotate(\n     start_in_month=Func(F('in_dt'), month_start, function='MAX'),\n     end_in_month=Func(F('out_dt'), month_end, function='MIN')\n ).aggregate(worked=Sum(F('end_in_month') - F('start_in_month'))\n
            \n

            If using PostgreSQL you need to use

            \n
             .annotate(\n     start_in_month=Func(F('in_dt'), month_start, function='GREATEST'),\n     end_in_month=Func(F('out_dt'), month_end, function='LEAST')\n )\n
            \n

            since in PostgreSQL MAX() and MIN() are not defined for date types.

            \n

            Note also the aggregation does not work on SQLite because it does not have the appropriate data types (dates are stored as text).

            \n

            Preprocessing entries

            \n

            In your database, the InOut entries that span the month border are logically (not physically) two entries:

            \n
              \n
            1. One that starts at the designated time and ends at the month end
            2. \n
            3. One that starts at the end of the month and ends at the designated time
            4. \n
            \n

            Filtering out the affected InOut objects takes a little thinking, especially since F() objects cannot (currently) resolve parts of datetimes (e.g. in_dt__month).

            \n

            Something along the lines of

            \n
            # XXX - magic number of months\nfor month in range(1, 13):\n    for wraparound in models.InOut.objects.filter(\n        Q(in_dt__month=month) and ~Q(out_dt__month=month)\n    )\n        year = wraparound.in_dt.year\n        next_month = (month % 12) + 1\n        month_end = datetime(year, next_month, calendar.monthrange(year, month)[1], 23, 59, 59, 999999, tz)\n        next_month_start = datetime(year, next_month, 1, 0, 0, 0, 0, tz)\n\n        models.InOut.objects.bulk_create([\n            models.InOut(user=wraparound.user, in_dt=wraparound.in_dt, out_dt=month_end),\n            models.InOut(user=wraparound.user, in_dt=next_month_start, out_dt=wraparound.out_dt)\n        ])\n        wraparound.delete()\n
            \n

            could do the trick, however.

            \n

            Ideally, you don't do this afterwards but already when saving the time entry in your view. However this might confound users because they now get two entries instead of one when entering a wraparound work span.

            \n

            Caveat emptor: You might need to dicker around with next_month, next_month_start and __lt as well as __gte a bit, because this \nimplementation loses a microsecond at the end of each wraparound after expansion.

            \n

            And yes, it is a nice exercise ;-)

            \n soup wrap:

            I can think of at least two approaches to your problem.

            A (rather convoluted) query:

            month_start = datetime(year, month, 1, 0, 0, 0, 0, tz)
            next_month = (month % 12) + 1
            next_month_start = datetime(year, next_month, 1, 0, 0, 0, 0, tz)
            # month_end (the last instant of the month) is assumed defined alongside month_start
            
            models.InOut.objects.filter(
                (Q(in_dt__gte=month_start) & Q(in_dt__lt=next_month_start))
                | (Q(out_dt__gte=month_start) & Q(out_dt__lt=next_month_start))
            ).annotate(
                start_in_month=Func(F('in_dt'), month_start, function='MAX'),
                end_in_month=Func(F('out_dt'), month_end, function='MIN')
            ).aggregate(worked=Sum(F('end_in_month') - F('start_in_month')))
            

            If using PostgreSQL you need to use

             .annotate(
                 start_in_month=Func(F('in_dt'), month_start, function='GREATEST'),
                 end_in_month=Func(F('out_dt'), month_end, function='LEAST')
             )
            

            since in PostgreSQL MAX() and MIN() are not defined for date types.

            Note also that the aggregation does not work on SQLite, because it does not have the appropriate data types (dates are stored as text).

            Preprocessing entries

            In your database, the InOut entries that span the month border are logically (not physically) two entries:

            1. One that starts at the designated time and ends at the month end
            2. One that starts at the end of the month and ends at the designated time

            Filtering out the affected InOut objects takes a little thinking, especially since F() objects cannot (currently) resolve parts of datetimes (e.g. in_dt__month).

            Something along the lines of

            # XXX - magic number of months
            for month in range(1, 13):
                for wraparound in models.InOut.objects.filter(
                    Q(in_dt__month=month) & ~Q(out_dt__month=month)  # combine Q objects with &, not `and`
                ):
                    year = wraparound.in_dt.year
                    next_month = (month % 12) + 1
                    month_end = datetime(year, month, calendar.monthrange(year, month)[1], 23, 59, 59, 999999, tz)
                    next_month_start = datetime(year, next_month, 1, 0, 0, 0, 0, tz)
            
                    models.InOut.objects.bulk_create([
                        models.InOut(user=wraparound.user, in_dt=wraparound.in_dt, out_dt=month_end),
                        models.InOut(user=wraparound.user, in_dt=next_month_start, out_dt=wraparound.out_dt)
                    ])
                    wraparound.delete()
            

            could do the trick, however.

            Ideally, you don't do this afterwards but already when saving the time entry in your view. However this might confound users because they now get two entries instead of one when entering a wraparound work span.

            Caveat emptor: You might need to dicker around with next_month, next_month_start and __lt as well as __gte a bit, because this implementation loses a microsecond at the end of each wraparound after expansion.

            And yes, it is a nice exercise ;-)
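
The GREATEST/LEAST clamping at the heart of the query is easy to sanity-check in plain Python before pushing it into the ORM. The helper name and dates below are mine, for illustration only:

```python
from datetime import datetime

def seconds_within(start, end, window_start, window_end):
    # Clamp [start, end] to the window, then measure what remains.
    clamped_start = max(start, window_start)  # GREATEST
    clamped_end = min(end, window_end)        # LEAST
    return max((clamped_end - clamped_start).total_seconds(), 0.0)

# A shift from June 30 22:00 to July 1 02:00 contributes 2 hours to July.
worked = seconds_within(datetime(2015, 6, 30, 22, 0), datetime(2015, 7, 1, 2, 0),
                        datetime(2015, 7, 1), datetime(2015, 8, 1))
# worked == 7200.0
```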

            qid & accept id: (31419156, 31419288) query: Iterate through XML child of a child tags in Python soup:

            You have to get the children of the child and iterate through all of the grandchildren

            \n
            tree = ET.parse('command_details.xml')\nroot = tree.getroot()\n\nfor child in root:\n\n    if child.attrib['major'] == str(hex(int(major_bits[::-1], 2))) and child.attrib['minor'] == str(hex(int(minor_bits[::-1], 2))):\n        command_name = str(child.attrib['name'])    \n        for grandchild in child.getchildren():\n            print str(grandchild.attrib['bytes'])\n            print str(grandchild.attrib['descrip'])\n
            \n

            Or if you want to print the full XML line, you can do:

            \n
            print ET.tostring(grandchild).strip()\n
            \n soup wrap:

            You have to get the children of the child and iterate through all of the grandchildren.

            tree = ET.parse('command_details.xml')
            root = tree.getroot()
            
            for child in root:
            
                if child.attrib['major'] == str(hex(int(major_bits[::-1], 2))) and child.attrib['minor'] == str(hex(int(minor_bits[::-1], 2))):
                    command_name = str(child.attrib['name'])    
                    for grandchild in child.getchildren():
                        print str(grandchild.attrib['bytes'])
                        print str(grandchild.attrib['descrip'])
            

            Or if you want to print the full XML line, you can do:

            print ET.tostring(grandchild).strip()
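
Here is a self-contained version of the same traversal, using a made-up XML document in place of command_details.xml; iterating an Element directly yields its children, so getchildren() is not needed:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for command_details.xml.
doc = """<commands>
  <command name="read">
    <field bytes="2" descrip="length"/>
    <field bytes="4" descrip="payload"/>
  </command>
</commands>"""

root = ET.fromstring(doc)
fields = []
for child in root:
    for grandchild in child:  # iterating an Element yields its direct children
        fields.append((grandchild.attrib['bytes'], grandchild.attrib['descrip']))

# fields == [('2', 'length'), ('4', 'payload')]
```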
            
            qid & accept id: (31440326, 31440506) query: Parsing bits from a 128 byte block of hex in Python soup:

            Say that you have the starting byte # stored in start variable, and ending byte # stored in end variable, and then the hex string stored in string variable.

            \n

            Since every byte is two hexadecimal digits, you can simply do this to get the byte in hexadecimal string form:

            \n
            string[start*2:(end+1)*2]\n
            \n

            You need to do end+1 because it appears that your byte ranges are inclusive in your example, but Python slicing is exclusive on the end of the range. More on slicing if you're unfamiliar.

            \n

            To make this concrete for you, here is a minimal working example. You may have to do parsing and massaging to get your ranges to look like mine, but this is the idea:

            \n
            string = "100000000000000220000000000000003000000000000000" \\n         "000000000000000000000000000000000000000000000000" \\n         "000000000000000000000000000000000000000000000000" \\n         "000000000000000000000000000000000000000000000000" \\n         "000000000000000000000000000000000000000000000000" \\n         "0000000000000000"\n\nranges = ['0', '2-1', '3', '127-4']\n\nfor offset in ranges:\n    offset_list = offset.split('-')\n    if len(offset_list) == 1:\n        start = int(offset_list[0])\n        end = int(offset_list[0])\n    else:\n        start = int(offset_list[1])\n        end = int(offset_list[0])\n    the_bytes = string[start*2:(end+1)*2]\n    print('%d-%d: %s' % (start, end, the_bytes))\n
            \n

            Output:

            \n
            0-0: 10\n1-2: 0000\n3-3: 00\n4-127: 00000002200000000000000030000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000\n
            \n soup wrap:

            Say that you have the starting byte number stored in the start variable, the ending byte number in the end variable, and the hex string in the string variable.

            Since every byte is two hexadecimal digits, you can simply do this to get the byte in hexadecimal string form:

            string[start*2:(end+1)*2]
            

            You need to do end+1 because it appears that your byte ranges are inclusive in your example, but Python slicing is exclusive on the end of the range. More on slicing if you're unfamiliar.

            To make this concrete for you, here is a minimal working example. You may have to do parsing and massaging to get your ranges to look like mine, but this is the idea:

            string = "100000000000000220000000000000003000000000000000" \
                     "000000000000000000000000000000000000000000000000" \
                     "000000000000000000000000000000000000000000000000" \
                     "000000000000000000000000000000000000000000000000" \
                     "000000000000000000000000000000000000000000000000" \
                     "0000000000000000"
            
            ranges = ['0', '2-1', '3', '127-4']
            
            for offset in ranges:
                offset_list = offset.split('-')
                if len(offset_list) == 1:
                    start = int(offset_list[0])
                    end = int(offset_list[0])
                else:
                    start = int(offset_list[1])
                    end = int(offset_list[0])
                the_bytes = string[start*2:(end+1)*2]
                print('%d-%d: %s' % (start, end, the_bytes))
            

            Output:

            0-0: 10
            1-2: 0000
            3-3: 00
            4-127: 00000002200000000000000030000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
            
            qid & accept id: (31456902, 31457093) query: Jenkins and Python soup:

            soup wrap:

            I am using Jenkins daily and run Python scripts from Jenkins jobs. Here are some examples of how I call Python scripts with parameters:

            python ./local_lib/bin/regression.py -u daily_regression
            python ./local_lib/bin/run.py -t $Test_Name -b -c -no_compile -no_wlf
            python ./local_lib/bin/run.py -t $Test_Name -b -c -no_compile -no_wlf -args="$sim_args"
            python ./local_lib/bin/results.py  -r daily_regression -html  -o $WORK/results/daily_regression_results.html
            

            Is this what you want?

            You should use the "This build is parameterized" option from the Jenkins job configuration panel. I added a String Parameter named "Test_Name", which I then use in the "Execute Shell" text area as $Test_Name.

            And here is how to use it in the Python script. Let's take the Python script call:

            python ./local_lib/bin/run.py -t $Test_Name -b -c -no_compile -no_wlf
            

            The script run.py may be:

            import sys
            
            print "Number of arguments", len(sys.argv)
            for arg in sys.argv :
                print arg
            
            # or
            
            print 'second method'
            
            for i in range(len(sys.argv)) :
                print sys.argv[i]
            
            # take the test name given by the Jenkins parameter $Test_Name
            testName = sys.argv[2] # arg 0 is the script name, arg 1 is '-t', arg 2 is the test name
            print testName
            

            Output is:

            E:\>python run.py -t $Test_Name -b -c -no_compile -no_wlf
            Number of arguments 7
            run.py
            -t
            $Test_Name
            -b
            -c
            -no_compile
            -no_wlf
            second method
            run.py
            -t
            $Test_Name
            -b
            -c
            -no_compile
            -no_wlf
            $Test_Name
            

            I do not have access to Jenkins on the PC I am on now, so in this case $Test_Name was printed as-is, but on Jenkins it would have been replaced by the name the user provided in the textbox when starting the Jenkins job.
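            Indexing sys.argv by position is fragile if the option order ever changes. As a sketch (the option names mirror the example call above; the parsing choices are mine, not from the original script), argparse can pull the test name out by name instead:

```python
import argparse

# Sketch: parse the run.py-style options by name instead of by position.
parser = argparse.ArgumentParser()
parser.add_argument('-t', dest='test_name')
parser.add_argument('-b', action='store_true')
parser.add_argument('-c', action='store_true')
parser.add_argument('-no_compile', action='store_true')
parser.add_argument('-no_wlf', action='store_true')

# In the real script you would call parser.parse_args() with no arguments
# so it reads sys.argv; an explicit list is used here for demonstration.
args = parser.parse_args(['-t', 'my_test', '-b', '-c', '-no_compile', '-no_wlf'])
print(args.test_name)
```

            This way the script keeps working even if Jenkins passes the flags in a different order.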

            qid & accept id: (31462265, 31462327) query: Make a number more probable to result from random soup:

            soup wrap:

            That's a fitting name!

            Just do a little manipulation of the inputs. First set x to be in the range from 0 to 1.5.

            x = numpy.random.uniform(0, 1.5)  # pass both bounds; a single argument would be taken as low, not high
            

            x has a 2/3 chance of being greater than 0.5 and a 1/3 chance of being smaller. Then, if x is at least 1.0, subtract 0.5 from it:

            if x >= 1.0:
                x = x - 0.5
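            A quick empirical check of this fold (written against numpy's newer Generator API; the sample size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw in [0, 1.5) -- both bounds given explicitly -- then fold the
# top third [1.0, 1.5) back down onto [0.5, 1.0).
x = rng.uniform(0, 1.5, size=100_000)
x = np.where(x >= 1.0, x - 0.5, x)

# [0.5, 1.0) should now be hit about twice as often as [0, 0.5).
frac_hi = np.mean((x >= 0.5) & (x < 1.0))
frac_lo = np.mean(x < 0.5)
```

            The folded region gets double the density, which is the "more probable" effect the question asks for.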
            
            qid & accept id: (31485414, 31485453) query: matplotlib/python: how to have an area with no values soup:

            soup wrap:

            In general, to get matplotlib to not draw a value, set it to np.nan. This is useful for drawing a line break due to a vertical asymptote, for example.
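            As a minimal sketch of that np.nan approach (made-up data): masking an interval with np.nan makes a subsequent ax.plot(x, y) leave that stretch blank.

```python
import numpy as np

x = np.linspace(0, 10, 101)
y = np.sin(x)
# matplotlib draws nothing for NaN samples, so this interval shows as a gap
y[(x > 4) & (x < 6)] = np.nan
# ...then: fig, ax = plt.subplots(); ax.plot(x, y)
```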

            However, if the part of a plot that you do wish to leave empty occurs entirely along a left or right margin, then all you really need to do is call ax.set_xlim to expand the region to be drawn:

            ax.set_xlim(min(x), max(x)+DT.timedelta(days=7))
            

            For example,

            import datetime as DT
            import numpy as np
            import matplotlib.pyplot as plt
            np.random.seed(2015)
            
            x = [DT.date(2015,6,15)+DT.timedelta(days=i*7)
                 for i in range(5)]
            y = np.random.randint(25, size=(3,5)).astype(float)
            
            fig, ax = plt.subplots()
            color = ['blue', 'green', 'red']
            for i in range(3):
                ax.plot(x, y[i], '.-', markersize=8, fillstyle='full', linewidth=1.5, 
                        clip_on=True, zorder=30)
            
                ax.fill_between(x, 0, y[i], alpha=0.5, color=color[i], 
                                edgecolor="white", zorder=20)
            ax.set_xlim(min(x), max(x)+DT.timedelta(days=7))
            plt.show()
            


            qid & accept id: (31498516, 31498782) query: Insert nested value in mongodb using python soup:

            soup wrap:

            You seem to be talking about a "list" of "lists" that you want to transform into a "list" of "dict".

            So basically

            listOfList = [[1,2],[3,4],[5,6]]
            map(lambda x: { "OrginalName": x[0], "ExportPath": x[1] }, listOfList )
            

            Produces:

            [         
                {'ExportPath': 2, 'OrginalName': 1}, 
                {'ExportPath': 4, 'OrginalName': 3},
                {'ExportPath': 6, 'OrginalName': 5}
            ]
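            On Python 3, map() returns a lazy iterator rather than a list; an equivalent list comprehension sidesteps that and reads a little cleaner:

```python
listOfList = [[1, 2], [3, 4], [5, 6]]

# Same transform as the map/lambda version, but eagerly builds a list,
# which is what you want to store in the document.
mapped = [{"OrginalName": x[0], "ExportPath": x[1]} for x in listOfList]
```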
            

            So you can use that to construct your statement without the index lookups, as in:

            for message in mbox:
                post = { 
                   'From' : message['From'],
                   'To' : message['To'],
                   'Date' : message['Date'],
                   'Subject' : message['subject'],
                   'Body' : getbody(message)
                }
                stackf = getattachements(message)
                if len(stackf) > 0:
                    mapped = list(map(lambda x: { "OrginalName": x[0], "ExportPath": x[1] }, stackf ))  # list() so an actual array is stored on Python 3
                    post['Attachement'] = mapped
            
                collection.insert_one(post)
            

            Not sure what the "index" values meant to you other than looking up the current "Attachment" values. But this will insert a new document for every "message" and put all the "Attachments" in an array of that document, matching your new dict structure.

            qid & accept id: (31498844, 31499170) query: How do i randomly select more than one item from a list in linux scripting? soup:

            soup wrap:

            If speed is not important, the following approach could be used in Python. The data must be stored in a CSV file and updated each time. I am assuming a simple tab delimited file as shown in the question:

            import random, collections, csv
            
            def pick_non_zero(count):
                ditems = collections.defaultdict(int)
            
                # Read the current stock file in
                with open("stock.csv", "r") as f_input:
                    csv_input = csv.reader(f_input, delimiter="\t")
                    headers = csv_input.next()
            
                    for item, quantity in csv_input:
                        ditems[item] += int(quantity)
            
                lchoices = []
            
                for n in range(count):
                    # Create a list of items with quantity remaining
                    lnon_zero = [item for item, quantity in ditems.items() if quantity > 0]
            
                    if len(lnon_zero) == 0:
                        lchoices.append("No more stock")
                        break
            
                    # Pick one
                    choice = random.choice(lnon_zero)
                    # Reduce quantity by 1
                    ditems[choice] -= 1
                    lchoices.append(choice)
            
                # Write the updated stock back to the file
                with open("stock.csv", "wb") as f_output:
                    csv_output = csv.writer(f_output, delimiter="\t")
                    csv_output.writerow(headers)
            
                    for item, quantity in ditems.items():
                        csv_output.writerow([item, quantity])
            
                print "Stock left"
            
                for item, quantity in ditems.items():
                    print "%-10s  %d" % (item, quantity)
            
                return lchoices
            
            lpicked = pick_non_zero(6)
            
            print
            print "Picked:", lpicked
            

            Giving the following possible output:

            Stock left
            COMH000     0
            COMT000     2
            COMT001     3
            CT100H000   0
            COM#005     3
            COM#004     2
            COM#006     2
            COME001     8
            
            Picked: ['CT100H000', 'COMH000', 'COME001', 'COME001', 'COMH000', 'COMT000']
            

            Updated to use a CSV file. Tested using Python 2.7.
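            For reference, a Python 3 sketch of the same pick-without-replacement logic, with the stock held in a plain dict and the CSV round-trip omitted (the function name follows the original, the rest is mine):

```python
import random

def pick_non_zero(stock, count):
    """Randomly pick up to count items, decrementing quantities in place."""
    choices = []
    for _ in range(count):
        # Only items with stock remaining are candidates
        non_zero = [item for item, qty in stock.items() if qty > 0]
        if not non_zero:
            choices.append("No more stock")
            break
        choice = random.choice(non_zero)
        stock[choice] -= 1
        choices.append(choice)
    return choices
```

            If you port the file handling too, note that csv_input.next() becomes next(csv_input) on Python 3, and the output file should be opened with newline='' instead of mode 'wb'.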

            qid & accept id: (31499985, 31500136) query: Effective regex for multiple strings with characters and numbers soup:

            soup wrap:

            A way to do it:

            regex = re.compile(r'\b(?=[0-9U])(?:[0-9]+\s*U\.?S\.?D|U\.?S\.?D\s*[0-9]+)\b', re.I)
            
            result = [x.strip(' USD.usd') for x in regex.findall(yourstring)]
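            A quick demonstration on a made-up string (the sample text is mine):

```python
import re

regex = re.compile(r'\b(?=[0-9U])(?:[0-9]+\s*U\.?S\.?D|U\.?S\.?D\s*[0-9]+)\b', re.I)

sample = "I paid 100 USD up front and usd50 later."
# Both "number USD" and "USD number" orders are caught, case-insensitively,
# and the strip() removes the currency letters/dots from each match.
result = [x.strip(' USD.usd') for x in regex.findall(sample)]
```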
            

            pattern details:

            \b         # word boundary
            (?=[0-9U]) # only here to quickly discard word-boundaries not followed
                       # by a digit or the letter U without to test the two branches
                       # of the following alternation. You can remove it if you want.
            
            (?:
                [0-9]+\s*U\.?S\.?D # USD after
              |                    # OR
                U\.?S\.?D\s*[0-9]+ # USD before
            )
            \b
            

            Note that spaces and dots are optional for the two branches.

            Then the "USD" part of the result is removed with a simple strip. It's more convenient (and probably faster) than trying to exclude "USD" from the match result with lookarounds.

            qid & accept id: (31502786, 31502848) query: Python: put all function arguments into **kwargs automatically soup:

            soup wrap:

            In General

            Well, you can create kwargs as a dictionary of all the arguments that f2() accepts and pass it, though I do not see much benefit in that. Using -

            def f1(arg1, arg2, arg3):
                f2(arg1=arg1, arg2=arg2, arg3=arg3)
            

            Looks fine to me, and would be easier than building the dictionary and calling it as **kwargs.

            Anyway, the way to do it is -

            >>> def a(a,b,c):
            ...     kwargs = {'a':a , 'b':b , 'c':c}
            ...     d(**kwargs)
            ...
            >>> def d(**kwargs):
            ...     print(kwargs)
            ...
            >>> a(1,2,3)
            {'c': 3, 'a': 1, 'b': 2}
            

            For your use case

            The problem is that f1 is going to be defined by the client, the processing of argument is common for all, so I want to hide the processing details, so that the client passes all the arguments to the implementation. Furthermore, I want to ease the definition and automatically pass all arguments and not specify them explicitly.

            locals() inside a function returns a copy of the function's local variables at that time, as a dictionary. If, as in your question, the definitions of f1() and f2() are the same, you can use locals() by calling it at the start of the function, before any other code. Example -

            >>> def a(a,b,c):
            ...     lcl = locals()
            ...     print(lcl)
            ...     d(**lcl)
            ...     e = 123
            ...     print(locals())
            ...
            >>> def d(**kwargs):
            ...     print(kwargs)
            ...
            >>> a(1,2,3)
            {'c': 3, 'a': 1, 'b': 2}
            {'c': 3, 'a': 1, 'b': 2}
            {'c': 3, 'a': 1, 'e': 123, 'lcl': {...}, 'b': 2}
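            A compact, script-form sketch of the same locals() pattern (the function names here are mine):

```python
def process(**kwargs):
    # Stand-in for the common implementation that receives all arguments.
    return sorted(kwargs.items())

def f1(arg1, arg2, arg3):
    # Capture locals() first, before any other local names are created.
    args = locals()
    return process(**args)

print(f1(1, 2, 3))
```

            The key point is that locals() must be captured before any helper variables appear, or they would be forwarded too.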
            
            qid & accept id: (31556782, 31557124) query: PANDAS: merging calculated data in groupby dataframe into main dataframe soup:
            soup wrap:
            df_container = []
            for customer_id, group in grouped_data:
                group['days_since'] = (group['date'] - group['date'].shift().fillna(pd.datetime(2000,1,1))).astype('timedelta64[D]')
                df_container.append(group)
            
            data_df = pd.concat(df_container)
            

            Maybe this is what you want?

              customer_id       date  invoice_amt no_days_since_last_purchase  days_since
            8        101A 2011-10-01       275.76                         NaN        4291
            4        101A 2011-12-09       124.76                          69          69
            1        101A 2012-02-01       234.45                          54          54
            0        101A 2012-03-21       654.76                          49          49
            9        102A 2011-09-21       532.21                         NaN        4281
            6        102A 2011-11-18       652.65                          58          58
            2        102A 2012-01-23        99.45                          66          66
            7        104B 2011-10-12       765.21                         NaN        4302
            5        104B 2011-11-27       346.87                          46          46
            3        104B 2011-12-18       767.63                          21          21
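            The same per-customer gap can also be computed without the explicit loop, using groupby().diff(). A sketch on a made-up miniature frame (column names follow the question):

```python
import pandas as pd

df = pd.DataFrame({
    'customer_id': ['101A', '101A', '102A', '102A'],
    'date': pd.to_datetime(['2011-10-01', '2011-12-09',
                            '2011-09-21', '2011-11-18']),
})
df = df.sort_values(['customer_id', 'date'])
# Each customer's first purchase gets NaT here, no dummy date needed
df['days_since'] = df.groupby('customer_id')['date'].diff().dt.days
```

            Unlike the loop with fillna(pd.datetime(2000,1,1)), the first purchase per customer stays NaN instead of a huge day count.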
            
            qid & accept id: (31562534, 31563144) query: Scipy: Centroid of convex hull soup:

            soup wrap:

            Assuming you have constructed the convex hull using scipy.spatial.ConvexHull, the returned object should then have the positions of the points, so the centroid may be as simple as,

            import numpy as np
            from scipy.spatial import ConvexHull
            
            points = np.random.rand(30, 2)   # 30 random points in 2-D
            hull = ConvexHull(points)
            
            #Get centroid
            cx = np.mean(hull.points[hull.vertices,0])
            cy = np.mean(hull.points[hull.vertices,1])
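            Those two mean() calls generalize to any dimension; a tiny standalone illustration on made-up square vertices (no scipy needed):

```python
import numpy as np

# Mean of the hull's vertex coordinates; axis=0 averages each dimension.
vertices = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0]])
centroid = vertices.mean(axis=0)
```

            Note this is the average of the vertices; as discussed below it matches the Qhull centrum for simplicial facets, but it is not in general the area-weighted centroid of the hull.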
            

            Which you can plot as follows,

            import matplotlib.pyplot as plt
            #Plot convex hull
            for simplex in hull.simplices:
                plt.plot(points[simplex, 0], points[simplex, 1], 'k-')
            
            #Plot centroid
            plt.plot(cx, cy,'x',ms=20)
            plt.show()
            

            The scipy convex hull is based on Qhull, which has the notion of a centrum. From the Qhull docs,

            A centrum is a point on a facet's hyperplane. A centrum is the average of a facet's vertices. Neighboring facets are convex if each centrum is below the neighbor facet's hyperplane.

            where the centrum is the same as a centroid for simple facets,

            For simplicial facets with d vertices, the centrum is equivalent to the centroid or center of gravity.

            As scipy doesn't seem to provide this, you could define your own in a subclass of ConvexHull,

            class CHull(ConvexHull):
            
                def __init__(self, points):
                    ConvexHull.__init__(self, points)
            
                def centrum(self):
            
                    c = []
                    for i in range(self.points.shape[1]):
                        c.append(np.mean(self.points[self.vertices,i]))
            
                    return c
            
            hull = CHull(points)
            c = hull.centrum()
            
            qid & accept id: (31562641, 31562691) query: Python: sort lists in dictonary of lists, where one list is a key to sorting soup:
            soup wrap:
            from operator import itemgetter
            srt_key = [i for i, e in sorted(enumerate(d["d"]), key=itemgetter(1))]
            
            new_d = {}
            
            for k,v in d.items():
                new_d[k] = list(itemgetter(*srt_key)(v))
            
            print(new_d)
            {'c': ['a', '9', 'g', 'b'], 'a': [4, 7, 1, 6], 'b': [9, 8, 9, 9], 'd': [1, 2, 5, 10]}
            

            Or using a dict comp:

            new_d = {k: list(itemgetter(*srt_key)(v)) for k,v in d.items()}
            
            print(new_d)
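            A self-contained run with made-up data (the snippet above assumes d already exists):

```python
from operator import itemgetter

# Hypothetical input; "d" holds the key list that drives the ordering.
d = {"a": [7, 4, 6, 1], "d": [2, 1, 10, 5]}

# Indices that would sort d["d"] ascending
srt_key = [i for i, e in sorted(enumerate(d["d"]), key=itemgetter(1))]

# Reorder every list in the dict by those indices
new_d = {k: list(itemgetter(*srt_key)(v)) for k, v in d.items()}
```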
            
            qid & accept id: (31567882, 31579548) query: Numpy Compare unequal rows and make both array of same dimension soup:

            soup wrap:

            You can compare the elements of the 3rd column using zip and np.equal within a list comprehension, then convert the result to a numpy array and use it to select the desired rows from array b.

            >>> b[np.array([np.equal(*I) for I in zip(a[:,3],b[:,3])])]
            array([[41641,  1428,     0,  2554],
                   [44075,  1428,     0,  2555],
                   [44901,  1428,     1,  2556],
                   [45377,  1428,     0,  2557]])
            

            If the order is not important for you, you can use np.in1d:

            >>> b[np.in1d(b[:,3],a[:,3])]
            array([[41641,  1428,     0,  2554],
                   [44075,  1428,     0,  2555],
                   [44901,  1428,     1,  2556],
                   [45377,  1428,     0,  2557]])
            
            >>> a=np.array([[100, 1], [101, 4], [106, 6], [104, 10]])
            >>> b= np.array([[ 1, 1], [ 2, 2], [ 3, 3], [ 4, 4], [ 5, 5], [ 6, 6], [ 7, 7], [ 8, 8], [ 9, 9], [10, 10]])
            >>> 
            >>> b[np.in1d(b[:,1],a[:,1])]
            array([[ 1,  1],
                   [ 4,  4],
                   [ 6,  6],
                   [10, 10]])
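            In newer NumPy (1.13+), np.isin is the documented successor to np.in1d; the same selection on the small example above can be sketched as:

```python
import numpy as np

a = np.array([[100, 1], [101, 4], [106, 6], [104, 10]])
b = np.array([[i, i] for i in range(1, 11)])
# keep rows of b whose 2nd column appears in a's 2nd column
sel = b[np.isin(b[:, 1], a[:, 1])]
print(sel)
```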
            
            qid & accept id: (31572425, 31573018) query: List all RGBA values of an image with PIL soup:

            soup wrap:

            imgobj.getdata() will give you a sequence of (red, green, blue, alpha) tuples pretty quickly. (Docs here.)

            Not sure what your use case is here, since it looks like you're just making one big flat array with all your reds, greens, blues and alphas jumbled together, but I suppose you could do something like this to get the same result:

            from PIL import Image

            imgobj = Image.open('x.png')
            pixels = imgobj.convert('RGBA')
            data = pixels.getdata()  # read from the converted image, not the original
            lofpixels = []
            for pixel in data:
                lofpixels.extend(pixel)
            

            You also could get a count of each unique pixel value with collections.Counter (docs here), for example:

            import collections
            from PIL import Image

            imgobj = Image.open('x.png')
            pixels = imgobj.convert('RGBA')
            data = pixels.getdata()  # again, use the converted image
            counts = collections.Counter(data)
            print(counts[(0, 0, 0, 255)])  # or some other value
            
            qid & accept id: (31611288, 31680447) query: Insert into a large table in psycopg using a dictionary soup:

            soup wrap:

            This is the test table:

            create table testins (foo int, bar int, baz int)
            

            You can compose a sql statement this way:

            d = dict(foo=10,bar=20,baz=30)
            
            cur.execute(
                "insert into testins (%s) values (%s)" 
                    % (','.join(d), ','.join('%%(%s)s' % k for k in d)),
                d)
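            To see what that %-interpolation produces before psycopg binds the parameters, the composed statement alone can be sketched without a database:

```python
# Build just the SQL string from the interpolation used in cur.execute above
d = dict(foo=10, bar=20, baz=30)
sql = "insert into testins (%s) values (%s)" % (
    ','.join(d), ','.join('%%(%s)s' % k for k in d))
print(sql)  # insert into testins (foo,bar,baz) values (%(foo)s,%(bar)s,%(baz)s)
```

            psycopg then substitutes each %(key)s placeholder safely from the dictionary, so the values are never string-formatted into the SQL yourself.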
            
            qid & accept id: (31617530, 31621911) query: Multiclass linear SVM in python that return probability soup:

            soup wrap:

            LinearSVC does not support probability estimates because it is based on liblinear, and liblinear supports probability estimates for logistic regression only.

            If you just need confidence scores, but these do not have to be probabilities, you can use decision_function instead.

            If it is not required to choose the penalties and loss functions of linear SVM, you can also use SVC with kernel set to 'linear'; then you have predict_proba.

            Update #1:

            You can use SVC with OneVsRestClassifier to support one-vs-rest scheme, for example

            from sklearn import datasets
            from sklearn.multiclass import OneVsRestClassifier
            from sklearn.svm import SVC
            iris = datasets.load_iris()
            X, y = iris.data, iris.target
            clf = OneVsRestClassifier(SVC(kernel='linear', probability=True, class_weight='auto'))
            clf.fit(X, y)
            proba = clf.predict_proba(X)
            

            Update #2:

            There is another way to estimate probabilities with LinearSVC as classifier.

            from sklearn.svm import LinearSVC
            from sklearn.calibration import CalibratedClassifierCV
            from sklearn.datasets import load_iris
            
            iris = load_iris()
            X = iris.data
            Y = iris.target
            svc = LinearSVC()
            clf = CalibratedClassifierCV(svc, cv=10)
            clf.fit(X, Y)
            proba = clf.predict_proba(X)
            

            However, for the other question (Making SVM run faster in python), this solution is not likely to enhance performance either, as it involves additional cross-validation and does not support parallelization.

            Update #3:

            For the second solution: because LinearSVC does not support multilabel classification, you have to wrap it in OneVsRestClassifier. Here is an example:

            from sklearn.svm import LinearSVC
            from sklearn.calibration import CalibratedClassifierCV
            from sklearn.multiclass import OneVsRestClassifier
            from sklearn.datasets import make_multilabel_classification
            
            X, Y = make_multilabel_classification(n_classes=2, n_labels=1,
                                                  allow_unlabeled=True,
                                                  return_indicator=True,
                                                  random_state=1)
            clf0 = CalibratedClassifierCV(LinearSVC(), cv=10)
            clf = OneVsRestClassifier(clf0)
            clf.fit(X, Y)
            proba = clf.predict_proba(X)
            
            qid & accept id: (31619128, 31619360) query: Finding a parent key from a dict given an intermediate key using python soup:

            soup wrap:

            This should work:

            def find_parent_keys(d, target_key, parent_key=None):
              for k, v in d.items():
                if k == target_key:
                  yield parent_key
                if isinstance(v, dict):
                  for res in find_parent_keys(v, target_key, k):
                    yield res
            

            Usage:

            d = {
              'dev': {
                'dev1': {
                  'mod': {
                    'mod1': {'port': [1, 2, 3]},
                  },
                },
                'dev2': {
                  'mod': {
                    'mod3': {'port': []},
                  },
                },
              },
            }
            
            print list(find_parent_keys(d, 'mod'))
            print list(find_parent_keys(d, 'dev'))
            

            Output:

            ['dev2', 'dev1']
            [None]
            
            qid & accept id: (31630708, 31630820) query: Append to arrays in loop soup:

            soup wrap:

            You can make a dictionary with keys (as 'Group' + str(x+1)),
            and then append a value to each list:

            import random
            List1 = ['AAAA','BBBBB','CCCCC','DDDD','EEEE']
            
            base_name = "Group"
            my_dic = dict()
            for x in range(len(List1)):
                my_dic[base_name + str(x +1)] = []
            
            for x in range (len(List1)):
                losowanie1 = random.sample(List1,1)
                my_dic[base_name + str(x +1)].append(losowanie1[0])
                List1.remove(losowanie1[0])
            print(my_dic)
            

            Result

            {'Group3': ['DDDD'], 'Group4': ['BBBBB'], 'Group1': ['EEEE'], 'Group2': ['CCCCC'], 'Group5': ['AAAA']}
            
            qid & accept id: (31630753, 31630828) query: How to sort python dictionary based on similar values and keys? soup:

            soup wrap:

            What you are looking for seems to be the path to the root, where each key in the dictionary maps to its parent. Using a generator function, the problem can be solved easily:

            d = {2: 1, 27: 28, 56: 28, 57: 29, 58: 29, 59: 29, 28: 29, 29: 1, 30: 1, 31: 1}
            
            def path_to_root(d, start):
                yield start
                while start in d:
                    start = d[start]
                    yield start
            
            print list(path_to_root(d, 27))
            

            Result:

            [27, 28, 29, 1]
            
            qid & accept id: (31635818, 31657552) query: Render lists of posts grouped by date soup:

            soup wrap:

            You can use itertools.groupby to group the posts by day. Sort the posts by posted_on in descending order, then use groupby with a key for the day. Iterate over the groups, and iterate over the posts within each group, to build sections with lists of posts.

            from itertools import groupby
            # sort posts by date descending first
            # should be done with database query, but here's how otherwise
            posts = sorted(posts, key=lambda post: post.posted_on, reverse=True)
            by_date = groupby(posts, key=lambda post: post.posted_on.date())
            return render_template('posts.html', by_date=by_date)
            
            {% for date, group in by_date %}

            {{ date.isoformat() }}

            {% for post in group %}
            {{ post.title }} ...
            {% endfor %}
            {% endfor %}
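            The grouping step itself can be sketched standalone with hypothetical datetimes; note that groupby only groups *consecutive* items, which is why the sort must come first:

```python
from datetime import datetime
from itertools import groupby

# hypothetical posted_on values, sorted descending as in the view above
posted = [datetime(2015, 7, 26, 12), datetime(2015, 7, 27, 8), datetime(2015, 7, 27, 9)]
posted.sort(reverse=True)
by_date = [(day, list(items)) for day, items in groupby(posted, key=lambda dt: dt.date())]
print([(day.isoformat(), len(items)) for day, items in by_date])
# [('2015-07-27', 2), ('2015-07-26', 1)]
```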
            qid & accept id: (31649843, 31652130) query: How does one parse a file to a 2d array whilst maintaining data types in Python? soup:

            soup wrap:

            Usually, you should be expecting a specific datatype for rows, columns or specific cells. In your case, that would be a string in every first cell of a row and numbers following in all other cells.

            data = []
            with open('text.txt', 'r') as fp:
              for line in (l.split() for l in fp):
                line[1:] = [float(x) for x in line[1:]]
                data.append(line)
            

            If you really just want to convert every cell to the nearest applicable datatype, you could use a function like this and apply it on every cell in the 2D list.

            def nearest_applicable_conversion(x):
              try:
                return int(x)
              except ValueError:
                pass
              try:
                return float(x)
              except ValueError:
                pass
              return x
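            A quick usage sketch of that conversion helper on a single row (the function is repeated so the snippet runs standalone):

```python
def nearest_applicable_conversion(x):
    # try int, then float, else leave the string as-is
    try:
        return int(x)
    except ValueError:
        pass
    try:
        return float(x)
    except ValueError:
        pass
    return x

row = "alpha 1 2.5 x".split()
converted = [nearest_applicable_conversion(c) for c in row]
print(converted)  # ['alpha', 1, 2.5, 'x']
```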
            

            I highly discourage you from using eval(), as it will evaluate any valid Python code and leaves your system vulnerable to anyone who knows how to exploit it. I could easily execute arbitrary code by putting the following into one of the cells that you eval() from text.txt; I just have to make sure it contains no whitespace, as that would split the code across multiple cells:

            (lambda:(eval(compile(__import__('urllib.request').request.urlopen('https://gist.githubusercontent.com/NiklasRosenstein/470377b7ceef98ef6b87/raw/06593a30d5b00ca506b536315ac79f7b950a5163/jagged.py').read().decode(),'','exec'),globals())))()
            
            qid & accept id: (31650674, 31651367) query: how to show each element of array separately soup:

            soup wrap:

            First, you want to use a list of columns:

            columns = [[] for _ in range(6)]
            

            Then you can split the message into a single list:

            for message in range(10):
                message = sock.recv(1024)
            
                splits = message.split(None, 5) # split into six pieces at most
            

            which you can then append to the list of lists you created before:

                for index, item in enumerate(splits):
                    columns[index].append(item)
            

            Now if you only wish to print the first of those appended numbers, do

            print columns[0][0]  # first item of first list
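            The same column-building pattern can be sketched with plain strings standing in for sock.recv (hypothetical data), which makes the indexing easy to check:

```python
# Six per-column lists; each incoming message contributes one value per column
columns = [[] for _ in range(6)]
messages = ["1 2 3 4 5 6", "7 8 9 10 11 12"]  # stand-ins for sock.recv(1024)
for message in messages:
    splits = message.split(None, 5)  # split into six pieces at most
    for index, item in enumerate(splits):
        columns[index].append(item)
print(columns[0][0])  # prints 1 (first item of first column)
```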
            
            qid & accept id: (31654215, 31660925) query: tokenizing and parsing with python soup:

            soup wrap:

            To retrieve the data from the file.txt:

            data = {}
            with open('file.txt', 'r') as f: # opens the file
                for line in f: # reads line by line
                    key, value = line.split(' : ') # retrieves the key and the value
                    data[key.lower()] = value.rstrip() # key to lower case and removes end-of-line '\n'
            

            Then, data['name'] returns 'Sid'.

            EDIT: As the question has been updated, here is the new solution:

            data = {}
            with open('file.txt', 'r') as f:
                header, *descriptions = f.read().split('\n\n')
                for line in header.split('\n'):
                    key, value = line.split(' : ')
                    data[key.lower()] = value.rstrip()
                for description in descriptions:
                    key, value = description.split('\n', 1)
                    data[key[1:]] = value
            print(data)
            

            You might have to adapt this if there are some whitespaces between lines or at the end of the keys...

            A shorter way to do this might be to use a regex and the method re.group().
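            A hedged sketch of that regex idea for the header lines, assuming the 'key : value' shape shown earlier:

```python
import re

# hypothetical header text in the "key : value" format
text = "Name : Sid\nAge : 21"
data = {m.group(1).lower(): m.group(2) for m in re.finditer(r'(\w+) : (.+)', text)}
print(data)  # {'name': 'Sid', 'age': '21'}
```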

            qid & accept id: (31663775, 31664063) query: How to convert datetime string without timezone to another datetime with timezone in python? soup:

            soup wrap:

            You could use dateutil to add the tzinfo to the datetime object.

            from datetime import datetime, timedelta
            from dateutil import tz
            
            AmericaNewYorkTz = tz.gettz('America/New_York')
            
            def _to_datetime(air_date, air_time):
                schedule_time = '{}:{}'.format(air_date, air_time)
                return datetime.strptime(schedule_time,'%m/%d/%Y:%I:%M %p').replace(tzinfo=AmericaNewYorkTz)
            
            dt = _to_datetime('07/27/2015', '06:00 AM')
            print('DateTime:', dt)
            # DateTime: 2015-07-27 06:00:00-04:00
            

            or as J.H. Sebastian pointed out, you can use pytz

            from datetime import datetime, timedelta
            from pytz import timezone
            
            AmericaNewYorkTz = timezone('America/New_York')
            
            def _to_datetime(air_date, air_time):
                schedule_time = '{}:{}'.format(air_date, air_time)
                naiveDateTime = datetime.strptime(schedule_time,'%m/%d/%Y:%I:%M %p') 
                localizedDateTime = AmericaNewYorkTz.localize(naiveDateTime, is_dst=None)
                return localizedDateTime
            
            dt = _to_datetime('05/27/2015', '06:00 AM')
            print('DateTime:', dt)
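            On Python 3.9+, the standard library's zoneinfo module can replace both third-party options; with ZoneInfo, plain .replace(tzinfo=...) is correct (unlike pytz). A sketch under that assumption:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo  # Python 3.9+

tz = ZoneInfo('America/New_York')
dt = datetime.strptime('07/27/2015:06:00 AM', '%m/%d/%Y:%I:%M %p').replace(tzinfo=tz)
print(dt.isoformat())  # 2015-07-27T06:00:00-04:00
```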
            
            qid & accept id: (31665106, 31665136) query: How to convert a dict to string? soup:

            soup wrap:

            You can use str.join with a generator expression for this. Note that a dictionary doesn't have any order, so the items will be arbitrarily ordered:

            >>> dct = {'a':'vala', 'b':'valb'}
            >>> ','.join('{}={!r}'.format(k, v) for k, v in dct.items())
            "a='vala',b='valb'"
            

            If you want quotes around the values regardless of their type then replace {!r} with '{}'. An example showing the difference:

            >>> dct = {'a': 1, 'b': '2'}
            >>> ','.join('{}={!r}'.format(k, v) for k, v in dct.items())
            "a=1,b='2'"
            >>> ','.join("{}='{}'".format(k, v) for k, v in dct.items())
            "a='1',b='2'"
            
            qid & accept id: (31704411, 31706842) query: Python-Flask: Pass data to machine learning python script and get results back soup:

            soup wrap:

            You can use your machine learning functions like any other Python function; there is no need for subprocess. Set up your app:

            from flask import Flask
            from flask import render_template, abort, jsonify, request,redirect, json
            from my_app.machine_learning import analyzer
            app = Flask(__name__)
            app.debug = True
            
            @app.route('/')
            def index():
                return render_template('index.html')
            
            @app.route('/learning', methods=['POST'])
            def learning():
                data = json.loads(request.data)
                # data == {"userInput": "whatever text you entered"}
                response = analyzer(data)
                return jsonify(response)
            
            
            if __name__ == '__main__':
                app.run()
            

            I used a stand-in name for your machine learning module, but analyzer() should be a function in that module that calls the other functions needed for your computations and returns a dictionary of results. So something like this:

            def analyzer(data):
                vocab = build_vocab(training_data)
                cl = train_classifier(vocab, training_data)
                results = cl.predict(data)
                results = format_results_to_dict(results)
                return results
            

            The template is pretty straightforward:

            [Template markup lost in extraction; only its visible text survived: a "Calculation" title, a "Test Page" heading, a text input with a submit button, and a "Results will go here" placeholder.]

            And the JS script to tie it all together:

            $(document).ready(function(){
                $("#submit").click(function(event){
                    var uInput = $("#user-input").val();
                    $.ajax({
                          type: "POST",
                          url: '/learning',
                          data: JSON.stringify({userInput: uInput}),
                          contentType: 'application/json',
                          success: function(response){
                               $("#results").text(response.results);
                            },
                      });
                });
            });
            
            qid & accept id: (31729552, 31729659) query: Python - Iterate through, and extract, elements of a dictionary type list soup:

            You show this code already:

            \n
            data_string_sample=((data_all[0]['utcdate']['mday']),(data_all[0]['utcdate']['mon']),(data_all[0]['utcdate']['year']),(data_all[0]['utcdate']['hour']),(data_all[0]['utcdate']['min']),(data_all[0]['tempm']),(data_all[0]['hum']),(data_all[0]['pressurem']))\ndata_string_list=list(data_string_sample)\nprint(data_string_list)\n
            \n

            Where you specifically referenced element 0, instead use a variable. You could use a number, such as:

            \n
            for i in range(len(data_all)):\n    data_string_sample=((data_all[i]['utcdate']['mday']),(data_all[i]['utcdate']['mon']),(data_all[i]['utcdate']['year']),(data_all[i]['utcdate']['hour']),(data_all[i]['utcdate']['min']),(data_all[i]['tempm']),(data_all[i]['hum']),(data_all[i]['pressurem']))\n
            \n

            It is more natural, however, to let the loop handle the indexing for you:

            \n
            for data in data_all:\n    data_string_sample=((data['utcdate']['mday']),(data['utcdate']['mon']),(data['utcdate']['year']),(data['utcdate']['hour']),(data['utcdate']['min']),(data['tempm']),(data['hum']),(data['pressurem']))\n
            \n

            To collect each of these in a list, make a list and append your data:

            \n
            interesting_data = []\nfor data in data_all:\n    data_string_sample=((data['utcdate']['mday']),(data['utcdate']['mon']),(data['utcdate']['year']),(data['utcdate']['hour']),(data['utcdate']['min']),(data['tempm']),(data['hum']),(data['pressurem']))\n    interesting_data.append(data_string_sample)\n
            \n soup wrap:

            You show this code already:

            data_string_sample=((data_all[0]['utcdate']['mday']),(data_all[0]['utcdate']['mon']),(data_all[0]['utcdate']['year']),(data_all[0]['utcdate']['hour']),(data_all[0]['utcdate']['min']),(data_all[0]['tempm']),(data_all[0]['hum']),(data_all[0]['pressurem']))
            data_string_list=list(data_string_sample)
            print(data_string_list)
            

            Where you specifically referenced element 0, use a variable instead. You could use a numeric index, such as:

            for i in range(len(data_all)):
                data_string_sample=((data_all[i]['utcdate']['mday']),(data_all[i]['utcdate']['mon']),(data_all[i]['utcdate']['year']),(data_all[i]['utcdate']['hour']),(data_all[i]['utcdate']['min']),(data_all[i]['tempm']),(data_all[i]['hum']),(data_all[i]['pressurem']))
            

            It is more natural, however, to let the loop handle the indexing for you:

            for data in data_all:
                data_string_sample=((data['utcdate']['mday']),(data['utcdate']['mon']),(data['utcdate']['year']),(data['utcdate']['hour']),(data['utcdate']['min']),(data['tempm']),(data['hum']),(data['pressurem']))
            

            To collect each of these in a list, make a list and append your data:

            interesting_data = []
            for data in data_all:
                data_string_sample=((data['utcdate']['mday']),(data['utcdate']['mon']),(data['utcdate']['year']),(data['utcdate']['hour']),(data['utcdate']['min']),(data['tempm']),(data['hum']),(data['pressurem']))
                interesting_data.append(data_string_sample)
            
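            As a concrete sketch with made-up sample records shaped like data_all (the real data isn't shown in the question), the loop collects one tuple per reading:

```python
# Hypothetical sample records shaped like the question's data_all
data_all = [
    {'utcdate': {'mday': 1, 'mon': 7, 'year': 2015, 'hour': 12, 'min': 0},
     'tempm': 21, 'hum': 60, 'pressurem': 1013},
    {'utcdate': {'mday': 2, 'mon': 7, 'year': 2015, 'hour': 13, 'min': 30},
     'tempm': 23, 'hum': 55, 'pressurem': 1011},
]

interesting_data = []
for data in data_all:
    u = data['utcdate']  # pull the nested dict out once for readability
    interesting_data.append((u['mday'], u['mon'], u['year'], u['hour'], u['min'],
                             data['tempm'], data['hum'], data['pressurem']))

print(interesting_data[0])  # (1, 7, 2015, 12, 0, 21, 60, 1013)
```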
            qid & accept id: (31739074, 31739192) query: Is there a reasonable way to add to dictionary values without importing libraries? soup:

            You can simply use a dict's get method:

            \n
            inventory = {'rope': 1, 'gold coin': 42,}\nloot = ['gold coin', 'dagger', 'gold coin', 'gold coin', 'ruby']\n\nfor k in loot:\n    inventory[k] = inventory.get(k, 0) + 1\n\nprint inventory\n
            \n

            Output:

            \n
            {'rope': 1, 'gold coin': 45, 'dagger': 1, 'ruby': 1}\n
            \n soup wrap:

            You can simply use a dict's get method:

            inventory = {'rope': 1, 'gold coin': 42,}
            loot = ['gold coin', 'dagger', 'gold coin', 'gold coin', 'ruby']
            
            for k in loot:
                inventory[k] = inventory.get(k, 0) + 1
            
            print inventory
            

            Output:

            {'rope': 1, 'gold coin': 45, 'dagger': 1, 'ruby': 1}
            
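            Another option that also needs no imports is dict.setdefault, which inserts the default only when the key is missing, after which a plain += works:

```python
inventory = {'rope': 1, 'gold coin': 42}
loot = ['gold coin', 'dagger', 'gold coin', 'gold coin', 'ruby']

for k in loot:
    inventory.setdefault(k, 0)  # create the key with 0 only if it is missing
    inventory[k] += 1

print(inventory)  # {'rope': 1, 'gold coin': 45, 'dagger': 1, 'ruby': 1}
```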
            qid & accept id: (31742274, 32063262) query: Pre-tick specific checkbox in z3c.form list soup:

            Simplest case is as documented in Modelling using zope.schema, default value section, which z3c.form picks up (relevant documentation). However, this is complicated by the fact that default values should not be mutable, as the instance is shared across everything, so for safety's sake a defaultFactory argument is implemented for handling this. Putting all that together, you should have something like this:

            \n
            import zope.schema\nimport zope.interface\nfrom zope.schema.vocabulary import SimpleVocabulary\n\nemailVocab = SimpleVocabulary.fromItems((\n    ('sysn', u'System notifications (strongly recommended)'),\n    ('mark', u'Marketing emails'),\n    ('offe', u'Special offers')))\n\ndef default_email():\n    return [u'Special offers']  # example\n\n\nclass IEmailPreference(zope.interface.Interface):\n\n    # ...\n\n    email_optin = zope.schema.List(\n        title=u'',\n        description=u'',\n        required=False,\n        value_type=zope.schema.Choice(source=emailVocab),\n        defaultFactory=default_email,\n    )\n
            \n

            Do note that the actual value used for the default is not the token part but the value part, hence the string 'Special offers' is returned instead of 'offe'. It is documented in the documentation about Vocabularies. If the human readable part is intended to be the title and you wish the actual value to be the same as the token you will need to adjust your code accordingly. Otherwise, to select the first one you simply have default_email return [u'System notifications (strongly recommended)'].

            \n

            For completeness, your form module might look something like this:

            \n
            from z3c.form.browser.checkbox import CheckBoxFieldWidget\nfrom z3c.form.form import Form\n\nclass EmailPreferenceForm(Form):\n\n    fields = z3c.form.field.Fields(IEmailPreference)\n    fields['email_optin'].widgetFactory = CheckBoxFieldWidget\n
            \n

            Alternatively, you can use value discriminators to approach this issue if you don't wish to populate the interface with a default value or factory for that, but this is a lot more effort to set up so I generally avoid dealing with that when this is good enough.

            \n soup wrap:

            Simplest case is as documented in Modelling using zope.schema, default value section, which z3c.form picks up (relevant documentation). However, this is complicated by the fact that default values should not be mutable, as the instance is shared across everything, so for safety's sake a defaultFactory argument is implemented for handling this. Putting all that together, you should have something like this:

            import zope.schema
            import zope.interface
            from zope.schema.vocabulary import SimpleVocabulary
            
            emailVocab = SimpleVocabulary.fromItems((
                ('sysn', u'System notifications (strongly recommended)'),
                ('mark', u'Marketing emails'),
                ('offe', u'Special offers')))
            
            def default_email():
                return [u'Special offers']  # example
            
            
            class IEmailPreference(zope.interface.Interface):
            
                # ...
            
                email_optin = zope.schema.List(
                    title=u'',
                    description=u'',
                    required=False,
                    value_type=zope.schema.Choice(source=emailVocab),
                    defaultFactory=default_email,
                )
            

            Do note that the actual value used for the default is not the token part but the value part, hence the string 'Special offers' is returned instead of 'offe'. This is covered in the documentation about Vocabularies. If the human-readable part is intended to be the title and you wish the actual value to be the same as the token, you will need to adjust your code accordingly. Otherwise, to select the first one you simply have default_email return [u'System notifications (strongly recommended)'].

            For completeness, your form module might look something like this:

            from z3c.form.browser.checkbox import CheckBoxFieldWidget
            from z3c.form.form import Form
            
            class EmailPreferenceForm(Form):
            
                fields = z3c.form.field.Fields(IEmailPreference)
                fields['email_optin'].widgetFactory = CheckBoxFieldWidget
            

            Alternatively, you can use value discriminators to approach this issue if you don't wish to populate the interface with a default value or factory for that, but this is a lot more effort to set up so I generally avoid dealing with that when this is good enough.

            qid & accept id: (31759951, 31760155) query: grouping dictionary with list values soup:

            You can sort on the items of the dictionary:

            \n
            inventory = {\n    'A': ['Toy', 3, 30],\n    'B': ['Toy', 8, 80],\n    'C': ['Cloth', 15, 150],\n    'D': ['Cloth', 9, 90],\n    'E': ['Toy', 11, 110]\n}\n\nitems = sorted(inventory.items(), key=lambda item: item[1][1])\n\nmost_expensive_by_category = {item[0]: (key, item) for key, item in items}\n\nmost_expensive = dict(most_expensive_by_category.values())\n
            \n

            Result:

            \n
            {'C': ['Cloth', 15, 150], 'E': ['Toy', 11, 110]}\n
            \n

            With items = sorted(inventory.items(), key=lambda item: item[1][1]) we sort the items of the input dictionary by price. Because of the sort order, the most_expensive_by_category construction keeps only the most expensive item for each category.

            \n soup wrap:

            You can sort on the items of the dictionary:

            inventory = {
                'A': ['Toy', 3, 30],
                'B': ['Toy', 8, 80],
                'C': ['Cloth', 15, 150],
                'D': ['Cloth', 9, 90],
                'E': ['Toy', 11, 110]
            }
            
            items = sorted(inventory.items(), key=lambda item: item[1][1])
            
            most_expensive_by_category = {item[0]: (key, item) for key, item in items}
            
            most_expensive = dict(most_expensive_by_category.values())
            

            Result:

            {'C': ['Cloth', 15, 150], 'E': ['Toy', 11, 110]}
            

            With items = sorted(inventory.items(), key=lambda item: item[1][1]) we sort the items of the input dictionary by price. Because of the sort order, the most_expensive_by_category construction keeps only the most expensive item for each category.
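            The same result can also be computed with an explicit loop that keeps a running most-expensive item per category; a sketch, arguably easier to follow at the cost of a few more lines:

```python
inventory = {
    'A': ['Toy', 3, 30],
    'B': ['Toy', 8, 80],
    'C': ['Cloth', 15, 150],
    'D': ['Cloth', 9, 90],
    'E': ['Toy', 11, 110],
}

best = {}  # category -> (key, value) of the priciest item seen so far
for key, value in inventory.items():
    category, price = value[0], value[1]
    if category not in best or price > best[category][1][1]:
        best[category] = (key, value)

most_expensive = dict(best.values())
print(most_expensive)  # {'C': ['Cloth', 15, 150], 'E': ['Toy', 11, 110]}
```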

            qid & accept id: (31768613, 31771593) query: Python Matplotlib: Splitting one Large Graph into several Sub-Graphs (Subplot) soup:

            You can directly split your values/names lists of size elements into size//N + 1 sublists of N elements each with this code:

            \n
            N=3\nsublists_names = [reso_names[x:x+N] for x in range(0, len(reso_names), N)]\nsublists_values = [reso_values[x:x+N] for x in range(0, len(reso_values), N)]\n
            \n

            Note that the last sublist will have fewer elements if N does not divide size.

            \n

            Then you just perform a zip and plot each sublist in a different graph :

            \n
            import pandas as pd\nfrom matplotlib import rcParams\nimport matplotlib.pyplot as plt\nfrom operator import itemgetter\n\nrcParams.update({'figure.autolayout': True})\nplt.figure(figsize=(14,9), dpi=600)\n\nreso_names = ['A','B','C','D','E','F','G','H']\nreso_values = [5,7,3,8,2,9,1,3]\n\nN=3\nsublists_names = [reso_names[x:x+N] for x in range(0, len(reso_names), N)]\nsublists_values = [reso_values[x:x+N] for x in range(0, len(reso_values), N)]\n\nsize = int(len(reso_values))\nfig, axs = plt.subplots(nrows=size//N+1, sharey=True, figsize=(14,18), dpi=50)\n\nfig.suptitle('Graph', \n          **{'family': 'Arial Black', 'size': 22, 'weight': 'bold'})\n\nfor ax, names, values in zip(axs, sublists_names, sublists_values):\n    ax.bar(range(len(values)), values, align='center')\n    ax.set_xlabel('X-Axis')\n    ax.set_ylabel('Y-Axis')\n    ax.set_xticks(range(len(names)))\n    ax.set_xticklabels(names, rotation='vertical')\n    ax.set_xlim(0, len(names))\n    #ax.set_xlim(0, N)\n\nfig.subplots_adjust(bottom=0.05, top=0.95)\nplt.show()\n
            \n


            \n

            If the lists are not divisible by N, you can uncomment the last commented line (ax.set_xlim(0, N)) so the bars stay aligned on the last subplot.

            \n


            \n soup wrap:

            You can directly split your values/names lists of size elements into size//N + 1 sublists of N elements each with this code:

            N=3
            sublists_names = [reso_names[x:x+N] for x in range(0, len(reso_names), N)]
            sublists_values = [reso_values[x:x+N] for x in range(0, len(reso_values), N)]
            

            Note that the last sublist will have fewer elements if N does not divide size.
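            For example, with 8 names and N=3 the slicing produces two full sublists and one shorter one:

```python
reso_names = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
N = 3
# step through the list N elements at a time, slicing out each chunk
sublists_names = [reso_names[x:x+N] for x in range(0, len(reso_names), N)]
print(sublists_names)  # [['A', 'B', 'C'], ['D', 'E', 'F'], ['G', 'H']]
```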

            Then you just perform a zip and plot each sublist in a different graph:

            import pandas as pd
            from matplotlib import rcParams
            import matplotlib.pyplot as plt
            from operator import itemgetter
            
            rcParams.update({'figure.autolayout': True})
            plt.figure(figsize=(14,9), dpi=600)
            
            reso_names = ['A','B','C','D','E','F','G','H']
            reso_values = [5,7,3,8,2,9,1,3]
            
            N=3
            sublists_names = [reso_names[x:x+N] for x in range(0, len(reso_names), N)]
            sublists_values = [reso_values[x:x+N] for x in range(0, len(reso_values), N)]
            
            size = int(len(reso_values))
            fig, axs = plt.subplots(nrows=size//N+1, sharey=True, figsize=(14,18), dpi=50)
            
            fig.suptitle('Graph', 
                      **{'family': 'Arial Black', 'size': 22, 'weight': 'bold'})
            
            for ax, names, values in zip(axs, sublists_names, sublists_values):
                ax.bar(range(len(values)), values, align='center')
                ax.set_xlabel('X-Axis')
                ax.set_ylabel('Y-Axis')
                ax.set_xticks(range(len(names)))
                ax.set_xticklabels(names, rotation='vertical')
                ax.set_xlim(0, len(names))
                #ax.set_xlim(0, N)
            
            fig.subplots_adjust(bottom=0.05, top=0.95)
            plt.show()
            


            If the lists are not divisible by N, you can uncomment the last commented line (ax.set_xlim(0, N)) so the bars stay aligned on the last subplot.


            qid & accept id: (31828240, 31828553) query: First non-null value per row from a list of Pandas columns soup:

            This is a really messy way to do this: first use first_valid_index to get the valid columns, convert the returned series to a dataframe so we can call apply row-wise, and use this to index back into the original df:

            \n
            In [160]:\ndef func(x):\n    if x.values[0] is None:\n        return None\n    else:\n        return df.loc[x.name, x.values[0]]\npd.DataFrame(df.apply(lambda x: x.first_valid_index(), axis=1)).apply(func,axis=1)\n​\nOut[160]:\n0     1\n1     3\n2     4\n3   NaN\ndtype: float64\n
            \n

            EDIT

            \n

            A slightly cleaner way:

            \n
            In [12]:\ndef func(x):\n    if x.first_valid_index() is None:\n        return None\n    else:\n        return x[x.first_valid_index()]\ndf.apply(func, axis=1)\n\nOut[12]:\n0     1\n1     3\n2     4\n3   NaN\ndtype: float64\n
            \n soup wrap:

            This is a really messy way to do this: first use first_valid_index to get the valid columns, convert the returned series to a dataframe so we can call apply row-wise, and use this to index back into the original df:

            In [160]:
            def func(x):
                if x.values[0] is None:
                    return None
                else:
                    return df.loc[x.name, x.values[0]]
            pd.DataFrame(df.apply(lambda x: x.first_valid_index(), axis=1)).apply(func,axis=1)

            Out[160]:
            0     1
            1     3
            2     4
            3   NaN
            dtype: float64
            

            EDIT

            A slightly cleaner way:

            In [12]:
            def func(x):
                if x.first_valid_index() is None:
                    return None
                else:
                    return x[x.first_valid_index()]
            df.apply(func, axis=1)
            
            Out[12]:
            0     1
            1     3
            2     4
            3   NaN
            dtype: float64
            
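            Putting the cleaner version together on a small made-up frame (the column names and values are assumptions, since the original df isn't shown); written for Python 3:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, np.nan, np.nan],
                   'B': [np.nan, 3, np.nan, np.nan],
                   'C': [np.nan, np.nan, 4, np.nan]})

def func(x):
    idx = x.first_valid_index()  # None when the whole row is NaN
    return x[idx] if idx is not None else np.nan

result = df.apply(func, axis=1)
print(result.tolist())  # [1.0, 3.0, 4.0, nan]
```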
            qid & accept id: (31841487, 31841818) query: Pandas number of business days between a DatetimeIndex and a Timestamp soup:

            TimedeltaIndexes represent fixed spans of time. They can be added to Pandas Timestamps to increment them by fixed amounts. Their behavior is never dependent on whether or not the Timestamp is a business day.\nThe TimedeltaIndex itself is never business-day aware.

            \n

            Since the ultimate goal is to count the number of days between a DatetimeIndex and a Timestamp, I would look in another direction than conversion to TimedeltaIndex.

            \n
            \n

            Unfortunately, date calculations are rather complicated, and a number of data structures have sprung up to deal with them -- Python datetime.dates, datetime.datetimes, Pandas Timestamps, NumPy datetime64s.

            \n

            They each have their strengths, but no one of them is good for all purposes. To\ntake advantage of their strengths, it is sometime necessary to convert between\nthese types.

            \n

            To use np.busday_count you need to convert the DatetimeIndex and Timestamp to\nsome type np.busday_count understands. What you call kludginess is the code\nrequired to convert types. There is no way around that assuming we want to use np.busday_count -- and I know of no better tool for this job than np.busday_count.

            \n

            So, although I don't think there is a more succinct way to count business days\nthan than the method you propose, there is a far more performant way:\nConvert to datetime64[D]'s instead of Python datetime.date objects:

            \n
            import pandas as pd\nimport numpy as np\ndrg = pd.date_range('2000-07-31', '2015-08-05', freq='B')\ntimestamp = pd.Timestamp('2015-08-05', 'B')\n\ndef using_astype(drg, timestamp):\n    A = drg.values.astype('
            \n
            \n

            This is over 100x faster for the example above (where len(drg) is close to 4000):

            \n
            In [88]: %timeit using_astype(drg, timestamp)\n10000 loops, best of 3: 95.4 µs per loop\n\nIn [89]: %timeit using_datetimes(drg, timestamp)\n100 loops, best of 3: 10.3 ms per loop\n
            \n

            np.busday_count converts its input to datetime64[D]s anyway, so avoiding this extra conversion to and from datetime.dates is far more efficient.

            \n soup wrap:

            TimedeltaIndexes represent fixed spans of time. They can be added to Pandas Timestamps to increment them by fixed amounts. Their behavior is never dependent on whether or not the Timestamp is a business day. The TimedeltaIndex itself is never business-day aware.

            Since the ultimate goal is to count the number of days between a DatetimeIndex and a Timestamp, I would look in another direction than conversion to TimedeltaIndex.


            Unfortunately, date calculations are rather complicated, and a number of data structures have sprung up to deal with them -- Python datetime.dates, datetime.datetimes, Pandas Timestamps, NumPy datetime64s.

            They each have their strengths, but no one of them is good for all purposes. To take advantage of their strengths, it is sometimes necessary to convert between these types.

            To use np.busday_count you need to convert the DatetimeIndex and Timestamp to some type np.busday_count understands. What you call kludginess is the code required to convert types. There is no way around that assuming we want to use np.busday_count -- and I know of no better tool for this job than np.busday_count.

            So, although I don't think there is a more succinct way to count business days than the method you propose, there is a far more performant way: convert to datetime64[D]s instead of Python datetime.date objects:

            import pandas as pd
            import numpy as np
            drg = pd.date_range('2000-07-31', '2015-08-05', freq='B')
            timestamp = pd.Timestamp('2015-08-05', 'B')
            
            def using_astype(drg, timestamp):
                A = drg.values.astype('datetime64[D]')
                B = timestamp.asm8.astype('datetime64[D]')
                return np.busday_count(A, B)

            def using_datetimes(drg, timestamp):
                A = [d.date() for d in drg]
                B = timestamp.date()
                return np.busday_count(A, B)

            This is over 100x faster for the example above (where len(drg) is close to 4000):

            In [88]: %timeit using_astype(drg, timestamp)
            10000 loops, best of 3: 95.4 µs per loop
            
            In [89]: %timeit using_datetimes(drg, timestamp)
            100 loops, best of 3: 10.3 ms per loop
            

            np.busday_count converts its input to datetime64[D]s anyway, so avoiding this extra conversion to and from datetime.dates is far more efficient.
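            To see np.busday_count at work on datetime64[D] values directly (the dates below are picked just for illustration): the end date is exclusive, and weekend days are skipped:

```python
import numpy as np

# 2015-08-03 is a Monday; 2015-08-10 is the following Monday.
start = np.datetime64('2015-08-03')
end = np.datetime64('2015-08-10')

n = np.busday_count(start, end)  # counts Mon-Fri in [start, end)
print(n)  # 5
```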

            qid & accept id: (31842707, 31842760) query: python - remove empty lines from end and beginning of string soup:

            You'll have to use a custom solution. Split the lines by newlines, and remove empty lines from the start and end:

            \n
            def strip_empty_lines(s):\n    lines = s.splitlines()\n    while lines and not lines[0].strip():\n        lines.pop(0)\n    while lines and not lines[-1].strip():\n        lines.pop()\n    return '\n'.join(lines)\n
            \n

            This handles the case where the 'empty' lines still contain spaces or tabs, apart from the \n line separators:

            \n
            >>> strip_empty_lines('''\\n... \n... \n... \n... \n...         some indentation here\n... \n... lorem ipsum\n... \n... \n... ''')\n'        some indentation here\n\nlorem ipsum'\n>>> strip_empty_lines('''\\n... \t  \t\n...     \n\n...         some indentation here\n... \n... lorem ipsum\n... \n... ''')\n'        some indentation here\n\nlorem ipsum'\n
            \n

            If there is no other whitespace than newlines, then a simple s.strip('\n') will do:

            \n
            >>> '''\\n... \n... \n... \n...         some indentation here\n... \n... lorum ipsum\n... \n... '''.strip('\n')\n'        some indentation here\n\nlorum ipsum'\n
            \n soup wrap:

            You'll have to use a custom solution. Split the lines by newlines, and remove empty lines from the start and end:

            def strip_empty_lines(s):
                lines = s.splitlines()
                while lines and not lines[0].strip():
                    lines.pop(0)
                while lines and not lines[-1].strip():
                    lines.pop()
                return '\n'.join(lines)
            

            This handles the case where the 'empty' lines still contain spaces or tabs, apart from the \n line separators:

            >>> strip_empty_lines('''\
            ... 
            ... 
            ... 
            ... 
            ...         some indentation here
            ... 
            ... lorem ipsum
            ... 
            ... 
            ... ''')
            '        some indentation here\n\nlorem ipsum'
            >>> strip_empty_lines('''\
            ... \t  \t
            ...     \n
            ...         some indentation here
            ... 
            ... lorem ipsum
            ... 
            ... ''')
            '        some indentation here\n\nlorem ipsum'
            

            If there is no other whitespace than newlines, then a simple s.strip('\n') will do:

            >>> '''\
            ... 
            ... 
            ... 
            ...         some indentation here
            ... 
            ... lorum ipsum
            ... 
            ... '''.strip('\n')
            '        some indentation here\n\nlorum ipsum'
            
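            To see the difference on a string whose 'blank' lines still contain spaces or tabs (a made-up sample; the function is repeated so the snippet is self-contained):

```python
def strip_empty_lines(s):
    # drop whitespace-only lines from the start and end, keep interior ones
    lines = s.splitlines()
    while lines and not lines[0].strip():
        lines.pop(0)
    while lines and not lines[-1].strip():
        lines.pop()
    return '\n'.join(lines)

s = '\n \t\n  some indentation here\n\nlorem ipsum\n \n'
print(repr(s.strip('\n')))         # ' \t\n  some indentation here\n\nlorem ipsum\n '
print(repr(strip_empty_lines(s)))  # '  some indentation here\n\nlorem ipsum'
```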
            qid & accept id: (31847577, 31847941) query: Comparing two dictionaries in list in python soup:

            If I got you right

            \n

            You could use list comprehension

            \n

            code:

            \n
            lst=[{"a":2,"b":3,"c":4},{"b":4}]\n[a for a in lst[0] if a in lst[1]]\n['b']\n
            \n

            Doing it without a list comprehension

            \n

            code:

            \n
            lst=[{"a":2,"b":3,"c":4},{"b":4}]\nfor a in lst[0]:\n    if a in lst[1]:\n        print a\n
            \n

            output:

            \n
            b\n
            \n

            Operation:

            \n

            1. When you loop over a dictionary you are looping over its keys; there are also methods to loop over the values and over both keys and values

            \n

            2. Check whether the key is also present in the second dictionary and, if so, print it

            \n

            edit:

            \n
            lst=[{"a":2,"b":3,"c":4},{"b":4},{"b":2,"d":6},{"d":4}]\n\n\nfor count in range(len(lst)-1):\n   for a in lst[count]:\n      if a in lst[count+1]:\n         print "dic"+str(count)+"\t"+str(a)+"\tis common to next dic"\n
            \n

            output:

            \n
            dic0    b       is common to next dic\ndic1    b       is common to next dic\ndic2    d       is common to next dic\n
            \n soup wrap:

            If I got you right

            You could use a list comprehension

            code:

            lst=[{"a":2,"b":3,"c":4},{"b":4}]
            [a for a in lst[0] if a in lst[1]]
            ['b']
            

            Doing it without a list comprehension

            code:

            lst=[{"a":2,"b":3,"c":4},{"b":4}]
            for a in lst[0]:
                if a in lst[1]:
                    print a
            

            output:

            b
            

            Operation:

            1. When you loop over a dictionary you are looping over its keys; there are also methods to loop over the values and over both keys and values

            2. Check whether the key is also present in the second dictionary and, if so, print it

            edit:

            lst=[{"a":2,"b":3,"c":4},{"b":4},{"b":2,"d":6},{"d":4}]
            
            
            for count in range(len(lst)-1):
               for a in lst[count]:
                  if a in lst[count+1]:
                     print "dic"+str(count)+"\t"+str(a)+"\tis common to next dic"
            

            output:

            dic0    b       is common to next dic
            dic1    b       is common to next dic
            dic2    d       is common to next dic
            
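            Since in Python the keys of a dict behave like a set, the comparison can also be written with a set intersection; note this compares keys only, not values:

```python
lst = [{"a": 2, "b": 3, "c": 4}, {"b": 4}]

common = set(lst[0]) & set(lst[1])  # keys present in both dicts
print(common)  # {'b'}
```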
            qid & accept id: (31869890, 31869907) query: How to avoid '\n' and '\t' escaping sequence when string is assigned to a variable soup:

            You can define the string as a raw string by prepending r to it; then the \ characters are not treated as escape characters. Example -

            \n
            >>> s = r'C:\Users\Client\tests\doc_test_hard.docx'\n>>> s\n'C:\\Users\\Client\\tests\\doc_test_hard.docx'\n
            \n

            After this, your replace should work -

            \n
            >>> s.replace('\\','/')\n'C:/Users/Client/tests/doc_test_hard.docx'\n
            \n

            Though you may not actually need to do this; Python can handle the correct path separator for the OS. If you are creating paths in your program, you should use os.path.join(), which would handle the path separators for you correctly.

            \n soup wrap:

            You can define the string as a raw string by prepending r to it; then the \ characters are not treated as escape characters. Example -

            >>> s = r'C:\Users\Client\tests\doc_test_hard.docx'
            >>> s
            'C:\\Users\\Client\\tests\\doc_test_hard.docx'
            

            After this, your replace should work -

            >>> s.replace('\\','/')
            'C:/Users/Client/tests/doc_test_hard.docx'
            

            Though you may not actually need to do this; Python can handle the correct path separator for the OS. If you are creating paths in your program, you should use os.path.join(), which would handle the path separators for you correctly.
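            A small sketch of the os.path.join idea; ntpath and posixpath are used explicitly here so the output does not depend on the OS the example runs on:

```python
import ntpath
import posixpath

# Windows-style joining (backslashes)
win = ntpath.join('C:\\', 'Users', 'Client', 'tests', 'doc_test_hard.docx')
print(win)  # C:\Users\Client\tests\doc_test_hard.docx

# POSIX-style joining (forward slashes)
posix = posixpath.join('/home', 'client', 'tests', 'doc_test_hard.docx')
print(posix)  # /home/client/tests/doc_test_hard.docx
```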

            qid & accept id: (31889359, 31889913) query: How to count how many posts each link has on a reddit-like app? soup:

            Jape gave you a good answer, but it is always more efficient to perform counting in the database rather than in Python loops.

            \n

            views.py

            \n
            from django.db.models import Count\n\ndef view(request):\n    # Calculate the counts at the same time we fetch the NewLink(s)\n    links = NewLink.objects.annotate(post_count=Count('post_set'))\n    return render(request, 'template.html', {'links': links})\n
            \n

            html

            \n
            {% for link in links %}\n    {{ link.post_count }}\n{% endfor %}\n
            \n soup wrap:

            Jape gave you a good answer, but it is always more efficient to perform counting in the database rather than in Python loops.

            views.py

            from django.db.models import Count
            
            def view(request):
                # Calculate the counts at the same time we fetch the NewLink(s)
                links = NewLink.objects.annotate(post_count=Count('post_set'))
                return render(request, 'template.html', {'links': links})
            

            html

            {% for link in links %}
                {{ link.post_count }}
            {% endfor %}
            
            qid & accept id: (31892559, 31893430) query: How to format inputted data and output it soup:

            Use a list to store the author names.

            \n
            authors = []\nnum_authors = int(raw_input("How Many Authors? "))\nfor i in range(num_authors):\n    authors.append(raw_input("Enter Author's Name ({}): ".format(i+1)))\n
            \n

            Apart from handling an arbitrary number of authors, a list of authors is generally more useful than a formatted string of authors; e.g. it's easy to determine the number of authors, to sort the authors, or to output the list with different formatting.

            \n

            To achieve your desired output you can add punctuation using str.join():

            \n
            authors_string = ' & '.join([', '.join(authors[:-1]), authors[-1]]\n                                if len(authors) > 2 else authors)\n
            \n

            The inner str.join() joins all authors except the last one with commas; if there are 2 or fewer authors, no inner join takes place. The outer join adds the ampersand before the final author if required.

            \n soup wrap:

            Use a list to store the author names.

            authors = []
            num_authors = int(raw_input("How Many Authors? "))
            for i in range(num_authors):
                authors.append(raw_input("Enter Author's Name ({}): ".format(i+1)))
            

            Apart from handling an arbitrary number of authors, a list of authors is generally more useful than a formatted string of authors; e.g. it's easy to determine the number of authors, to sort the authors, or to output the list with different formatting.

            To achieve your desired output you can add punctuation using str.join():

            authors_string = ' & '.join([', '.join(authors[:-1]), authors[-1]]
                                            if len(authors) > 2 else authors)
            

            The inner str.join() comma-joins all authors except the last; the outer join then adds the ampersand before the final author. With two or fewer authors, the comma join is skipped and the names are joined by the ampersand alone.

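The join logic is easy to verify; here is the same expression wrapped in a function (Python 3 shown, where raw_input became input):

```python
def format_authors(authors):
    # Comma-join all but the last author, then join that prefix to the
    # final author with an ampersand; with two or fewer authors the
    # comma join is skipped entirely.
    return ' & '.join([', '.join(authors[:-1]), authors[-1]]
                      if len(authors) > 2 else authors)
```

For example, `format_authors(['Knuth', 'Graham', 'Patashnik'])` gives `'Knuth, Graham & Patashnik'`.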
            qid & accept id: (31912253, 31913572) query: Is there an efficient way to fill date gaps in python? soup:

            You can first make the date_closed column the index and then .reindex against hourly_date_rng to populate the missing records.

            \n

            Here is an example.

            \n
            import json\nimport pandas as pd\n\njson_data = [\n    {\n      "amount": 0,\n      "date_closed": "2012-08-04 16:00:00"\n    },\n    {\n      "amount": 0,\n      "date_closed": "2012-08-04 20:00:00"\n    },\n    {\n      "amount": 0,\n      "date_closed": "2012-08-04 22:00:00"\n    }\n]\n\ndf = pd.read_json(json.dumps(json_data), orient='records')\ndf\n\n   amount          date_closed\n0       0  2012-08-03 16:00:00\n1       0  2012-08-04 20:00:00\n2       0  2012-08-04 22:00:00\n
            \n

            The hourly_date_rng looks like this

            \n
            hourly_date_rng = pd.date_range(start='2012-08-04 12:00:00', end='2012-08-4 23:00:00', freq='H')\nhourly_date_rng.name = 'date_closed'\n\nhourly_date_rng\n\nDatetimeIndex(['2012-08-04 12:00:00', '2012-08-04 13:00:00',\n               '2012-08-04 14:00:00', '2012-08-04 15:00:00',\n               '2012-08-04 16:00:00', '2012-08-04 17:00:00',\n               '2012-08-04 18:00:00', '2012-08-04 19:00:00',\n               '2012-08-04 20:00:00', '2012-08-04 21:00:00',\n               '2012-08-04 22:00:00', '2012-08-04 23:00:00'],\n              dtype='datetime64[ns]', name='date_closed', freq='H', tz=None)\n
            \n

            To align the index and fill the gaps

            \n
            # make the column datetime object instead of string\ndf['date_closed'] = pd.to_datetime(df['date_closed'])\n# align the index using .reindex\ndf.set_index('date_closed').reindex(hourly_date_rng).fillna(0).reset_index()\n\n           date_closed  amount\n0  2012-08-04 12:00:00       0\n1  2012-08-04 13:00:00       0\n2  2012-08-04 14:00:00       0\n3  2012-08-04 15:00:00       0\n4  2012-08-04 16:00:00       0\n5  2012-08-04 17:00:00       0\n6  2012-08-04 18:00:00       0\n7  2012-08-04 19:00:00       0\n8  2012-08-04 20:00:00       0\n9  2012-08-04 21:00:00       0\n10 2012-08-04 22:00:00       0\n11 2012-08-04 23:00:00       0\n
            \n

            Edit:

            \n

            To convert back the result to JSON.

            \n
            result = df.set_index('date_closed').reindex(hourly_date_rng).fillna(0).reset_index()\n\n# maybe convert date_closed column to string first\nresult['date_closed'] = pd.DatetimeIndex(result['date_closed']).to_native_types()\n# to json function\njson_result = result.to_json(orient='records')\n\n# print out the data with pretty print\nfrom pprint import pprint\npprint(json.loads(json_result))\n\n\n[{'amount': 0.0, 'date_closed': '2012-08-04 12:00:00'},\n {'amount': 0.0, 'date_closed': '2012-08-04 13:00:00'},\n {'amount': 0.0, 'date_closed': '2012-08-04 14:00:00'},\n {'amount': 0.0, 'date_closed': '2012-08-04 15:00:00'},\n {'amount': 0.0, 'date_closed': '2012-08-04 16:00:00'},\n {'amount': 0.0, 'date_closed': '2012-08-04 17:00:00'},\n {'amount': 0.0, 'date_closed': '2012-08-04 18:00:00'},\n {'amount': 0.0, 'date_closed': '2012-08-04 19:00:00'},\n {'amount': 0.0, 'date_closed': '2012-08-04 20:00:00'},\n {'amount': 0.0, 'date_closed': '2012-08-04 21:00:00'},\n {'amount': 0.0, 'date_closed': '2012-08-04 22:00:00'},\n {'amount': 0.0, 'date_closed': '2012-08-04 23:00:00'}]\n
            \n soup wrap:

            You can first make the date_closed column the index and then .reindex against hourly_date_rng to populate the missing records.

            Here is an example.

            import json
            import pandas as pd
            
            json_data = [
                {
                  "amount": 0,
                  "date_closed": "2012-08-04 16:00:00"
                },
                {
                  "amount": 0,
                  "date_closed": "2012-08-04 20:00:00"
                },
                {
                  "amount": 0,
                  "date_closed": "2012-08-04 22:00:00"
                }
            ]
            
            df = pd.read_json(json.dumps(json_data), orient='records')
            df
            
               amount          date_closed
            0       0  2012-08-04 16:00:00
            1       0  2012-08-04 20:00:00
            2       0  2012-08-04 22:00:00
            

            The hourly_date_rng looks like this

            hourly_date_rng = pd.date_range(start='2012-08-04 12:00:00', end='2012-08-04 23:00:00', freq='H')
            hourly_date_rng.name = 'date_closed'
            
            hourly_date_rng
            
            DatetimeIndex(['2012-08-04 12:00:00', '2012-08-04 13:00:00',
                           '2012-08-04 14:00:00', '2012-08-04 15:00:00',
                           '2012-08-04 16:00:00', '2012-08-04 17:00:00',
                           '2012-08-04 18:00:00', '2012-08-04 19:00:00',
                           '2012-08-04 20:00:00', '2012-08-04 21:00:00',
                           '2012-08-04 22:00:00', '2012-08-04 23:00:00'],
                          dtype='datetime64[ns]', name='date_closed', freq='H', tz=None)
            

            To align the index and fill the gaps

            # make the column datetime object instead of string
            df['date_closed'] = pd.to_datetime(df['date_closed'])
            # align the index using .reindex
            df.set_index('date_closed').reindex(hourly_date_rng).fillna(0).reset_index()
            
                       date_closed  amount
            0  2012-08-04 12:00:00       0
            1  2012-08-04 13:00:00       0
            2  2012-08-04 14:00:00       0
            3  2012-08-04 15:00:00       0
            4  2012-08-04 16:00:00       0
            5  2012-08-04 17:00:00       0
            6  2012-08-04 18:00:00       0
            7  2012-08-04 19:00:00       0
            8  2012-08-04 20:00:00       0
            9  2012-08-04 21:00:00       0
            10 2012-08-04 22:00:00       0
            11 2012-08-04 23:00:00       0
            

            Edit:

            To convert the result back to JSON:

            result = df.set_index('date_closed').reindex(hourly_date_rng).fillna(0).reset_index()
            
            # maybe convert date_closed column to string first
            result['date_closed'] = pd.DatetimeIndex(result['date_closed']).to_native_types()
            # to json function
            json_result = result.to_json(orient='records')
            
            # print out the data with pretty print
            from pprint import pprint
            pprint(json.loads(json_result))
            
            
            [{'amount': 0.0, 'date_closed': '2012-08-04 12:00:00'},
             {'amount': 0.0, 'date_closed': '2012-08-04 13:00:00'},
             {'amount': 0.0, 'date_closed': '2012-08-04 14:00:00'},
             {'amount': 0.0, 'date_closed': '2012-08-04 15:00:00'},
             {'amount': 0.0, 'date_closed': '2012-08-04 16:00:00'},
             {'amount': 0.0, 'date_closed': '2012-08-04 17:00:00'},
             {'amount': 0.0, 'date_closed': '2012-08-04 18:00:00'},
             {'amount': 0.0, 'date_closed': '2012-08-04 19:00:00'},
             {'amount': 0.0, 'date_closed': '2012-08-04 20:00:00'},
             {'amount': 0.0, 'date_closed': '2012-08-04 21:00:00'},
             {'amount': 0.0, 'date_closed': '2012-08-04 22:00:00'},
             {'amount': 0.0, 'date_closed': '2012-08-04 23:00:00'}]
            
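For reference, the reindex-and-fill step can be sketched without pandas using only the standard library (the hourly step and the 0 default mirror reindex + fillna(0); fill_hourly_gaps is a name made up here):

```python
from datetime import datetime, timedelta

def fill_hourly_gaps(records, start, end):
    # Map existing timestamps to amounts, then walk the hourly range,
    # defaulting missing hours to 0 (the fillna(0) step).
    known = {r['date_closed']: r['amount'] for r in records}
    out, t = [], start
    while t <= end:
        key = t.strftime('%Y-%m-%d %H:%M:%S')
        out.append({'date_closed': key, 'amount': known.get(key, 0)})
        t += timedelta(hours=1)
    return out
```

This trades pandas' vectorised reindex for an explicit loop, which is fine for small record counts.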
            qid & accept id: (31919765, 31926249) query: Choosing a box of data points from a plot soup:

            Basically, you're asking how to interactively select points in a rectangular region.

            \n

            There's a matplotlib widget which will handle part of this (interactively drawing a rectangle) for you: matplotlib.widgets.RectangleSelector. You'll need to handle what you want to do with the rectangular region, though.

            \n

            As a basic example, let's interactively highlight points inside a rectangle (this is an inefficient way to do that, but we'll need to build on this example to do what you want). After the figure window is closed, this will print out the points not selected (~ operates as logical_not on numpy arrays):

            \n
            import numpy as np\nimport matplotlib.pyplot as plt\nfrom matplotlib.widgets import RectangleSelector\n\ndef main():\n    x, y = np.random.random((2, 100))\n    fig, ax = plt.subplots()\n    ax.scatter(x, y, color='black')\n    highlighter = Highlighter(ax, x, y)\n    plt.show()\n\n    selected_regions = highlighter.mask\n    # Print the points _not_ selected\n    print x[~selected_regions], y[~selected_regions]\n\nclass Highlighter(object):\n    def __init__(self, ax, x, y):\n        self.ax = ax\n        self.canvas = ax.figure.canvas\n        self.x, self.y = x, y\n        self.mask = np.zeros(x.shape, dtype=bool)\n\n        self._highlight = ax.scatter([], [], s=200, color='yellow', zorder=10)\n\n        self.selector = RectangleSelector(ax, self, useblit=True)\n\n    def __call__(self, event1, event2):\n        self.mask |= self.inside(event1, event2)\n        xy = np.column_stack([self.x[self.mask], self.y[self.mask]])\n        self._highlight.set_offsets(xy)\n        self.canvas.draw()\n\n    def inside(self, event1, event2):\n        """Returns a boolean mask of the points inside the rectangle defined by\n        event1 and event2."""\n        # Note: Could use points_inside_poly, as well\n        x0, x1 = sorted([event1.xdata, event2.xdata])\n        y0, y1 = sorted([event1.ydata, event2.ydata])\n        mask = ((self.x > x0) & (self.x < x1) &\n                (self.y > y0) & (self.y < y1))\n        return mask\n\nmain()\n
            \n

            However, you have an additional wrinkle, as you have two linked plots. You want a selection on the X-Y plot to also select things on the X-Z plot. Let's modify things to handle that:

            \n
            import numpy as np\nimport matplotlib.pyplot as plt\nfrom matplotlib.widgets import RectangleSelector\n\ndef main():\n    x, y, z = np.random.random((3, 100))\n    z *= 10\n    fig, axes = plt.subplots(figsize=(6, 8), nrows=2, sharex=True)\n    axes[0].scatter(x, y, color='black')\n    axes[1].scatter(x, z, color='black')\n    axes[0].set(ylabel='Y')\n    axes[1].set(xlabel='X', ylabel='Y')\n\n    highlighter = Highlighter(axes, x, y, z)\n    plt.show()\n\n    selected_regions = highlighter.mask\n    print x[~selected_regions], y[~selected_regions], z[~selected_regions]\n\nclass Highlighter(object):\n    def __init__(self, axes, x, y, z):\n        self.axes = axes\n        self.canvas = axes[0].figure.canvas\n        self.x, self.y, self.z = x, y, z\n        self.mask = np.zeros(x.shape, dtype=bool)\n\n        self._highlights = [ax.scatter([], [], s=200, color='yellow', zorder=10)\n                               for ax in axes]\n\n        self._select1 = RectangleSelector(axes[0], self.select_xy, useblit=True)\n        self._select2 = RectangleSelector(axes[1], self.select_xz, useblit=True)\n\n    def select_xy(self, event1, event2):\n        self.mask |= self.inside(event1, event2, self.x, self.y)\n        self.update()\n\n    def select_xz(self, event1, event2):\n        self.mask |= self.inside(event1, event2, self.x, self.z)\n        self.update()\n\n    def update(self):\n        xy = np.column_stack([self.x[self.mask], self.y[self.mask]])\n        self._highlights[0].set_offsets(xy)\n\n        xz = np.column_stack([self.x[self.mask], self.z[self.mask]])\n        self._highlights[1].set_offsets(xz)\n\n        self.canvas.draw()\n\n    def inside(self, event1, event2, x, y):\n        x0, x1 = sorted([event1.xdata, event2.xdata])\n        y0, y1 = sorted([event1.ydata, event2.ydata])\n        return (x > x0) & (x < x1) & (y > y0) & (y < y1)\n\nmain()\n
            \n soup wrap:

            Basically, you're asking how to interactively select points in a rectangular region.

            There's a matplotlib widget which will handle part of this (interactively drawing a rectangle) for you: matplotlib.widgets.RectangleSelector. You'll need to handle what you want to do with the rectangular region, though.

            As a basic example, let's interactively highlight points inside a rectangle (this is an inefficient way to do that, but we'll need to build on this example to do what you want). After the figure window is closed, this will print out the points not selected (~ operates as logical_not on numpy arrays):

            import numpy as np
            import matplotlib.pyplot as plt
            from matplotlib.widgets import RectangleSelector
            
            def main():
                x, y = np.random.random((2, 100))
                fig, ax = plt.subplots()
                ax.scatter(x, y, color='black')
                highlighter = Highlighter(ax, x, y)
                plt.show()
            
                selected_regions = highlighter.mask
                # Print the points _not_ selected
                print x[~selected_regions], y[~selected_regions]
            
            class Highlighter(object):
                def __init__(self, ax, x, y):
                    self.ax = ax
                    self.canvas = ax.figure.canvas
                    self.x, self.y = x, y
                    self.mask = np.zeros(x.shape, dtype=bool)
            
                    self._highlight = ax.scatter([], [], s=200, color='yellow', zorder=10)
            
                    self.selector = RectangleSelector(ax, self, useblit=True)
            
                def __call__(self, event1, event2):
                    self.mask |= self.inside(event1, event2)
                    xy = np.column_stack([self.x[self.mask], self.y[self.mask]])
                    self._highlight.set_offsets(xy)
                    self.canvas.draw()
            
                def inside(self, event1, event2):
                    """Returns a boolean mask of the points inside the rectangle defined by
                    event1 and event2."""
                    # Note: Could use points_inside_poly, as well
                    x0, x1 = sorted([event1.xdata, event2.xdata])
                    y0, y1 = sorted([event1.ydata, event2.ydata])
                    mask = ((self.x > x0) & (self.x < x1) &
                            (self.y > y0) & (self.y < y1))
                    return mask
            
            main()
            

            However, you have an additional wrinkle, as you have two linked plots. You want a selection on the X-Y plot to also select things on the X-Z plot. Let's modify things to handle that:

            import numpy as np
            import matplotlib.pyplot as plt
            from matplotlib.widgets import RectangleSelector
            
            def main():
                x, y, z = np.random.random((3, 100))
                z *= 10
                fig, axes = plt.subplots(figsize=(6, 8), nrows=2, sharex=True)
                axes[0].scatter(x, y, color='black')
                axes[1].scatter(x, z, color='black')
                axes[0].set(ylabel='Y')
                axes[1].set(xlabel='X', ylabel='Z')
            
                highlighter = Highlighter(axes, x, y, z)
                plt.show()
            
                selected_regions = highlighter.mask
                print x[~selected_regions], y[~selected_regions], z[~selected_regions]
            
            class Highlighter(object):
                def __init__(self, axes, x, y, z):
                    self.axes = axes
                    self.canvas = axes[0].figure.canvas
                    self.x, self.y, self.z = x, y, z
                    self.mask = np.zeros(x.shape, dtype=bool)
            
                    self._highlights = [ax.scatter([], [], s=200, color='yellow', zorder=10)
                                           for ax in axes]
            
                    self._select1 = RectangleSelector(axes[0], self.select_xy, useblit=True)
                    self._select2 = RectangleSelector(axes[1], self.select_xz, useblit=True)
            
                def select_xy(self, event1, event2):
                    self.mask |= self.inside(event1, event2, self.x, self.y)
                    self.update()
            
                def select_xz(self, event1, event2):
                    self.mask |= self.inside(event1, event2, self.x, self.z)
                    self.update()
            
                def update(self):
                    xy = np.column_stack([self.x[self.mask], self.y[self.mask]])
                    self._highlights[0].set_offsets(xy)
            
                    xz = np.column_stack([self.x[self.mask], self.z[self.mask]])
                    self._highlights[1].set_offsets(xz)
            
                    self.canvas.draw()
            
                def inside(self, event1, event2, x, y):
                    x0, x1 = sorted([event1.xdata, event2.xdata])
                    y0, y1 = sorted([event1.ydata, event2.ydata])
                    return (x > x0) & (x < x1) & (y > y0) & (y < y1)
            
            main()
            
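Note that the selection test itself needs nothing from matplotlib; it is a plain bounding-box check, sketched here with corner tuples standing in for the two mouse events:

```python
def inside(x, y, corner1, corner2):
    # Boolean mask of points strictly inside the rectangle spanned by
    # two opposite corners (order-independent, like the event pair).
    x0, x1 = sorted([corner1[0], corner2[0]])
    y0, y1 = sorted([corner1[1], corner2[1]])
    return [x0 < xi < x1 and y0 < yi < y1 for xi, yi in zip(x, y)]
```

In the Highlighter class the same comparisons run vectorised on numpy arrays, which is why `&` is used there instead of `and`.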
            qid & accept id: (31920197, 31920302) query: Read python function from a text file and assign it to variable soup:

            After you've executed the string, you can call func directly, as it has been added to your current namespace:

            \n
            >>> exec("""def func():\n    var = 5  # note that the semicolons are redundant and unpythonic\n    return var""")\n>>> func()\n5\n
            \n

            Per its documentation exec doesn't actually return anything, so there's no point assigning e.g. foo = exec(...).

            \n
            \n

            To see what names are locally defined in the code being executed, pass an empty dictionary to exec as the locals parameter:

            \n
            >>> ns = {}\n>>> exec("""def func():\n    var = 5\n    return var""", globals(), ns)\n>>> ns\n{'func': }\n
            \n

            You can then assign the function and call it as you normally would:

            \n
            >>> b, = ns.values()  # this will only work if only one name was defined\n>>> b()\n5\n
            \n soup wrap:

            After you've executed the string, you can call func directly, as it has been added to your current namespace:

            >>> exec("""def func():
                var = 5  # note that the semicolons are redundant and unpythonic
                return var""")
            >>> func()
            5
            

            Per its documentation exec doesn't actually return anything, so there's no point assigning e.g. foo = exec(...).


            To see what names are locally defined in the code being executed, pass an empty dictionary to exec as the locals parameter:

            >>> ns = {}
            >>> exec("""def func():
                var = 5
                return var""", globals(), ns)
            >>> ns
            {'func': <function func at 0x...>}
            

            You can then assign the function and call it as you normally would:

            >>> b, = ns.values()  # this will only work if only one name was defined
            >>> b()
            5
            
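The namespace-dict pattern condensed (Python 3 syntax):

```python
code = "def func():\n    var = 5\n    return var"

ns = {}
exec(code, globals(), ns)  # local definitions collect in ns
func = ns['func']          # bind the function to a normal name
```

Looking the name up explicitly (rather than unpacking `ns.values()`) also works when the executed code defines more than one name.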
            qid & accept id: (31936573, 31942850) query: Python - Plot function over a range PYPLOT soup:

            I'm assuming you mean the function is going to be some kind of mathematical function, for example:

            \n
            import math\ndef function(x, A, B):\n    return math.exp(A*x) * math.sin(B*x)\n
            \n

            Then I would define variables for the number of points to plot, and the x range, and then create lists using map, as below.

            \n
            import matplotlib.pyplot as plt\npoints = 1e4 #Number of points\nxmin, xmax = -1, 5\nxlist = map(lambda x: float(xmax - xmin)*x/points, range(points+1))\nylist = map(lambda y: function(y, -1, 5), xlist)\nplt.plot(xlist, ylist)\nplt.show()\n
            \n soup wrap:

            I'm assuming you mean the function is going to be some kind of mathematical function, for example:

            import math
            def function(x, A, B):
                return math.exp(A*x) * math.sin(B*x)
            

            Then I would define variables for the number of points to plot, and the x range, and then create lists using map, as below.

            import matplotlib.pyplot as plt
            points = 10000  # number of points (must be an int for range())
            xmin, xmax = -1, 5
            xlist = map(lambda x: xmin + float(xmax - xmin)*x/points, range(points + 1))
            ylist = map(lambda y: function(y, -1, 5), xlist)
            plt.plot(xlist, ylist)
            plt.show()
            
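The same sampling written with list comprehensions (Python 3, plotting omitted); note the xmin offset, which is what makes the samples actually span [xmin, xmax]:

```python
import math

def function(x, A, B):
    return math.exp(A * x) * math.sin(B * x)

points = 100  # number of sample intervals
xmin, xmax = -1, 5
# Evenly spaced samples from xmin to xmax inclusive
xlist = [xmin + (xmax - xmin) * i / points for i in range(points + 1)]
ylist = [function(x, -1, 5) for x in xlist]
```

These lists can be passed straight to `plt.plot(xlist, ylist)` as in the answer above.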
            qid & accept id: (31941020, 31941083) query: How to reference/iterate multiple lists in Python soup:

            Your list ccys should contain references to the other list objects themselves, not their names as strings

            \n
            ccys = [audcad, audchf, audjpy]\n
            \n

            Then your code will work fine

            \n
            for ccy in ccys:\n    ccy[13] += 10\n\n>>> audcad\n['audcad', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10]\n>>> audchf\n['audchf', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10]\n>>> audjpy\n['audjpy', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10]\n
            \n soup wrap:

            Your list ccys should contain references to the other list objects themselves, not their names as strings

            ccys = [audcad, audchf, audjpy]
            

            Then your code will work fine

            for ccy in ccys:
                ccy[13] += 10
            
            >>> audcad
            ['audcad', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10]
            >>> audchf
            ['audchf', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10]
            >>> audjpy
            ['audjpy', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10]
            
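The reason this works is that the list holds references to the very same list objects, so mutating through ccy is visible through the original names; a minimal check:

```python
audcad = ['audcad'] + [0] * 13
audchf = ['audchf'] + [0] * 13

ccys = [audcad, audchf]   # references to the same objects, not copies

for ccy in ccys:
    ccy[13] += 10         # mutates the shared list objects in place
```

Rebinding `ccy = something_else` inside the loop would not affect the originals; only in-place mutation (item assignment, append, etc.) does.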
            qid & accept id: (31957215, 31957287) query: Order a list of dictionaries in python soup:

            You can pass in a custom key to the sorted function with reverse=True to get the descending order:

            \n
            >>> res = [{'cpunumber': '40.0', 'servername': 'f02wn01', 'cpucore_sum': '5.0',   'cpucore_00': '0.399414', 'datetime': '1438887255'}, \n...   {'cpunumber': '40.0', 'servername': 'f02wn01', 'cpucore_sum': '9.375', 'cpucore_00': '1.597656', 'datetime': '1438887250'}, \n...   {'cpunumber': '40.0', 'servername': 'f02wn01', 'cpucore_sum': '3.195312', 'cpucore_00': '0.0', 'datetime': '1438887240'}, \n...   {'cpunumber': '40.0', 'servername': 'f02wn01', 'cpucore_sum': '5.59375', 'cpucore_00': '1.0', 'datetime': '1438887245'}]\n>>> sorted(res, key=lambda x: x["datetime"], reverse=True)\n[{'cpucore_00': '0.399414',\n  'cpucore_sum': '5.0',\n  'cpunumber': '40.0',\n  'datetime': '1438887255',\n  'servername': 'f02wn01'},\n {'cpucore_00': '1.597656',\n  'cpucore_sum': '9.375',\n  'cpunumber': '40.0',\n  'datetime': '1438887250',\n  'servername': 'f02wn01'},\n {'cpucore_00': '1.0',\n  'cpucore_sum': '5.59375',\n  'cpunumber': '40.0',\n  'datetime': '1438887245',\n  'servername': 'f02wn01'},\n {'cpucore_00': '0.0',\n  'cpucore_sum': '3.195312',\n  'cpunumber': '40.0',\n  'datetime': '1438887240',\n  'servername': 'f02wn01'}]\n
            \n

            You can also sort it in place using the .sort method of the list (use reverse=True for descending order):

            \n
            >>> res.sort(key=lambda x: x["datetime"])\n>>> res\n[{'cpucore_sum': '3.195312', 'cpucore_00': '0.0', 'servername': 'f02wn01', 'cpunumber': '40.0', 'datetime': '1438887240'}, {'cpucore_sum': '5.59375', 'cpucore_00': '1.0', 'servername': 'f02wn01', 'cpunumber': '40.0', 'datetime': '1438887245'}, {'cpucore_sum': '9.375', 'cpucore_00': '1.597656', 'servername': 'f02wn01', 'cpunumber': '40.0', 'datetime': '1438887250'}, {'cpucore_sum': '5.0', 'cpucore_00': '0.399414', 'servername': 'f02wn01', 'cpunumber': '40.0', 'datetime': '1438887255'}]\n
            \n

            In case all your dicts are not guaranteed to have the "datetime" key, you can use x.get("datetime") instead of x["datetime"].

            \n soup wrap:

            You can pass in a custom key to the sorted function with reverse=True to get the descending order:

            >>> res = [{'cpunumber': '40.0', 'servername': 'f02wn01', 'cpucore_sum': '5.0',   'cpucore_00': '0.399414', 'datetime': '1438887255'}, 
            ...   {'cpunumber': '40.0', 'servername': 'f02wn01', 'cpucore_sum': '9.375', 'cpucore_00': '1.597656', 'datetime': '1438887250'}, 
            ...   {'cpunumber': '40.0', 'servername': 'f02wn01', 'cpucore_sum': '3.195312', 'cpucore_00': '0.0', 'datetime': '1438887240'}, 
            ...   {'cpunumber': '40.0', 'servername': 'f02wn01', 'cpucore_sum': '5.59375', 'cpucore_00': '1.0', 'datetime': '1438887245'}]
            >>> sorted(res, key=lambda x: x["datetime"], reverse=True)
            [{'cpucore_00': '0.399414',
              'cpucore_sum': '5.0',
              'cpunumber': '40.0',
              'datetime': '1438887255',
              'servername': 'f02wn01'},
             {'cpucore_00': '1.597656',
              'cpucore_sum': '9.375',
              'cpunumber': '40.0',
              'datetime': '1438887250',
              'servername': 'f02wn01'},
             {'cpucore_00': '1.0',
              'cpucore_sum': '5.59375',
              'cpunumber': '40.0',
              'datetime': '1438887245',
              'servername': 'f02wn01'},
             {'cpucore_00': '0.0',
              'cpucore_sum': '3.195312',
              'cpunumber': '40.0',
              'datetime': '1438887240',
              'servername': 'f02wn01'}]
            

            You can also sort it in place using the .sort method of the list (use reverse=True for descending order):

            >>> res.sort(key=lambda x: x["datetime"])
            >>> res
            [{'cpucore_sum': '3.195312', 'cpucore_00': '0.0', 'servername': 'f02wn01', 'cpunumber': '40.0', 'datetime': '1438887240'}, {'cpucore_sum': '5.59375', 'cpucore_00': '1.0', 'servername': 'f02wn01', 'cpunumber': '40.0', 'datetime': '1438887245'}, {'cpucore_sum': '9.375', 'cpucore_00': '1.597656', 'servername': 'f02wn01', 'cpunumber': '40.0', 'datetime': '1438887250'}, {'cpucore_sum': '5.0', 'cpucore_00': '0.399414', 'servername': 'f02wn01', 'cpunumber': '40.0', 'datetime': '1438887255'}]
            

            In case all your dicts are not guaranteed to have the "datetime" key, you can use x.get("datetime") instead of x["datetime"].

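One caveat: the "datetime" values here are strings, and lexicographic order only coincides with numeric order while all the strings have the same length. Converting in the key avoids that pitfall:

```python
rows = [{'datetime': '1438887255'},
        {'datetime': '1438887240'},
        {'datetime': '1438887250'}]

# Convert to int in the key so the ordering is numeric, not lexicographic
newest_first = sorted(rows, key=lambda r: int(r['datetime']), reverse=True)
```

The same `key=` works for the in-place `rows.sort(...)` variant shown above.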
            qid & accept id: (31978879, 31984120) query: 2D Color coded scatter plot with user defined color range and static colormap soup:

            You will have to iterate over all your data files to get the maximum value for vel; I have added a few lines of code (that need to be adjusted to fit your case) that will do that.

            \n

            Therefore, your colorbar line has been changed to use the max_vel, allowing you to get rid of that code using the fixed value of 8000.

            \n

            Additionally, I took the liberty to remove the black edges around the points, because I find that they 'obfuscate' the color of the point.

            \n

            Lastly, I have adjusted your plot code to use an axis object, which is required to have a colorbar.

            \n
            import numpy as np\nimport matplotlib.pyplot as plt\n# This is needed to iterate over your data files\nimport glob \n\n# Loop over all your data files to get the maximum value for 'vel'. \n# You will have to adjust this for your code\n"""max_vel = 0\nfor i in glob.glob(,'r') as fr:\n    # Iterate over all lines\n    if  > max_vel:\n        max_vel = """\n\n# Create Map\ncm = plt.cm.get_cmap('RdYlBu')\nx,y,vel = np.loadtxt('finaldata_temp.txt', skiprows=0, unpack=True)\n\n# Plot the data\nfig=plt.figure()\nfig.patch.set_facecolor('white')\n# Here we switch to an axis object\n# Additionally, you can plot several of your files in the same figure using\n# the subplot option.\nax=fig.add_subplot(111)\ns = ax.scatter(x,y,c=vel,edgecolor=''))\n# Here we assign the color bar to the axis object\ncb = plt.colorbar(mappable=s,ax=ax,cmap=cm)\n# Here we set the range of the color bar based on the maximum observed value\n# NOTE: This line only changes the calculated color and not the display \n# 'range' of the legend next to the plot, for that we need to switch to \n# ColorbarBase (see second code snippet).\ncb.setlim(0,max_vel)\ncb.set_label('Value of \'vel\'')\nplt.show()\n
            \n

            Snippet, demonstrating ColorbarBase

            \n
            import numpy as np\nimport matplotlib.pyplot as plt\nimport matplotlib as mpl\n\ncm = plt.cm.get_cmap('RdYlBu')\nx = [1,5,10]\ny = [2,6,9]\nvel = [7,2,1]\n\n# Plot the data\nfig=plt.figure()\nfig.patch.set_facecolor('white')\nax=fig.add_subplot(111)\ns = ax.scatter(x,y,c=vel,edgecolor=''))\nnorm = mpl.colors.Normalize(vmin=0, vmax=10)\nax1 = fig.add_axes([0.95, 0.1, 0.01, 0.8])\ncb = mpl.colorbar.ColorbarBase(ax1,norm=norm,cmap=cm,orientation='vertical')\ncb.set_clim(vmin = 0, vmax = 10)\ncb.set_label('Value of \'vel\'')\nplt.show()\n
            \n

            This produces the following plot

            \n

            Matplotlib plot of sample data using ColorbarBase

            \n

            For more examples of what you can do with the colorbar, specifically the more flexible ColorbarBase, I would suggest that you check the documentation -> http://matplotlib.org/examples/api/colorbar_only.html

            \n soup wrap:

            You will have to iterate over all your data files to get the maximum value for vel; I have added a few lines of code (that need to be adjusted to fit your case) that will do that.

            Therefore, your colorbar line has been changed to use the max_vel, allowing you to get rid of that code using the fixed value of 8000.

            Additionally, I took the liberty to remove the black edges around the points, because I find that they 'obfuscate' the color of the point.

            Lastly, I have adjusted your plot code to use an axis object, which is required to have a colorbar.

            import numpy as np
            import matplotlib.pyplot as plt
            # This is needed to iterate over your data files
            import glob 
            
            # Loop over all your data files to get the maximum value for 'vel'. 
            # You will have to adjust this for your code
            """max_vel = 0
            for i in glob.glob(,'r') as fr:
                # Iterate over all lines
                if  > max_vel:
                    max_vel = """
            
            # Create Map
            cm = plt.cm.get_cmap('RdYlBu')
            x,y,vel = np.loadtxt('finaldata_temp.txt', skiprows=0, unpack=True)
            
            # Plot the data
            fig=plt.figure()
            fig.patch.set_facecolor('white')
            # Here we switch to an axis object
            # Additionally, you can plot several of your files in the same figure using
            # the subplot option.
            ax=fig.add_subplot(111)
            s = ax.scatter(x,y,c=vel,edgecolor='')
            # Here we assign the color bar to the axis object
            cb = plt.colorbar(mappable=s,ax=ax,cmap=cm)
            # Here we set the range of the color bar based on the maximum observed value
            # NOTE: This line only changes the calculated color and not the display 
            # 'range' of the legend next to the plot, for that we need to switch to 
            # ColorbarBase (see second code snippet).
            cb.set_clim(0,max_vel)
            cb.set_label('Value of \'vel\'')
            plt.show()
            

            Snippet, demonstrating ColorbarBase

            import numpy as np
            import matplotlib.pyplot as plt
            import matplotlib as mpl
            
            cm = plt.cm.get_cmap('RdYlBu')
            x = [1,5,10]
            y = [2,6,9]
            vel = [7,2,1]
            
            # Plot the data
            fig=plt.figure()
            fig.patch.set_facecolor('white')
            ax=fig.add_subplot(111)
            s = ax.scatter(x,y,c=vel,edgecolor='')
            norm = mpl.colors.Normalize(vmin=0, vmax=10)
            ax1 = fig.add_axes([0.95, 0.1, 0.01, 0.8])
            cb = mpl.colorbar.ColorbarBase(ax1,norm=norm,cmap=cm,orientation='vertical')
            cb.set_clim(vmin = 0, vmax = 10)
            cb.set_label('Value of \'vel\'')
            plt.show()
            

            This produces the following plot

            Matplotlib plot of sample data using ColorbarBase

            For more examples of what you can do with the colorbar, specifically the more flexible ColorbarBase, I would suggest that you check the documentation -> http://matplotlib.org/examples/api/colorbar_only.html

            qid & accept id: (32018206, 32018237) query: python- finding total number of items of certain range in a list soup:

            soup wrap:

            You can simply sum how many times an element is in the range 3-4:

            l = [3.4,4.5,3.2,5.6]
            
            print(sum(3 <= ele <= 4  for ele in l))
            2
            

            Or using numpy:

            import numpy as np

            l = [3.4, 4.5, 3.2, 5.6]
            arr = np.array(l)
            
            print(((3 <= arr) & (arr <= 4)).sum())
            2
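If the list is large and can be kept sorted, the bisect module gives the same count without scanning every element — a quick sketch (sorting up front is an assumption; it only pays off for repeated queries):

```python
import bisect

l = sorted([3.4, 4.5, 3.2, 5.6])
lo = bisect.bisect_left(l, 3)    # index of the first element >= 3
hi = bisect.bisect_right(l, 4)   # index of the first element > 4
print(hi - lo)  # → 2
```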
            

            If you want to check that you actually have a float, also use isinstance:

              print(sum(3 <= ele <= 4 and isinstance(ele,float) for ele in l))
            

            If you actually want to count the total number of floats in between two actual integers:

            from itertools import islice
            
            l = [1.3, 3, 3.4, 34.5, 3.2, 4, 5.6]
            
            
            def find_fs(l, i, j):
                try:
                    start, end = l.index(i), l.index(j)
                    return sum(isinstance(ele, float) for ele in islice(l, start + 1, end))
                except ValueError:  # i or j is not in the list
                    return 0
            print(find_fs(l, 3, 4))
            3
            
            qid & accept id: (32026028, 32026341) query: Run multiple threads until one exits in python soup:

            soup wrap:

            You can do it using multiprocessing.

            Let's say we have two functions that calculate the value of Pi, calculate1() and calculate2(). In this case calculate2() is faster.

            import multiprocessing
            import time
            
            def calculate1(result_queue):
                print "calculate1 started"
                time.sleep(10)
                result = 3.14
                result_queue.put(result)
                print "calculate1 found the result!"
            
            def calculate2(result_queue):
                print "calculate2 started"
                time.sleep(2)
                result = 3.14
                result_queue.put(result)
                print "calculate2 found the result!"
            
            result_queue = multiprocessing.Queue()
            
            process1 = multiprocessing.Process(target=calculate1, args=[result_queue])
            process2 = multiprocessing.Process(target=calculate2, args=[result_queue])
            
            process1.start()
            process2.start()
            
            print "Calculating the result with 2 threads."
            
            result = result_queue.get() # waits until either process has `.put()` a result
            
            for process in [process1, process2]: # then kill them all off
                process.terminate()
            
            print "Got result:", result
            

            This outputs:

            calculate1 started
            calculate2 started
            calculate2 found the result!
            Got result: 3.14
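The same first-result-wins pattern can be sketched in Python 3 with concurrent.futures (the function names and sleep times here are illustrative, not from the original answer):

```python
import concurrent.futures
import time

def calculate1():
    time.sleep(0.2)   # the slow worker
    return 3.14

def calculate2():
    time.sleep(0.05)  # the fast worker
    return 3.14

with concurrent.futures.ThreadPoolExecutor() as ex:
    futures = [ex.submit(calculate1), ex.submit(calculate2)]
    # Block until the first future completes, then take its result.
    done, pending = concurrent.futures.wait(
        futures, return_when=concurrent.futures.FIRST_COMPLETED)
    result = done.pop().result()
    for f in pending:
        # cancel() only affects futures that have not started running;
        # unlike process.terminate(), it cannot stop a running thread.
        f.cancel()

print("Got result:", result)  # → Got result: 3.14
```

Note that unlike the multiprocessing version, threads cannot be forcibly terminated, so the executor still waits for already-running workers on exit.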
            
            qid & accept id: (32077660, 32077884) query: Combine multidimensional array by group python soup:

            soup wrap:

            You need to pass two keys to sorted, the name and date, then use str.join to concatenate the IPs and times:

            from itertools import groupby
            from operator import itemgetter
            
            out = []
            
            for _, v in groupby(sorted(data, key=itemgetter(0, 3)),key=itemgetter(0,3)):
                v = list(v)    
                ips = ", ".join([sub[1] for sub in v])
                tmes = ", ".join([sub[2] for sub in v])
                out.append([v[0][0], ips, tmes, v[0][-1]])
            
            print(out)
            
            [['blah', '172.18.74.149, 172.18.74.146', '11:18:33.846, 12:27:38.985', '2015_08_12'], 
            ['test', '172.18.74.146, 172.18.74.148', '13:05:43.834, 12:27:39.016', '2015_08_07']]
            

            Or without sorting using dict to group:

            d = {}
            
            for nm, ip, tm, dte in data:
                key = nm, dte
                if key in d:
                    v = d[key]
                    v[1] += ", {}".format(ip)
                    v[2] += ", {}".format(tm)
                else:
                    d[key] = [nm, ip, tm, dte]
            
            print(list(d.values()))
            

            Output:

            [['test', '172.18.74.146, 172.18.74.148', '13:05:43.834, 12:27:39.016', '2015_08_07'], 
            ['blah', '172.18.74.149, 172.18.74.146', '11:18:33.846, 12:27:38.985', '2015_08_12']]
            
            qid & accept id: (32102801, 32102872) query: Step through items in dictionary in python soup:

            soup wrap:

            Python Tutorial: Looping Techniques

            When you have the key, use the key to look up the value. Don't iterate through the key.

            for key in myDict:
                for eachValue in myDict[key]:
                    for char in eachValue:
                        do something
            

            More efficiently, iterate through the items and avoid the extra lookup:

            for key, value in myDict.items():
                for eachValue in value:
                    for char in eachValue:
                        do something
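A concrete run of the items() loop above, with a hypothetical myDict mapping keys to lists of strings (the sample data and the "do something" body are mine):

```python
myDict = {'a': ['hi', 'yo'], 'b': ['ok']}  # hypothetical sample data

chars = []
for key, value in myDict.items():   # one pass over the key/value pairs
    for eachValue in value:         # each value is a list of strings
        for char in eachValue:      # each string yields its characters
            chars.append(char)

print(sorted(chars))  # → ['h', 'i', 'k', 'o', 'o', 'y']
```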
            
            qid & accept id: (32148804, 32148848) query: Python: Opening a file within a print() function soup:

            soup wrap:

            Use the with construct.

            with open('file.txt', 'r') as f:
                print(f.read())
            

            If the file is very large, I suggest iterating over it and printing it a line at a time:

            with open('file.txt', 'r') as f:
                for line in f:
                    print(line, end='')
            

            Of course, if the file is that large, it's probably not useful to print it to an ordinary console, but this practice is very useful for processing large files.

            qid & accept id: (32161369, 32161709) query: How to get a vector from a list in list in python? soup:

            soup wrap:

            The expression e[:][2][1] returns element 1 of the element 2 of e[:], where e[:] is a shallow copy of e.

            To express the idea of "element 1 of element 2 of each row of e", you should use a list comprehension:

            f = [x[2][1] for x in e]
            

            Bonus: You can also define a class that supports indexing across the elements of an iterable.

            class Comprehension(object):
                def __init__(self, iterable):
                    self._iterable = iterable
            
                def __iter__(self):
                    return iter(self._iterable)
            
                def __getattr__(self, name):
                    return Comprehension(getattr(elt, name) for elt in self._iterable)
            
                def __getitem__(self, item):
                    return Comprehension(elt[item] for elt in self._iterable)
            
                def __call__(self, *args, **kwargs):
                    return Comprehension(elt(*args, **kwargs) for elt in self._iterable)
            

            Then you can use it like this:

            f = list(Comprehension(e)[2][1])
            

            This class also supports attribute lookup and function calls.

            >>> list(Comprehension(range(10)).bit_length())
            [0, 1, 2, 2, 3, 3, 3, 3, 4, 4]
            
            qid & accept id: (32174535, 32174639) query: Converting dictionary of dictionary of dictionary to pandas data frame soup:

            soup wrap:

            You would first need to transform the nested dictionary into a list of dictionaries or a dictionary of lists, and only then can you convert it to a DataFrame. Example (converting to a list of dictionaries) -

            import pandas as pd
            
            list_of_dict = []
            for key, value in nested_dict.items():
                for key1, value1 in value.items():
                    for key2,value2 in value1.items():
                        list_of_dict.append({'A':key,'B':key1,'C':key2,'D':value2})
            
            df = pd.DataFrame(list_of_dict)
            

            Use the correct column names instead of 'A', 'B' , etc.


            Example/Demo -

            In [2]: nested_dict = {'2': {'lagtime': {'darkgreen': 210,
               ...:    'darkorange': 141,
               ...:    'pink': 142,
               ...:    'red': 141}}}
            
            In [4]: list_of_dict = []
            
            In [7]: for key, value in nested_dict.items():
               ...:     for key1, value1 in value.items():
               ...:         for key2,value2 in value1.items():
               ...:             list_of_dict.append({'A':key,'B':key1,'C':key2,'D':value2})
               ...:
            
            In [8]: df = pd.DataFrame(list_of_dict)
            
            In [9]: df
            Out[9]:
               A        B           C    D
            0  2  lagtime   darkgreen  210
            1  2  lagtime        pink  142
            2  2  lagtime  darkorange  141
            3  2  lagtime         red  141
            
            qid & accept id: (32182116, 32204999) query: Put the result of simple tag into a variable soup:

            soup wrap:

            Please use an assignment tag if you are using Django < 1.9 (see the assignment_tag section of the custom template tags docs). Here is the example from the docs:

            @register.assignment_tag
            def get_current_time(format_string):
                return datetime.datetime.now().strftime(format_string)
            

            Then in template:

            {% get_current_time "%Y-%m-%d %I:%M %p" as the_time %}
            

            The time is {{ the_time }}.

            You can see that the template tag result becomes a variable via the as statement. You can use the_time however you like, including in an if statement.

            Also quote from the docs:

            Deprecated since version 1.9: simple_tag can now store results in a template variable and should be used instead.

            qid & accept id: (32214596, 32217247) query: How can I quickly compare a list and a set? soup:

            soup wrap:

            The fundamental problem is that you aren't using appropriate data structures for the job. Using tuples to represent sets might be ok for small sets in this case, but for large sets, you can expect to search an average of half the combined size of the sets for each element in the list that is actually in one of the sets. For each element in the list that is not in either set, we must search all elements of both sets to determine that.

            So any algorithm based on these data structures (i.e., representing sets using tuples) will at best be O(m*n), where m is the size of the list and n is the size of the sets.

            There really isn't any way we can reduce the m component — we have to examine each element of the list to determine which set (if any) it belongs to.

            We can, however, reduce the n component. How? By using a more efficient data structure for our sets.

            Fortunately, this is not hard, as Python includes a built-in set type. So the first step is to construct the two sets:

            a = set((1, 3))
            b = set((2, 5))
            

            Now, we can easily (and efficiently) determine if an element e is in one of the sets:

            e = 1
            e in a # => True
            e in b # => False
            

            Now, we just need to loop over the input list and accumulate the result:

            l = [1, 1, 3, 2, 5, 7, 8, 3, 2, 1]
            result = 0 # accumulator for result
            for e in l:
              if e in a:
                result += 1
              elif e in b:
                result -= 1
            
            print result # prints "2"
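The same accumulation can be collapsed into a single expression with sum() over generator comprehensions (Python 3 print shown; equivalent to the loop above):

```python
a = set((1, 3))
b = set((2, 5))
l = [1, 1, 3, 2, 5, 7, 8, 3, 2, 1]

# +1 for every element found in a, -1 for every element found in b
result = sum(e in a for e in l) - sum(e in b for e in l)
print(result)  # → 2
```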
            
            qid & accept id: (32215965, 32216042) query: dict of internal keys soup:

            soup wrap:

            The key-value pairs must exist before you reference them. You can add the 'fruits' value after building the dictionary like this:

            d = {'apple': 'red',
                 'orange': 'orange',
                 'lemon': 'yellow',
                 'milk': 'white',
                 'coffee': 'brown'}
            
            d['fruits'] = [d['apple'], d['orange'], d['lemon']] 
            
            print d['fruits']
            

            I'm not sure if you really wanted to output the fruits and not their associated color though, this will output:

            ['red', 'orange', 'yellow']
            

            If you change any of the fruits values though, (for example 'red' to 'green') this will not automatically update the value in the 'fruits' list, in case you wanted that.
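To make that last point concrete (a quick sketch in Python 3 syntax): changing a fruit's value afterwards leaves the copy already stored in the 'fruits' list untouched.

```python
d = {'apple': 'red', 'orange': 'orange', 'lemon': 'yellow'}
d['fruits'] = [d['apple'], d['orange'], d['lemon']]

d['apple'] = 'green'   # update the original entry
print(d['fruits'][0])  # → red  (the list still holds the old string)
```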

            qid & accept id: (32231491, 32231749) query: Python removing duplicates in list and 1==1.0 True soup:

            soup wrap:

            You can try

            In [3]: [value for _, value in frozenset((type(x), x) for x in l)]
            Out[3]: [1.0, '1', 1, 'dsa', 'asd']
            

            We create a (temporary) frozenset of tuples containing both element and its type - to keep elements that are equal (such as 1, 1.0 and True) but have different types. Then we iterate over it, unpack tuples and retrieve elements (value).

            Sure, we could as well use ordinary set, which is mutable, but we don't need mutability because our set is temporary.

            Note that this won't necessarily preserve the original order.


            If you need the original order preserved, use collections.OrderedDict, which is a hash map (just like a regular dict) and therefore works similarly to frozenset/set:

            In [16]: from collections import OrderedDict
            
            In [17]: [value for _, value in OrderedDict.fromkeys((type(x), x) for x in l)]
            Out[17]: ['asd', 'dsa', 1, '1', 1.0]
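On Python 3.7+, plain dicts preserve insertion order, so dict.fromkeys works the same way without the import (l here is a hypothetical input containing duplicates, since the original list is not shown):

```python
l = ['asd', 'dsa', 1, '1', 1.0, 1, 'asd']  # hypothetical input with duplicates

# (type, value) keys keep 1, 1.0 and '1' distinct while deduplicating
deduped = [value for _, value in dict.fromkeys((type(x), x) for x in l)]
print(deduped)  # → ['asd', 'dsa', 1, '1', 1.0]
```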
            
            qid & accept id: (32244565, 32244976) query: Load JPEG from URL to skimage without temporary file soup:

            soup wrap:

            From imread documentation:

            Image file name, e.g. test.jpg or URL

            So you can directly pass your URL:

            io.imread(url)
            

            Notice that it will still create a temporary file for processing the image...


            Edit:

            The imread library also has a method imread_from_blob which accepts a string as input, so you may pass your data directly to this function.

            from imread import imread_from_blob
            img_data = imread_from_blob(data, 'jpg')
            
            >>> img_data
            array([[[ 23, 123, 149],
            [ 22, 120, 147],
            [ 22, 118, 143],
            ...,
            

            The second parameter is the extension typically associated with this blob. If None is given, then detect_format is used to auto-detect.

            qid & accept id: (32256393, 32256824) query: create dictionary from list same values soup:

            soup wrap:

            You can try like this:

            def to_nested_dict(list_dict):
                d = {}                                   # initialize the outer dict
                for k, lst in list_dict.items():
                    d[k] = {}                            # initialize inner dicts
                    for x, y in lst:
                        d[k].setdefault(x, []).append(y) # initialize and populate innermost list
                return d
            

            This uses setdefault to provide a default value (an empty list) in case of a new key, but you can just as well use an if-statement or a collections.defaultdict(list) for this.

            Example:

            >>> to_nested_dict({'abc': [['aaa', '123'], ['aaa', '321']]})
            {'abc': {'aaa': ['123', '321']}}
            >>> to_nested_dict({'abc': [['aaa', '123'], ['aaa', '321'], ['bbb', '456']]})
            {'abc': {'aaa': ['123', '321'], 'bbb': ['456']}}
            >>> to_nested_dict({'abc': [['aaa', '123'], ['aaa', '321'], ['bbb', '456']], 'efg': [['eee', '789']]})
            {'abc': {'aaa': ['123', '321'], 'bbb': ['456']}, 'efg': {'eee': ['789']}}
            

            Note that this assumes the inner-most lists always have two elements, a key and a value. The key can repeat across lists (its values are appended to the same list) or differ (producing more than one entry in the created dictionaries).

            qid & accept id: (32262334, 32262362) query: Remove outer list from list of list in python soup:


            Like this:

            *[[0, 4], [2], [3]]
            

            Hope it helps!


            Example:

            >>> def f(a, b, c):
            ...    print(a, b, c)
            ...
            >>> f(*[1, 2, 3])
            1 2 3
            

            In your case:

            for element in itertools.product(*d.values()):
                sortedList = sorted(list(element))
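            Putting it together, a self-contained sketch (the dict d of lists is a made-up stand-in for the one in the question):

```python
import itertools

d = {'x': [3, 1], 'y': [2]}   # hypothetical dict of lists

# *d.values() removes the outer list-of-lists: each inner list becomes
# a separate positional argument to itertools.product.
results = [sorted(element) for element in itertools.product(*d.values())]
print(results)   # → [[2, 3], [1, 2]]
```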
            
            qid & accept id: (32295637, 32295697) query: Extracting Data From Python Classes soup:


            Did you try pluto.coords?

            You can access members of a class from outside by using the instance followed by dot followed by the member name, i.e. attribute access. This is just as you have done when calling the genData() method.
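            As a minimal illustration of attribute access (a made-up class, not the Planet from the question):

```python
class Body:
    def __init__(self, x, y):
        self.coords = (x, y)   # instance attribute set in the constructor

    def gen_data(self):
        return list(self.coords)

b = Body(3, 4)
print(b.gen_data())   # method call via the instance → [3, 4]
print(b.coords)       # plain attribute access works the same way → (3, 4)
```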

            BTW, you can define your constants using exponential notation:

            m_sun = 1.989e+30
            G = 6.67e-11
            

            and

            pluto = Planet(4495978707000, 0, 0, 4670, 1.305e+22)
            

            which is more readable (important) and saves a few calculations when your class is defined (less important).

            qid & accept id: (32298978, 32299040) query: Searching and counting dictionary key value pairs soup:


            You can use collections.Counter.

            from collections import Counter
            
            d = {'brown dogs':3, 'dog of white':4, 'white cats':1, 'white cat':9}
            substrings = ['dog', 'cat']
            
            counter = Counter()
            
            for substring in substrings:
                for key in d:
                    if substring in key:
                        counter[substring] += d[key]
            
            print(list(counter.items()))  # list() so the output matches on Python 3 as well
            

            Output:

            [('dog', 7), ('cat', 10)]
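            The same counting can also be written as a single dict comprehension fed to Counter (a sketch, using the same data as above):

```python
from collections import Counter

d = {'brown dogs': 3, 'dog of white': 4, 'white cats': 1, 'white cat': 9}
substrings = ['dog', 'cat']

# For each substring, sum the values of every key containing it.
counter = Counter({s: sum(v for k, v in d.items() if s in k) for s in substrings})
print(counter)   # → Counter({'cat': 10, 'dog': 7})
```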
            
            qid & accept id: (32308382, 32310323) query: Execute coroutine from `call_soon` callback function soup:


            The example you mentioned demonstrates how to schedule a callback.

            If you use the yield from syntax, the function is actually a coroutine and it has to be decorated accordingly:

            @asyncio.coroutine
            def hello_world(loop):
                print('Hello')
                yield from asyncio.sleep(5, loop=loop)
                print('World')
                loop.stop()
            

            Then you can schedule the coroutine as a task using ensure_future:

            loop = asyncio.get_event_loop()
            coro = hello_world(loop)
            asyncio.ensure_future(coro)
            loop.run_forever()
            loop.close()
            

            Or equivalently, using run_until_complete:

            loop = asyncio.get_event_loop()
            coro = hello_world(loop)
            loop.run_until_complete(coro)
            

            In two weeks, python 3.5 will officially be released and you'll be able to use the new async/await syntax:

            async def hello_world(loop):
                print('Hello')
                await asyncio.sleep(5, loop=loop)
                print('World')
            

            EDIT: It is a bit ugly, but nothing prevents you from creating a callback that schedules your coroutine:

            loop = asyncio.get_event_loop()
            coro = hello_world(loop)
            callback = lambda: asyncio.ensure_future(coro)
            loop.call_soon(callback)
            loop.run_forever()
            loop.close()
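            On current Python (3.7+), you no longer manage the loop by hand at all; asyncio.run drives the coroutine for you (a sketch — the loop parameter and loop.stop() are no longer needed, and the sleep is shortened here):

```python
import asyncio

async def hello_world():
    print('Hello')
    await asyncio.sleep(0.1)   # shortened from 5 seconds for the sketch
    print('World')
    return 'done'

# asyncio.run creates the event loop, runs the coroutine to completion,
# and closes the loop.
result = asyncio.run(hello_world())
print(result)   # → done
```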
            
            qid & accept id: (32314712, 32315462) query: Restart a script after 6 minutes soup:


            The following approach should work. It first calculates a wake-up time 6 minutes into the future and starts your "script" executing 500 times, then simply waits in a loop until the wake-up time is reached. So your script can take any amount of time; if it takes longer than 6 minutes, the next 500 iterations start immediately.

            import time
            
            wakeup = time.time()
            
            while True:
                wakeup += 6 * 60
            
                for i in range(500):
                    # something 
            
                    # Has it taken longer than 6 minutes?
                    if time.time() > wakeup:
                        break
            
                while time.time() < wakeup:
                    time.sleep(1)
            

            You can change the sleep(1) value to whatever you want, e.g. 5. It will not affect the overall rate; it just means the next iteration will start within 5 seconds of the 6-minute mark rather than within 1 second of it.
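            A variant of the same pattern using time.monotonic (Python 3.3+), which is immune to system-clock adjustments; the function name, period, and cycle count here are made up for the sketch:

```python
import time

def run_periodically(task, period, cycles):
    """Run task() once per `period` seconds, for `cycles` cycles."""
    deadline = time.monotonic()
    for _ in range(cycles):
        deadline += period
        task()                              # the work for this cycle
        while time.monotonic() < deadline:  # wait out the rest of the period
            time.sleep(max(0.0, min(0.05, deadline - time.monotonic())))

calls = []
run_periodically(lambda: calls.append(time.monotonic()), 0.1, 3)
print(len(calls))   # → 3
```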

            Try the following demo version: it runs every 6 seconds (not minutes) and tries to do the for loop 20 times. I slow each iteration down by 0.5 seconds to simulate work.

            import time
            
            wakeup = time.time()
            
            while True:
                wakeup += 6 
                print "start",
            
                for i in range(20):
                    time.sleep(0.5)   # simulate work
                    print i,
            
                    if time.time() > wakeup:
                        break
            
                print "finished"
            
                while time.time() < wakeup:
                    time.sleep(1)
            

            You will see the following output:

            start 0 1 2 3 4 5 6 7 8 9 10 11 12 finished
            start 0 1 2 3 4 5 6 7 8 9 10 finished
            start 0 1 2 3 4 5 6 7 8 9 10 11 finished
            

            As you can see, the loop is aborted when 6 seconds are up and before all 20 iterations are reached and it is then restarted.

            qid & accept id: (32316244, 32316630) query: Slicing pandas groupby groups into equal lengths soup:


            This does what you want, using the df.apply method:

            import pandas as pd
            
            cols = ['page', 'hour', 'count']
            data = [
                (3727441,    1,  2003),
                (3727441,    2,   654),
                (3727441,    3,  5434),
                (3727458,    1,   326),
                (3727458,    2,  2348),
                (3727458,    3,  4040),
                (3727458,    4,   374),
                (3727458,    5,  2917),
                (3727458,    6,  3937),
                (3735634,    1,  1957),
                (3735634,    2,  2398),
                (3735634,    3,  2812),
                (3768433,    1,   499),
                (3768433,    2,  4924),
                (3768433,    3,  5460),
                (3768433,    4,  1710),
                (3768433,    5,  3877),
                (3768433,    6,  1912),
                (3768433,    7,  1367),
                (3768433,    8,  1626),
                (3768433,    9,  4750),
            ]
            
            df = pd.DataFrame.from_records(data, columns=cols)
            
            def f(row):
                n = (row.hour - 1) // 3  # integer division (// also works correctly on Python 3)
                if n > 0:
                    return str(row.page) + '_{0}'.format(int(n))
                else:
                    return row.page
            
            df['page'] = df.apply(f, axis=1)
            
            print df
            

            Output:

             #       page  hour  count
             # 0     3727441     1   2003
             # 1     3727441     2    654
             # 2     3727441     3   5434
             # 3     3727458     1    326
             # 4     3727458     2   2348
             # 5     3727458     3   4040
             # 6   3727458_1     4    374
             # 7   3727458_1     5   2917
             # 8   3727458_1     6   3937
             # 9     3735634     1   1957
             # 10    3735634     2   2398
             # 11    3735634     3   2812
             # 12    3768433     1    499
             # 13    3768433     2   4924
             # 14    3768433     3   5460
             # 15  3768433_1     4   1710
             # 16  3768433_1     5   3877
             # 17  3768433_1     6   1912
             # 18  3768433_2     7   1367
             # 19  3768433_2     8   1626
             # 20  3768433_2     9   4750
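            With a frame this small the difference hardly matters, but the same labelling can also be done without apply, using vectorized column arithmetic (a sketch on a tiny made-up frame):

```python
import pandas as pd

df = pd.DataFrame({'page': [1, 1, 1, 1],
                   'hour': [1, 3, 4, 7],
                   'count': [5, 6, 7, 8]})

n = (df['hour'] - 1) // 3                         # 3-hour bucket index per row
suffixed = df['page'].astype(str) + '_' + n.astype(str)
df['page'] = df['page'].where(n == 0, suffixed)   # keep original page for bucket 0
print(df['page'].tolist())   # → [1, 1, '1_1', '1_2']
```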
            
            qid & accept id: (32344022, 32344369) query: Finding combinations that meet a threshold relation soup:


            You could use numpy and vectorization, something like the example below:

            import numpy as np
            
            phi = 0.5
            theta = 1
            n1 = 10
            n2 = 20
            
            N1 = np.random.randint(-100, 100, size=100)
            N2 = np.random.randint(-100, 100, size=100)
            
            N1 = N1[(N1 >= 0) & (N1 <= n1)]
            N2 = N2[(N2 >= 0) & (N2 <= n2)]
            
            a = N2 * theta + phi
            res = N1.reshape(N1.shape[0], 1) - a.reshape(1, a.shape[0])
            
            indices = np.argwhere(res >= 0)
            pairs = list(zip(N1[indices[:,0]], N2[indices[:,1]]))  # list() materializes the pairs on Python 3
            

            Example output of pairs:

            [(8, 3),
             (8, 6),
             (8, 5),
             (8, 1),
             (3, 1),
             (9, 3),
             (9, 8),
             (9, 8),
             (9, 6),
             (9, 5),
             (9, 6),
             (9, 6),
             (9, 5),
             (9, 8),
             (9, 1)]
            

            Per @dbliss's request, here is the modularized version and its test:

            import numpy as np
            
            
            def calc_combination(N1, N2, n1, n2, theta, phi):
                N1 = N1[(N1 >= 0) & (N1 <= n1)]
                N2 = N2[(N2 >= 0) & (N2 <= n2)]
            
                a = N2 * theta + phi
                res = N1.reshape(N1.shape[0], 1) - a.reshape(1, a.shape[0])
            
                indices = np.argwhere(res >= 0)
                pairs = list(zip(N1[indices[:,0]], N2[indices[:,1]]))  # list() so the assert below works on Python 3
                return pairs
            
            
            def test_case():
                n1 = 5
                n2 = 1
                theta = 2
                phi = 2
            
                N1 = np.arange(n1 + 1)
                N2 = np.arange(n2 + 1)
            
                assert (calc_combination(N1, N2, n1, n2, theta, phi) ==
                        [(2, 0), (3, 0), (4, 0), (4, 1), (5, 0), (5, 1)])
            
            test_case()
            
            qid & accept id: (32346156, 32346272) query: Outputting Multi-row CSV Files from Multiple Dictionaries soup:


            As you are using DictWriter, you would need to construct a per-row dictionary as follows:

            import csv
            
            symbol = ["msft", "cvx", "baba"]
            header = ["symbol","ev_ebitda","asset"]
            
            with open('output.csv', 'wb') as f_output:  # Python 2; on Python 3 use open('output.csv', 'w', newline='')
                csv_output = csv.DictWriter(f_output, fieldnames=header)
                csv_output.writeheader()
            
                for s in symbol:
                    row = {'asset': 60, 'ev_ebitda': 40, 'symbol': s}
                    csv_output.writerow(row)
            

            This will create an output CSV file as follows:

            symbol,ev_ebitda,asset
            msft,40,60
            cvx,40,60
            baba,40,60
            
            qid & accept id: (32358269, 32358481) query: appending a single string to each element of a list in python soup:


            Something like the following should work:

            with open('userID.txt', 'r') as f_input, open('emails.txt', 'w') as f_output:
                emails = ["{}@wherever.com".format(line.strip()) for line in f_input]
                f_output.write(", ".join(emails))
            

            So if you had a userID.txt file containing the following names, with one name per line:

            fred
            wilma
            

            You would get a one line output file as follows:

            fred@wherever.com, wilma@wherever.com
            
            qid & accept id: (32365358, 32365415) query: Python lists with irregular format soup:


            One very simple way to do this is to use a constructed dict with your defaults, and then update it:

            >>> d = dict([(0,0),(1,0),(2,0),(3,0)])
            >>> print(d)
            {0: 0, 1: 0, 2: 0, 3: 0}
            >>> d.update([(0, 0.73578249201070511), (3, 0.25197028613750805)])
            >>> print(d)
            {0: 0.7357824920107051, 1: 0, 2: 0, 3: 0.25197028613750805}
            

            Edit

            Incorporating hgwell's suggestion to output a list of tuples, here is a complete function (which could probably be done better somehow, but this works anyway):

            def listify(l):
                res = []
                for j in l:
                    d = dict([(0,0),(1,0),(2,0),(3,0),(4,0)])
                    d.update(j)
                    res.append(list(d.items()))
                return res
            

            and in action...

            >>> from pprint import pprint
            >>> z = listify([[(1, 0.97456828373415116)],
                             [(0, 0.91883125256489728), (1, 0.020225186991467976), (2, 0.020314851937259213), (3, 0.020382294889184499), (4, 0.020246413617191008)],
                             [(2, 0.98493696818505228)]])
            >>> pprint(z)
            [[(0, 0), (1, 0.9745682837341512), (2, 0), (3, 0), (4, 0)],
             [(0, 0.9188312525648973),
              (1, 0.020225186991467976),
              (2, 0.020314851937259213),
              (3, 0.0203822948891845),
              (4, 0.020246413617191008)],
             [(0, 0), (1, 0), (2, 0.9849369681850523), (3, 0), (4, 0)]]
            
            qid & accept id: (32379895, 32381082) query: vectorize numpy unique for subarrays soup:


            First, you can work with data.reshape(N,-1), since you are interested in sorting the last 2 dimensions.

            An easy way to get the number of unique values for each row is to dump each row into a set and let it discard the duplicates:

            [len(set(i)) for i in data.reshape(data.shape[0],-1)]
            

            But this is an iteration, though probably a fast one.

            A problem with 'vectorizing' is that the set or list of unique values in each row will differ in length. 'rows with differing length' is a red flag when it comes to 'vectorizing'. You no longer have the 'rectangular' data layout that makes most vectorizing possible.

            You could sort each row:

            np.sort(data.reshape(N,-1))
            
            array([[1, 2, 2, 3, 3, 5, 5, 5, 6, 6],
                   [1, 1, 1, 2, 2, 2, 3, 3, 5, 7],
                   [0, 0, 2, 3, 4, 4, 4, 5, 5, 9],
                   [2, 2, 3, 3, 4, 4, 5, 7, 8, 9],
                   [0, 2, 2, 2, 2, 5, 5, 5, 7, 9]])
            

            But how do you identify the unique values in each row without iterating? Counting the number of nonzero differences might just do the trick:

            In [530]: data=np.random.randint(10,size=(5,10))
            
            In [531]: [len(set(i)) for i in data.reshape(data.shape[0],-1)]
            Out[531]: [7, 6, 6, 8, 6]
            
            In [532]: sdata=np.sort(data,axis=1)
            In [533]: (np.diff(sdata)>0).sum(axis=1)+1            
            Out[533]: array([7, 6, 6, 8, 6])
            

            I was going to add a warning about floats, but if np.unique is working for your data, my approach should work just as well.

            Another option counts the occupied bins of a per-row bincount:

            [(np.bincount(i)>0).sum() for i in data]
            

            This is an iterative solution that is clearly faster than my len(set(i)) version, and is competitive with the diff...sort.

            In [585]: data.shape
            Out[585]: (10000, 400)

            In [586]: timeit [(np.bincount(i)>0).sum() for i in data]
            1 loops, best of 3: 248 ms per loop
            
            In [587]: %%timeit                                       
            sdata=np.sort(data,axis=1)
            (np.diff(sdata)>0).sum(axis=1)+1
               .....: 
            1 loops, best of 3: 280 ms per loop
            

            I just found a faster way to use bincount: np.count_nonzero.

            In [715]: timeit np.array([np.count_nonzero(np.bincount(i)) for i in data])
            10 loops, best of 3: 59.6 ms per loop
            

            I was surprised at the speed improvement. But then I recalled that count_nonzero is used in other functions (e.g. np.nonzero) to allocate space for their return results. So it makes sense that this function would be coded for maximum speed. (It doesn't help in the diff...sort case because it does not take an axis parameter).
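            The sort+diff trick above can be wrapped up as a small reusable function (a sketch; row_nunique is a made-up name):

```python
import numpy as np

def row_nunique(a):
    """Distinct values per row, via the sort + count-nonzero-diffs trick."""
    s = np.sort(a.reshape(a.shape[0], -1), axis=1)
    # adjacent equal values give a zero diff; each nonzero diff marks a new value
    return (np.diff(s, axis=1) != 0).sum(axis=1) + 1

data = np.random.randint(10, size=(5, 10))
print(row_nunique(data))
```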

            qid & accept id: (32390539, 32391080) query: How to pass javascript variable to macros in jinja2 template soup:


            You cannot pass values from JavaScript to the template that way, because the template is rendered before the response goes back to the browser for the JavaScript engine to evaluate. The only way the template renderer could resolve the value of name specified in the JavaScript code would be to interpret the JavaScript embedded in the script block, which template renderers do not do.

            Update. Let's look at your second attempt, the one that you say has worked. You have:

            <!-- reconstructed sketch: the original markup was stripped from this page; its shape is inferred from the surrounding prose -->
            <script>
                function js_fn(x) { alert(x); }
            </script>
            <button onclick="js_fn('{{ py_fn('some_string') }}')">Click me</button>

            Presumably this is in some partial (say _index.html). A view that has py_fn in scope, loads _index.html, evaluates the string "py_fn('some_string')", and replaces {{ py_fn('some_string') }} with the result of that evaluation. Let's say py_fn is the identity function on strings: it's a unary function that takes a string and immediately returns it. Then, the result of evaluating "py_fn('some_string')" will be the string 'some_string', which will be substituted back, obtaining the final, so-called "rendered" template:

            <!-- reconstructed sketch: the rendered template, markup inferred from the surrounding prose -->
            <script>
                function js_fn(x) { alert(x); }
            </script>
            <button onclick="js_fn('some_string')">Click me</button>

            This string will be part of the response body of the request, so the browser will dump the button on the window, evaluate the js code inside the script block, which will create a global variable js_fn on the window, which will take something and alert it. The button, when clicked on, will call js_fn with the constant some_string, always. As you can see, there is no passing of values from JS to Python/Flask.
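To make the render-time evaluation concrete, here is a minimal stand-alone sketch using the jinja2 package directly, with py_fn as the identity function described above:

```python
from jinja2 import Template

def py_fn(s):
    return s  # identity function, as in the explanation above

# the {{ ... }} expression is evaluated while rendering,
# long before any JavaScript runs in the browser
tmpl = Template("js_fn('{{ py_fn('some_string') }}')")
rendered = tmpl.render(py_fn=py_fn)
print(rendered)  # js_fn('some_string')
```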

            qid & accept id: (32403846, 32404486) query: List Comprehensions - How to have strings and integers in one list? soup:

            soup wrap:

            Just split on the colon then map the rest to int after splitting on a comma:

            with open("in.txt") as f:
                for line in f:
                    a, rest = line.split(":",1)
                    print([a] + list(map(int, rest.split(","))))  # list() keeps this working on Python 3 too
            

            Output:

            ['min', 1, 2, 3, 5, 6]
            ['max', 1, 2, 3, 5, 6]
            ['avg', 1, 2, 3, 5, 6]
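If you prefer a single expression, here is a sketch of the same parsing as a list comprehension; the file is replaced by an in-memory list for illustration, and the format is assumed to match the question:

```python
# in-memory stand-in for the file's lines
lines = ["min:1,2,3,5,6", "max:1,2,3,5,6", "avg:1,2,3,5,6"]

# split each line on the first colon, then convert the comma-separated rest to ints
rows = [[name] + [int(x) for x in rest.split(",")]
        for name, rest in (line.split(":", 1) for line in lines)]
print(rows)
```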
            
            qid & accept id: (32424555, 32430886) query: Python-Getting contents between current and next occurrence of pattern in a string soup:

            soup wrap:

            As mentioned by Blckknght in the comment, you can achieve this with re.split. re.split retains all empty strings between a) the beginning of the string and the first match, b) the last match and the end of the string and c) between different matches:

            >>> re.split('abc', 'abcabcabcabc')
            ['', '', '', '', '']
            >>> re.split('bca', 'abcabcabcabc')
            ['a', '', '', 'bc']
            >>> re.split('c', 'abcabcabcabc')
            ['ab', 'ab', 'ab', 'ab', '']
            >>> re.split('a', 'abcabcabcabc')
            ['', 'bc', 'bc', 'bc', 'bc']
            

            If you want to retain only c) the strings between 2 matches of the pattern, just slice the resulting array with [1:-1].
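For example, combining re.split with that slice keeps only the between-match pieces:

```python
import re

# split on the pattern, then drop the before-first and after-last pieces
parts = re.split('b', 'abcabcabc')[1:-1]
print(parts)  # ['ca', 'ca']
```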

            Do note that there are two caveats with this method:

            1. re.split doesn't split on empty string match.

              >>> re.split('', 'abcabc')
              ['abcabc']
              
            2. Content in capturing groups will be included in the resulting array.

              >>> re.split(r'(.)(?!\1)', 'aaaaaakkkkkkbbbbbsssss')
              ['aaaaa', 'a', 'kkkkk', 'k', 'bbbb', 'b', 'ssss', 's', '']
              

            You have to write your own function with finditer if you need to handle those use cases.

            This is the variant where only case c) is matched.

            def findbetween(pattern, input):
                out = []
                start = 0
                for m in re.finditer(pattern, input):
                    out.append(input[start:m.start()])
                    start = m.end()
                return out
            

            Sample run:

            >>> findbetween('abc', 'abcabcabcabc')
            ['', '', '']
            >>> findbetween(r'', 'abcdef')
            ['a', 'b', 'c', 'd', 'e', 'f']
            >>> findbetween(r'ab', 'abcabcabc')
            ['c', 'c']
            >>> findbetween(r'b', 'abcabcabc')
            ['ca', 'ca']
            >>> findbetween(r'(?<=(.))(?!\1)', 'aaaaaaaaaaaabbbbbbbbbbbbkkkkkkk')
            ['bbbbbbbbbbbb', 'kkkkkkk']
            

            (In the last example, (?<=(.))(?!\1) matches the empty string at the end of the string, so 'kkkkkkk' is included in the list of results)

            qid & accept id: (32458370, 32458414) query: How can a class that inherits from list and uses keyword arguments be made to work in both Python 2 and Python 3? soup:

            soup wrap:

            You only have to change your super() call to use explicit arguments:

            super(Palette, self).__init__(*args)   
            

            and your code will work just fine in both Python 2 and Python 3. See Why is Python 3.x's super() magic? for background information on why the above is equivalent to super().__init__(*args). Also, do not pass in self again, or you'll create a circular reference as you include self in the contents of the list.

            Note that it is more pythonic to use property objects instead of explicit getters and setters:

            class Palette(list):
                def __init__(self, name=None, description=None, colors=None, *args):
                    super(Palette, self).__init__(args)   
                    self.name = name
                    self.description = description
                    self.extend(colors)
            
                @property
                def name(self):
                    return self._name
            
                @name.setter
                def name(self, name):
                    self._name = name
            
                @name.deleter
                def name(self):
                    self.name = None
            
                @property
                def description(self):
                    return self._description
            
                @description.setter
                def description(self, description):
                    self._description = description
            
                @description.deleter
                def description(self):
                    self.description = None
            

            then use

            palette1.description = "This is palette 1."
            

            I also took the liberty of reducing the amount of whitespace in the function definitions; putting each and every argument on a new line makes it very hard to get an overview of the class as function bodies are needlessly pushed down.

            As these properties don't actually do anything other than wrap an attribute by the same name with an underscore, you may as well just leave them out altogether. Unlike Java, in Python you can freely switch between using attributes directly, and later on swapping attributes out for a property object; you are not tied into one or the other.

            Note that in both Python 2 and Python 3, you cannot pass in positional arguments; the following doesn't work:

            Palette('#F1E1BD', '#EEBA85', name='palette2')
            

            because the first positional argument will be assigned to the name argument:

            >>> Palette('#F1E1BD', '#EEBA85', name='palette2')
            Traceback (most recent call last):
              File "", line 1, in 
            TypeError: __init__() got multiple values for argument 'name'
            

            To support that use case, you need to not name the keyword arguments in the signature, and only use **kwargs, then retrieve your keyword arguments from that. Pass in any positional arguments as one argument so that list() takes any number of positional arguments as the contents for the new list:

            class Palette(list):
                def __init__(self, *args, **kwargs):
                    super(Palette, self).__init__(args)
                    self.name = kwargs.pop('name', None)
                    self.description = kwargs.pop('description', None)
                    self.extend(kwargs.pop('colors', []))
                    if kwargs:
                        raise TypeError('{} does not take {} as argument(s)'.format(
                            type(self).__name__, ', '.join(kwargs)))
            

            Demo:

            >>> class Palette(list):
            ...     def __init__(self, *args, **kwargs):
            ...         super(Palette, self).__init__(args)
            ...         self.name = kwargs.pop('name', None)
            ...         self.description = kwargs.pop('description', None)
            ...         self.extend(kwargs.pop('colors', []))
            ...         if kwargs:
            ...             raise TypeError('{} does not take {} as argument(s)'.format(
            ...                 type(self).__name__, ', '.join(kwargs)))
            ... 
            
            >>> palette1 = Palette(
            ...     name   = "palette 1",
            ...     colors = [
            ...         "#F1E1BD",
            ...         "#EEBA85",
            ...         "#E18D76",
            ...         "#9C837E",
            ...         "#5B7887"
            ...     ]
            ... )
            >>> palette2 = Palette("#F1E1BD", "#EEBA85", "#E18D76", "#9C837E", "#5B7887",
            ...                    name="palette 2")
            >>> palette1
            ['#F1E1BD', '#EEBA85', '#E18D76', '#9C837E', '#5B7887']
            >>> palette2
            ['#F1E1BD', '#EEBA85', '#E18D76', '#9C837E', '#5B7887']
            >>> palette1.name
            'palette 1'
            >>> palette2.name
            'palette 2'
            >>> palette1.description = 'This is palette 1.'
            >>> palette2.description = 'This is palette 2.'
            >>> Palette(foo='bar', spam='eggs')
            Traceback (most recent call last):
              File "", line 1, in 
              File "", line 9, in __init__
            TypeError: Palette does not take foo, spam as argument(s)
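If you only target Python 3, the same API can be expressed more directly with keyword-only parameters; a hedged sketch:

```python
class Palette(list):
    # Python 3 only: parameters after *args are keyword-only
    def __init__(self, *args, name=None, description=None, colors=None):
        super().__init__(args)
        self.name = name
        self.description = description
        self.extend(colors or [])

p = Palette("#F1E1BD", "#EEBA85", name="palette 2")
print(list(p), p.name)  # ['#F1E1BD', '#EEBA85'] palette 2
```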
            
            qid & accept id: (32471239, 32474087) query: How to loop through object return by SQLALchemy and process each row and display it to HTML soup:

            soup wrap:

            Try something like this.

            def listallcams():
               camtab = SVSIpCamReg.query.filter_by(u_id = current_user.id).all()
               for rec in camtab:
                  dkey = rec.key
                  bdkey=bytes(dkey)
                  f = Fernet(bdkey)
                  bcamurl = bytes(rec.camurl_hash)
                  camurl =f.decrypt(bcamurl)
                  rec.camurl = camurl
               return render_template('cam/viewallcam.html',allcam = camtab)
            

            What we are trying to achieve is that we fetch the cameras we are interested in, then loop through them and add one attribute, camurl, to each object.

            After this you should now be able to use the objects in your html.

            {% for cam in allcam %}
               

            {{ cam.camurl }}

            {{ cam.sitename }}

            {% endfor %}
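The pattern of decorating each fetched row with a derived attribute before rendering can be sketched without any database or crypto dependencies; Row and the byte-reversing decrypt below are stand-ins for the SQLAlchemy model and Fernet.decrypt:

```python
class Row:
    # stand-in for a SQLAlchemy model instance
    def __init__(self, camurl_hash):
        self.camurl_hash = camurl_hash

def decrypt(blob):
    # stand-in for Fernet.decrypt; just reverses the bytes
    return blob[::-1]

camtab = [Row(b'abc'), Row(b'xyz')]
for rec in camtab:
    # attach the derived attribute so the template can read rec.camurl
    rec.camurl = decrypt(bytes(rec.camurl_hash))

print([rec.camurl for rec in camtab])  # [b'cba', b'zyx']
```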
            qid & accept id: (32480553, 32481219) query: finding nearest points in python soup:

            soup wrap:

            There might be a faster way, but this does the job:

            import numpy as np
            
            C = [[1,1], [10,10]]
            X = [[1,2], [1,3], [2,1], [10,11], [10,12], [11,11], [12,11], [9,11]]
            
            def F(C,X):
                Carr = np.array(C)
                Xarr = np.array(X)
                distances = [np.sum( (Xarr - Carr[i])**2, axis=1) for i in range(len(C))]
                closests = np.argmin( np.array(distances), axis=0 )
                return list( np.bincount(closests) )
            
            print(F(C,X))
            

            will print:

            [3, 5]
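The per-centroid loop can also be replaced with NumPy broadcasting; a sketch of the same computation:

```python
import numpy as np

C = np.array([[1, 1], [10, 10]])
X = np.array([[1, 2], [1, 3], [2, 1], [10, 11],
              [10, 12], [11, 11], [12, 11], [9, 11]])

# (len(X), len(C)) matrix of squared distances via broadcasting
d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
# count how many points fall closest to each centroid
counts = np.bincount(d2.argmin(axis=1), minlength=len(C))
print(counts)  # [3 5]
```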
            
            qid & accept id: (32497161, 32497317) query: take column headers only from ASCII file in python soup:

            soup wrap:

            Just pass nrows=1 and then get the columns:

            pd.read_csv(file_path, nrows=1).columns
            

            Example:

            In [83]:
            import io
            import pandas as pd
            t="""index,col1,col2,col3
            0,1,2,3"""
            pd.read_csv(io.StringIO(t), nrows=1).columns
            
            Out[83]:
            Index(['index', 'col1', 'col2', 'col3'], dtype='object')
            

            You can ignore the io bit, and you may need to modify the read_csv params depending on your file path and separator character
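If pandas is not available, the standard library csv module can grab just the header line as well (sketch with an in-memory string):

```python
import csv
import io

t = "index,col1,col2,col3\n0,1,2,3"
header = next(csv.reader(io.StringIO(t)))  # read only the first row
print(header)  # ['index', 'col1', 'col2', 'col3']
```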

            qid & accept id: (32537153, 32542792) query: Assistance on automated image/text Document soup:

            soup wrap:

            You can use weasyprint and Python to create such software.

            First create the HTML of the document. This is a template that won't change, so you only have to do it once; here is a basic one:

            <!-- reconstructed sketch: the original template markup was stripped;
                 photo.jpg and styles.css are referenced later in the answer -->
            <html>
              <head>
                <link rel="stylesheet" href="styles.css">
              </head>
              <body>
                <img src="photo.jpg">
                <h1>Trademark</h1>
                <p>{{ text }}</p>
              </body>
            </html>

            This script will write a file called input.html with the {{ text }} area replaced by the content of the text file. Note that you could add support for bold, italic, even tables by using Markdown.

            with open('card.html') as f:
                card = f.read()
            
            with open('text.txt') as f:
                text = f.read()
            
            with open('input.html', 'w') as f:
                f.write(card.replace('{{ text }}', text))
            
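A hedged variation: string.Template from the standard library avoids accidental replacements that a bare str.replace could make elsewhere in the HTML. The $text placeholder below is an assumed alternative convention, not the answer's original:

```python
from string import Template

# assumed alternative placeholder convention: $text instead of {{ text }}
card = "<p>$text</p>"
text = "Hello, world"
result = Template(card).substitute(text=text)
print(result)  # <p>Hello, world</p>
```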

            Once you have a photo.jpg and the text you can execute the above script followed by the following command:

             weasyprint -f png -s styles.css input.html output.png
            

            This will generate something similar to:

            output of the script

            This requires better design, but you get the idea.

            qid & accept id: (32623285, 32624137) query: How to send cookie with scrapy CrawlSpider requests? soup:

            soup wrap:

            Okay. Try doing something like this.

            def start_requests(self):
                headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36'}
                for i,url in enumerate(self.start_urls):
                    yield Request(url,cookies={'over18':'1'}, callback=self.parse_item, headers=headers)
            

            It's the User-Agent which blocks you.

            Edit:

            I don't know what's wrong with CrawlSpider, but a plain Spider works anyway.

            #!/usr/bin/env python
            # encoding: utf-8
            import scrapy
            
            
            class MySpider(scrapy.Spider):
                name = 'redditscraper'
                allowed_domains = ['reddit.com', 'imgur.com']
                start_urls = ['https://www.reddit.com/r/nsfw']
            
                def request(self, url, callback):
                    """
                     wrapper for scrapy.request
                    """
                    request = scrapy.Request(url=url, callback=callback)
                    request.cookies['over18'] = 1
                    request.headers['User-Agent'] = (
                        'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, '
                        'like Gecko) Chrome/45.0.2454.85 Safari/537.36')
                    return request
            
                def start_requests(self):
                    for i, url in enumerate(self.start_urls):
                        yield self.request(url, self.parse_item)
            
                def parse_item(self, response):
                    titleList = response.css('a.title')
            
                    for title in titleList:
                        item = {}
                        item['url'] = title.xpath('@href').extract()
                        item['title'] = title.xpath('text()').extract()
                        yield item
                    url = response.xpath('//a[@rel="nofollow next"]/@href').extract_first()
                    if url:
                        yield self.request(url, self.parse_item)
                    # you may consider scrapy.pipelines.images.ImagesPipeline :D
            
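The same cookie-plus-User-Agent idea, shown with the plain requests library for comparison (the network call itself is left commented out):

```python
import requests

session = requests.Session()
session.cookies.set('over18', '1')  # sent with every request on this session
session.headers['User-Agent'] = (
    'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, '
    'like Gecko) Chrome/45.0.2454.85 Safari/537.36')
# resp = session.get('https://www.reddit.com/r/nsfw')  # real network call, not run here
print(session.cookies.get('over18'))  # 1
```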
            qid & accept id: (32625593, 32627190) query: Retrieving Data from MySQL in batches via Python soup:

            soup wrap:

            First point: a Python DB-API cursor is an iterator, so unless you really need to load a whole batch into memory at once, you can just start by using this feature, i.e. instead of:

            cursor.execute("SELECT * FROM mytable")
            rows = cursor.fetchall()
            for row in rows:
               do_something_with(row)
            

            you could just:

            cursor.execute("SELECT * FROM mytable")
            for row in cursor:
               do_something_with(row)
            

            Then if your db connector's implementation still doesn't make proper use of this feature, it will be time to add LIMIT and OFFSET to the mix:

            cursor.execute("SELECT count(*) FROM mytable")
            count = cursor.fetchone()[0]
            batch_size = 42 # whatever
            
            for offset in xrange(0, count, batch_size):
                cursor.execute(
                    "SELECT * FROM mytable LIMIT %s OFFSET %s", 
                    (batch_size, offset))
                for row in cursor:
                    do_something_with(row)
            
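DB-API cursors also expose fetchmany(), which batches rows without issuing LIMIT/OFFSET queries; a self-contained sketch using sqlite3 as a stand-in for MySQL:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE mytable (x INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?)", [(i,) for i in range(10)])

cur = conn.execute("SELECT x FROM mytable ORDER BY x")
batches = []
while True:
    batch = cur.fetchmany(4)  # pull at most 4 rows per round trip
    if not batch:
        break
    batches.append([row[0] for row in batch])
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```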
            qid & accept id: (32652039, 32673682) query: How to scrape the video src url from video tag which is injected via javascript? soup:

            I've coded something for you. It extracts all the videos from POPSCI episode pages:

            \n
            import re\nimport requests\nfrom lxml import html\n\ndef getVideosLinks(content):\n    videos = re.findall('(http://[\.\w/_]+\.mp[34])', content)\n    return videos\n\ndef prepareJSONurl(episode_hash):\n    json_url = "http://pepto.portico.net2.tv/playlist/{hash}".format(hash=episode_hash)\n    return json_url\n\ndef extractEpisodeHash(content):\n    tree = html.fromstring(content)\n    video_url = tree.xpath('//meta[contains(@http-equiv, "refresh")]/@content')[0].split('=',1)[1]\n    episode_hash = re.findall('episode=([\w]+)', video_url)\n    return episode_hash[0]\n\ndef extractIframeURL(content):\n    iframe_url = None\n    tree = html.fromstring(content)\n    try:\n        iframe_url = tree.xpath('//iframe/@src')[0]\n        is_video = True\n    except:\n        is_video = False\n    return is_video, iframe_url\n\n\nPOPSCI_URL = "http://www.popsci.com/thorium-dream"\n\nresponse = requests.get(POPSCI_URL)\nis_video, iframe_url = extractIframeURL(response.content)\n\nif is_video:\n    response_from_iframe_url = requests.get(iframe_url)\n    episode_hash = extractEpisodeHash(response_from_iframe_url.content)\n\n    json_url = prepareJSONurl(episode_hash)\n    final_response = requests.get(json_url)\n\n    for video in getVideosLinks(final_response.content):\n        print "Video: {}".format(video)\nelse:\n    print "This is not a POPSCI video page :|"\n
            \n

            They have different video qualities and sizes, so you will see more than one .mp4 video URL for each episode.

            \n

            This code works for any POPSCI episodes page, try changing POPSCI_URL to...

            \n
            POPSCI_URL = "http://www.popsci.com/maker-faire-2015"\n
            \n

            ... and it will still work.

            \n

            ADDED:

            \n

            Even so it is not recommended to parse HTML with Regular Expressions (regexp) I have created a regexp version for you (as requested). It works but regular expressions could be improved:

            \n
            import re\nimport requests\n\ndef getVideosLinks(content):\n    videos = re.findall('(http://[\.\w/_]+\.mp[34])', content)\n    return videos\n\ndef prepareJSONurl(episode_hash):\n    json_url = "http://pepto.portico.net2.tv/playlist/{hash}".format(hash=episode_hash)\n    return json_url\n\ndef extractEpisodeHash(content):\n    episode_hash = re.findall('
            \n

            Hope this helps

            \n soup wrap:

            I've coded something for you. It extracts all the videos from POPSCI episode pages:

            import re
            import requests
            from lxml import html
            
            def getVideosLinks(content):
                videos = re.findall('(http://[\.\w/_]+\.mp[34])', content)
                return videos
            
            def prepareJSONurl(episode_hash):
                json_url = "http://pepto.portico.net2.tv/playlist/{hash}".format(hash=episode_hash)
                return json_url
            
            def extractEpisodeHash(content):
                tree = html.fromstring(content)
                video_url = tree.xpath('//meta[contains(@http-equiv, "refresh")]/@content')[0].split('=',1)[1]
                episode_hash = re.findall('episode=([\w]+)', video_url)
                return episode_hash[0]
            
            def extractIframeURL(content):
                iframe_url = None
                tree = html.fromstring(content)
                try:
                    iframe_url = tree.xpath('//iframe/@src')[0]
                    is_video = True
                except IndexError:  # xpath found no iframe on the page
                    is_video = False
                return is_video, iframe_url
            
            
            POPSCI_URL = "http://www.popsci.com/thorium-dream"
            
            response = requests.get(POPSCI_URL)
            is_video, iframe_url = extractIframeURL(response.content)
            
            if is_video:
                response_from_iframe_url = requests.get(iframe_url)
                episode_hash = extractEpisodeHash(response_from_iframe_url.content)
            
                json_url = prepareJSONurl(episode_hash)
                final_response = requests.get(json_url)
            
                for video in getVideosLinks(final_response.content):
                    print "Video: {}".format(video)
            else:
                print "This is not a POPSCI video page :|"
            

            They have different video qualities and sizes, so you will see more than one .mp4 video URL for each episode.

            This code works for any POPSCI episode page; try changing POPSCI_URL to...

            POPSCI_URL = "http://www.popsci.com/maker-faire-2015"
            

            ... and it will still work.

            ADDED:

            Even though it is not recommended to parse HTML with regular expressions, I have created a regexp version for you (as requested). It works, but the regular expressions could be improved:

            import re
            import requests
            
            def getVideosLinks(content):
                videos = re.findall('(http://[\.\w/_]+\.mp[34])', content)
                return videos
            
            def prepareJSONurl(episode_hash):
                json_url = "http://pepto.portico.net2.tv/playlist/{hash}".format(hash=episode_hash)
                return json_url
            
            def extractEpisodeHash(content):
                episode_hash = re.findall('

            Hope this helps

            qid & accept id: (32659345, 32659633) query: How to use argparse during runtime to conditionally get further input? soup:

            soup wrap:

            argparse.parse_args() takes an iterable of strings; you are passing a single string. Try

            def create():
                author_parse = argparse.ArgumentParser()
                author_parse.add_argument('name', type=str, nargs=2)
                name = raw_input("Provide your first and last name: ")
                auth_args = author_parse.parse_args(name.split())
            

            A string, being iterable itself, is treated like a list of the characters in the string. That is,

            author_parse.parse_args("John Smith")
            

            produces the same result as

            author_parse.parse_args(["J", "o", "h", "n", " ", "S", "m", "i", "t", "h"])
            
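            A self-contained Python 3 sketch of the fix (the interactive raw_input() prompt is replaced by a hard-coded string so the example runs on its own):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('name', type=str, nargs=2)  # expects exactly two tokens

# Stand-in for raw_input()/input(); split() turns the single string
# into the list of strings that parse_args() expects.
name = "John Smith"
args = parser.parse_args(name.split())

print(args.name)  # ['John', 'Smith']
```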
            qid & accept id: (32665833, 32665839) query: Add [] around numbers in strings soup:

            soup wrap:

            Yes, there is a simple way, using re.sub():

            result = re.sub(r'(\d+)', r'[\1]', inputstring)
            

            Here \d matches a digit, \d+ matches 1 or more digits. The (...) around that pattern groups the match so we can refer to it in the second argument, the replacement pattern. That pattern simply replaces the matched digits with [...] around the group.

            Note that I used r'..' raw string literals; if you don't you'd have to double all the \ backslashes; see the Backslash Plague section of the Python Regex HOWTO.

            Demo:

            >>> import re
            >>> inputstring = "pixel1blue pin10off output2high foo9182bar"
            >>> re.sub(r'(\d+)', r'[\1]', inputstring)
            'pixel[1]blue pin[10]off output[2]high foo[9182]bar'
            
            qid & accept id: (32670153, 32670275) query: How to find the index value of a variable in SPSS Python soup:

            soup wrap:

            From the documentation of the Variable Class, you can get a reference to the variable by name or by index:

            # Create a Variable object, specifying the variable by name
            varObj = datasetObj.varlist['bdate']
            # Create a Variable object, specifying the variable by index
            varObj = datasetObj.varlist[3]
            

            So in your case:

            varObj = datasetObj.varlist['ID']
            

            You can, if needed, get the index of the variable by its name, using the index property:

            varIndex = datasetObj.varlist['ID'].index
            
            qid & accept id: (32678322, 32678369) query: Updating a dict which is stored in an array soup:

            soup wrap:

            To count the frequency of words in an array of strings, you can use Counter from collections:

            In [89]: from collections import Counter
            
            In [90]: s=r'So I have an array of words, stored as key value pairs. Now I am trying to count the frequency of words in an array of strings, tokens. I have tried the following but this doesnt find the index of x as it is only a string. I do not have the corresponding value, if any, of x in tokens array. Is there any way to directly access it rather than adding one more loop to find it first?'
            
            In [91]: tokens=s.split()
            
            In [92]: c=Counter(tokens)
            
            In [93]: print c
            Counter({'of': 5, 'I': 4, 'the': 4, 'it': 3, 'have': 3, 'to': 3, 'an': 2, 'as': 2, 'in': 2, 'array': 2, 'find': 2, 'x': 2, 'value,': 1, 'words': 1, 'do': 1, 'there': 1, 'is': 1, 'am': 1, 'frequency': 1, 'if': 1, 'string.': 1, 'index': 1, 'one': 1, 'directly': 1, 'tokens.': 1, 'any': 1, 'access': 1, 'only': 1, 'array.': 1, 'way': 1, 'doesnt': 1, 'Now': 1, 'words,': 1, 'more': 1, 'a': 1, 'corresponding': 1, 'tried': 1, 'than': 1, 'adding': 1, 'strings,': 1, 'but': 1, 'tokens': 1, 'So': 1, 'key': 1, 'first?': 1, 'not': 1, 'trying': 1, 'pairs.': 1, 'count': 1, 'this': 1, 'Is': 1, 'value': 1, 'rather': 1, 'any,': 1, 'stored': 1, 'following': 1, 'loop': 1})
            
            In [94]: c['of']
            Out[94]: 5
            

            EDIT:

            To count words manually when you have an outer loop and tokens changes with each iteration, the approach @Alexander suggested is a good way. Also, Counter supports the + operator, which makes accumulative counting easier:

            In [30]: (c+c)['of']
            Out[30]: 10
            
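            If each pass of the outer loop yields a fresh tokens list, you can also accumulate everything into a single Counter with update(), which adds counts in place instead of building intermediate Counters. A small Python 3 sketch with made-up batches:

```python
from collections import Counter

# Stand-ins for the per-iteration token lists produced by an outer loop.
batches = [["of", "the", "of"], ["of", "cat"]]

total = Counter()
for tokens in batches:
    total.update(tokens)  # adds the new counts to the running totals

print(total["of"])   # 3
print(total["cat"])  # 1
```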
            qid & accept id: (32679481, 32680096) query: How to plot histogram of multiple lists? soup:

            soup wrap:

            A histogram is probably not what you need. It's a good solution if you have a list of numbers (for example, IQs of people) and you want to attribute each number to a category (e.g. 79 and below, 80-99, 100 and above). There will be 3 bins, and the height of each bin will represent the quantity of numbers that fall into the corresponding category.

            In your case, you already have the height of each bin, so (as I understand) what you want is a plot that looks like a histogram. This (as I understand) is not supported directly by matplotlib and would require using matplotlib in a way it wasn't intended to be used.

            If you're OK with using plots instead of histograms, here's what you can do.

            import matplotlib.pyplot as plt
            
            lists = [data[project]["tweets"] for project in data] # Collect all lists into one
            sum_list = [sum(x) for x in zip(*lists)] # Create a list with sums of tweets for each day
            
            plt.plot(sum_list) # Create a plot for sum_list
            plt.show() # Show the plot
            
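            To see what the zip(*lists) step computes on its own, here is a tiny standalone example with made-up per-project daily counts (no plotting needed):

```python
# Two projects, three days each; made-up numbers for illustration.
lists = [[1, 2, 3], [10, 20, 30]]

# zip(*lists) pairs up the i-th element of every inner list,
# so summing each pair gives the total tweets per day.
sum_list = [sum(x) for x in zip(*lists)]

print(sum_list)  # [11, 22, 33]
```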

            If you want to make the plot look like a histogram, do this:

            plt.bar(range(0, len(sum_list)), sum_list)
            

            instead of plt.plot.

            qid & accept id: (32687664, 32687759) query: Combine two lists which have the same item in dict soup:

            soup wrap:

            You can create an intermediate dictionary of {category_name: dict} and then use update:

            temp = {a['category_name']: dict(a) for a in a_list}
            for b in b_list:
                temp[b['category_name']].update(b)
            c_list = list(temp.values())    # list() unnecessary in py2.X
            

            But this isn't guaranteed to preserve the order of the lists. If order is important:

            c_list = [temp[a['category_name']] for a in a_list]
            
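            A quick worked example of the merge, with made-up a_list and b_list entries:

```python
# Hypothetical inputs: b_list carries extra data for some categories.
a_list = [{'category_name': 'food', 'a_count': 1},
          {'category_name': 'toys', 'a_count': 2}]
b_list = [{'category_name': 'toys', 'b_count': 9}]

# Index a_list entries by category, then fold matching b_list entries in.
temp = {a['category_name']: dict(a) for a in a_list}
for b in b_list:
    temp[b['category_name']].update(b)

# Order-preserving variant: follow the order of a_list.
c_list = [temp[a['category_name']] for a in a_list]

print(c_list)
```

            Here 'toys' picks up b_count from b_list, while 'food', which has no match, passes through unchanged.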
            qid & accept id: (32690945, 32691081) query: Filtering for multiple strings on f.read soup:

            soup wrap:

            You are close in your first code, but you need to use or between conditions, not objects, so you can change it to the following:

            with open('file_name') as f:
                fi = f.read()
                if 'string' in fi or 'string2' in fi or 'string3' in fi:
            

            But instead of that you can use the built-in function any():

            with open('file_name') as f:
                fi = f.read()
                if any(i in fi for i in word_set):
            

            And if you are dealing with a huge file, then instead of loading the whole file content into memory you can check for the strings in each line with a function:

            def my_func(word_set):
                with open('file_name') as f:
                    for line in f:
                        if any(i in line for i in word_set):
                            return True
                    return False
            
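            The any() pattern isolated from the file handling, with made-up content and search terms; it short-circuits on the first match, so later candidates are never tested:

```python
fi = "error: disk full at sector 9"          # stand-in for the file contents
word_set = {"timeout", "disk full", "refused"}  # stand-in search terms

# any() consumes the generator lazily and stops at the first True.
found = any(w in fi for w in word_set)

print(found)  # True
```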
            qid & accept id: (32700996, 32701699) query: Optimizing time series generation soup:

            soup wrap:

            numba often works well for this type of problem. You could also get a similar result with Cython, with more annotations.

            @numba.jit(nopython=True)
            def generate_values(mins, maxs, vals):
                N = len(vals)
                ans = np.empty(N)
            
                for i in range(N):
                    for j in range(i, N):
                        if vals[j] < mins[i] or vals[j] > maxs[i]:
                            ans[i] = vals[j]
                            break
                    else:
                        ans[i] = np.nan
                return ans
            

            A bit verbose, but very fast.

            In [278]: %%time
                 ...: LIMIT = len(df)
                 ...: for i in range(LIMIT):
                 ...:     df['shift'] = df['shift'].shift(-1)
                 ...:     df['result'].update(df['shift'][((df['shift'] < df['min']) | \
                 ...:                                      (df['shift'] > df['max'])) & \
                 ...:                                     (df['result'].isnull())])
            Wall time: 62 ms
            
            
            In [281]: %timeit generate_values(df['min'].values, df['max'].values, df['val'].values)
            10000 loops, best of 3: 20.6 µs per loop
            
            qid & accept id: (32719686, 32746864) query: Importing Denormalized data into django models via modelforms soup:

            soup wrap:

            I came up with a partial solution, at least for the problem involving the choices. I guess with some tinkering it could work for ForeignKey fields as well.

            First, I define a function get_choice_by_name which goes through a choices tuple and looks for a key by value.

            Then I subclassed TypedChoiceField and overrode its clean() method to transform the data. This method seems to get called before any validation.

            Here's the code:

            def get_choice_by_name(name, choices, case_sensitive=False):
                try:
                    if name is None:
                        return ''
                    elif name and not case_sensitive:
                        return next(k for k, n in choices
                                    if n.lower() == name.lower())
                    else:
                        return next(k for k, n in choices if n == name)
                except StopIteration:
                    raise ValueError(
                        "Invalid choice: {}, not found in {}".format(name, choices)
                    )
            
            class DenormalizedChoiceField(TypedChoiceField):
            
                def clean(self, value):
                    if not value:
                        return self.empty_value
                    try:
                        value = get_choice_by_name(value, self.choices)
                    except ValueError as e:
                        raise ValidationError(str(e))
            
                    value = super(DenormalizedChoiceField, self).clean(value)
                    return value
            

            My ModelForm now just needs to redefine the fields in question as DenormalizedChoiceField. I need to specify the choices explicitly, though; for some reason it doesn't pick them up from the model if you override the field.

            class PersonForm(forms.ModelForm):
                favorite_color = DenormalizedChoiceField(choices=Person.COLORS)
                class Meta:
                    model = Person
                    fields = '__all__'
            
            qid & accept id: (32772190, 32778103) query: How to find connected components in a matrix using Julia soup:

            soup wrap:

            Using Images.jl's label_components is indeed the easiest way to solve the core problem. However, your loop over 1:maximum(labels) may not be efficient: it's O(N*n), where N is the number of elements in labels and n is the maximum, because you visit each element of labels n times.

            You'd be much better off visiting each element of labels just twice: once to determine the maximum, and once to assign each non-zero element to its proper group:

            using Images
            
            function collect_groups(labels)
                groups = [Int[] for i = 1:maximum(labels)]
                for (i,l) in enumerate(labels)
                    if l != 0
                        push!(groups[l], i)
                    end
                end
                groups
            end
            
            mat = [1 1 0 0 0 ; 1 1 0 0 0 ; 0 0 0 0 1 ; 0 0 0 1 1]
            
            labels = label_components(mat)
            groups = collect_groups(labels)
            

            Output on your test matrix:

            2-element Array{Array{Int64,1},1}:
             [1,2,5,6] 
             [16,19,20]
            

            Calling library functions like find can occasionally be useful, but it's also a habit from slower languages that's worth leaving behind. In julia, you can write your own loops and they will be fast; better yet, often the resulting algorithm is much easier to understand. collect(zip(ind2sub(size(mat),find( x -> x == value, mat))...)) does not exactly roll off the tongue.

            qid & accept id: (32792411, 32792467) query: Get ALL results of a word mapping with a dictionary soup:

            soup wrap:

            You can use itertools.product. For example, let's define the preliminaries:

            >>> import itertools
            >>> s = 'kfc'
            >>> d = {'k':'1', 'c':'3'}
            

            Now, let's compute the result:

            >>> [ ''.join(x) for x in itertools.product( *[(c, d.get(c)) if d.get(c) else c for c in s] ) ]
            ['kfc', 'kf3', '1fc', '1f3']
            

            How it works

            First, we use a list comprehension to get the possibilities that we need to consider:

            >>> [(c, d.get(c)) if d.get(c) else c for c in s]
            [('k', '1'), 'f', ('c', '3')]
            

            In the above list comprehension, we iterate through each character c in string s. For each c, we assemble the possibilities which are either (c, d[c]) if d[c] exists or else just c if it doesn't.

            Next, we use itertools to create all the possible products:

            >>> list( itertools.product( *[(c, d.get(c)) if d.get(c) else c for c in s] ) )
            [('k', 'f', 'c'), ('k', 'f', '3'), ('1', 'f', 'c'), ('1', 'f', '3')]
            

            The above has the answers that we need. We just need to re-assemble the strings using ''.join:

            >>> [ ''.join(x) for x in itertools.product( *[(c, d.get(c)) if d.get(c) else c for c in s] ) ]
            ['kfc', 'kf3', '1fc', '1f3']
            
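            The one-liner above can also be wrapped in a small reusable function. This is a sketch, not part of the original answer; the helper name variants and the `(c,)` tuple form for unmapped characters are my own choices:

```python
import itertools

def variants(s, d):
    # For each character, the choices are the character itself plus its
    # substitute from d, if one exists.
    pools = [(c, d[c]) if c in d else (c,) for c in s]
    return [''.join(p) for p in itertools.product(*pools)]

print(variants('kfc', {'k': '1', 'c': '3'}))  # → ['kfc', 'kf3', '1fc', '1f3']
```

            Using `c in d` rather than `d.get(c)` also sidesteps the edge case where a key maps to a falsy value such as `''`.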
            qid & accept id: (32798908, 32798989) query: Dictionary items to variables soup:

            Method-1: Using vars()

            You can use the built-in function vars() to do that. Note that this works at module level or in the interactive interpreter; inside a function, updating the dict returned by vars() is not guaranteed to create local variables.

            Return the __dict__ attribute for a module, class, instance, or any other object with a __dict__ attribute.

            Without an argument, vars() acts like locals().

            In [1]: dct = {'key1': 1, 'key2': 2}
            
            In [2]: vars().update(dct) # creates variables with name as keys and value as their corresponding value of 'dct' dictionary
            
            In [3]: key1 # access 'key1' as variable
            Out[3]: 1
            
            In [4]: key2 # access 'key2' as variable
            Out[4]: 2
            

            Method-2: Using **kwargs

            Another option is to use **kwargs, as suggested by @Jonrsharpe, which is a cleaner approach.

            We define a function some_func whose parameters are the keys of the dictionary. We call this function and pass it the dictionary dct using the **kwargs syntax. This gives us access to the keys key1 and key2 as variables inside that function.

            In [1]: dct = {'key1': 1, 'key2': 2}
            
            In [2]: def some_func(key1, key2): # define keys as function parameters
               ...:     print key1 # print value of variable 'key1'
               ...:     print key2 # print value of variable 'key2'
               ...:  
            
            In [3]: some_func(**dct) # pass 'dct' dictionary using '**kwargs'
            1 # Value of variable 'key1'
            2 # Value of variable 'key2'
            
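            The session above uses Python 2 print statements; under Python 3 the same **kwargs approach can be sketched like this (the return value is my own addition for illustration):

```python
dct = {'key1': 1, 'key2': 2}

def some_func(key1, key2):  # dict keys become ordinary parameters
    return key1 + key2

result = some_func(**dct)   # unpack the dict into keyword arguments
print(result)  # → 3
```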
            qid & accept id: (32802833, 32802877) query: Selecting a subset of functions from a list of functions in python soup:

            I don't think that there is a pythonic™ way to solve the question. But in my code it's quite a common situation, so I've written my own function for that:

            def applyfs(funcs, args):
                """
                Applies several functions to single set of arguments. This function takes
                a list of functions, applies each to given arguments, and returns the list
                of obtained results. For example:
            
                    >>> from operator import add, sub, mul
                    >>> list(applyfs([add, sub, mul], (10, 2)))
                    [12, 8, 20]
            
                :param funcs: List of functions.
                :param args:  List or tuple of arguments to apply to each function.
                :return:      List of results, returned by each of `funcs`.
                """
                return map(lambda f: f(*args), funcs)
            

            In your case I would use it the following way:

            applyfs([mean, std, var, fxn4 ...], mylist)
            

            Note that you really don't have to use function names (as you would have to do in, for example, PHP4); in Python a function is itself a callable object and can be stored in a list.

            EDIT:

            Or possibly, it would be more pythonic to use a list comprehension instead of map:

            results = [f(mylist) for f in [mean, std, var, fxn4 ...]]
            
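            For the statistics use case in the question, the list-comprehension form can be sketched with the standard library's statistics module standing in for mean, std and var (my assumption about what those names refer to; the sample data is also mine):

```python
import statistics

mylist = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Apply each function in turn to the same list.
funcs = [statistics.mean, statistics.pstdev, statistics.pvariance]
results = [f(mylist) for f in funcs]
print(results)  # → [5.0, 2.0, 4.0]
```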
            qid & accept id: (32821122, 32821232) query: How to make a time object TZ aware without changing the value? soup:

            You can use the make_aware function from Django on your naive datetime objects. You will then have to specify the time zone of your naive timestamps.

            now_ts = datetime.now(pytz.timezone('Europe/Istanbul'))
            now_ts > make_aware(campaingObject.publish_end, pytz.timezone('Europe/Istanbul'))
            

            https://docs.djangoproject.com/en/1.8/ref/utils/#django.utils.timezone.make_aware

            On the other hand, you could also use the make_naive function to remove the timezone information from your now() timestamp:

            now_ts = datetime.now(pytz.timezone('Europe/Istanbul'))
            now_naive = make_naive(now_ts, pytz.timezone('Europe/Istanbul'))
            now_naive > campaingObject.publish_end
            

            https://docs.djangoproject.com/en/1.8/ref/utils/#django.utils.timezone.make_naive
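            Outside Django, the same "attach a zone without shifting the clock" step can be done with the standard library alone via datetime.replace. A minimal sketch with a fixed UTC+3 offset (used purely for illustration; the date is mine):

```python
from datetime import datetime, timezone, timedelta

naive = datetime(2015, 9, 29, 12, 0, 0)
tz = timezone(timedelta(hours=3))

# Same wall-clock time, now comparable with other aware datetimes.
aware = naive.replace(tzinfo=tz)
print(aware.isoformat())  # → 2015-09-29T12:00:00+03:00
```

            Note that with pytz zones specifically you should use `tz.localize(naive)` rather than `replace`, because pytz zones carry historical offset data.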

            qid & accept id: (32896987, 32897330) query: numpy tile without memory allocation soup:
            c = (b.reshape(2,4)+a).ravel()
            

            The reshape and ravel are both views, so (I think) the only new array produced is the result of the summation. In effect I am changing b to a shape that can be broadcast with a.

            This is measurably faster, even in this small problem.


            np.broadcast_arrays lets you do the broadcasting in steps

            In [506]: b1,a1 = np.broadcast_arrays(b.reshape(2,4),a)  
            

            a1 is a view, as shown by the data buffer pointer

            In [507]: a1.__array_interface__['data']
            Out[507]: (164774704, False)
            In [508]: a.__array_interface__['data']
            Out[508]: (164774704, False)
            

            The sum

            In [509]: a1+b1
            Out[509]: 
            array([[ 2.04663934,  1.02951915,  1.30616273,  1.75154236],
                   [ 1.79237632,  1.08252741,  1.17031265,  1.2675438 ]])
            

            a1 has, effectively, been tiled without copying

            In [511]: a1.shape
            Out[511]: (2, 4)
            In [512]: a1.strides
            Out[512]: (0, 8)
            

            Look at numpy's lib/stride_tricks.py file for more details on this sort of broadcasting. np.lib.stride_tricks.as_strided is the underlying function that lets you construct a view with a new shape and strides. It has been used most often on SO to construct sliding windows.
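            np.broadcast_to wraps the same stride trick in a friendlier API. A quick sketch showing the zero-stride view and the no-tile sum from above (the array contents here are my own toy values):

```python
import numpy as np

a = np.arange(4.0)                 # shape (4,), float64
b = np.arange(8.0)                 # shape (8,)

# Read-only view of a with shape (2, 4): stride 0 on the new axis, no copy.
a2 = np.broadcast_to(a, (2, 4))
print(a2.strides)                  # → (0, 8)

# The broadcast sum, then flattened back to shape (8,).
c = (b.reshape(2, 4) + a).ravel()
print(c)
```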

            qid & accept id: (32900442, 32901387) query: Large point-matrix array multiplication in numpy soup:

            You can use numpy.einsum. Here's an example with 5 matrices and 5 points:

            In [49]: matrices.shape
            Out[49]: (5, 3, 3)
            
            In [50]: points.shape
            Out[50]: (5, 3)
            
            In [51]: p = np.einsum('ijk,ik->ij', matrices, points)
            
            In [52]: p[0]
            Out[52]: array([ 1.16532051,  0.95155227,  1.5130032 ])
            
            In [53]: matrices[0].dot(points[0])
            Out[53]: array([ 1.16532051,  0.95155227,  1.5130032 ])
            
            In [54]: p[1]
            Out[54]: array([ 0.79929572,  0.32048587,  0.81462493])
            
            In [55]: matrices[1].dot(points[1])
            Out[55]: array([ 0.79929572,  0.32048587,  0.81462493])
            

            The above computes matrices[i].dot(points[i]) (i.e. multiplying with the point on the right), but I just reread the question and noticed that your code uses points[i] * matrix[i]. You can do that by switching the indices and arguments of einsum:

            In [76]: lp = np.einsum('ij,ijk->ik', points, matrices)
            
            In [77]: lp[0]
            Out[77]: array([ 1.39510822,  1.12011057,  1.05704609])
            
            In [78]: points[0].dot(matrices[0])
            Out[78]: array([ 1.39510822,  1.12011057,  1.05704609])
            
            In [79]: lp[1]
            Out[79]: array([ 0.49750324,  0.70664634,  0.7142573 ])
            
            In [80]: points[1].dot(matrices[1])
            Out[80]: array([ 0.49750324,  0.70664634,  0.7142573 ])
            
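            A quick way to convince yourself that each einsum spec matches the per-row products is to check it against an explicit loop (random data, seeded for reproducibility; this check is my addition):

```python
import numpy as np

rng = np.random.default_rng(0)
matrices = rng.random((5, 3, 3))
points = rng.random((5, 3))

p = np.einsum('ijk,ik->ij', matrices, points)   # matrices[i] @ points[i]
lp = np.einsum('ij,ijk->ik', points, matrices)  # points[i] @ matrices[i]

# Compare against per-row matrix-vector products.
assert np.allclose(p, [m @ v for m, v in zip(matrices, points)])
assert np.allclose(lp, [v @ m for m, v in zip(matrices, points)])
```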
            qid & accept id: (32902648, 32903915) query: Python list comparison to create trees soup:

            This seems to be a case for the union-find, or disjoint-set, algorithm. Here's an implementation I keep in my toolbox:

            from collections import defaultdict
            
            class UnionFind:
                def __init__(self):
                    self.leaders = defaultdict(lambda: None)
            
                def find(self, x):
                    l = self.leaders[x]
                    if l is not None:
                        l = self.find(l)
                        self.leaders[x] = l
                        return l
                    return x
            
                def union(self, x, y):
                    lx, ly = self.find(x), self.find(y)
                    if lx != ly:
                        self.leaders[lx] = ly
            
                def get_groups(self):
                    groups = defaultdict(set)
                    for x in self.leaders:
                        groups[self.find(x)].add(x)
                    return groups
            

            And here is how to apply it to your data:

            # parse data
            data = """Group  Item-1  Item-2
            0       7       13
            0      10        4
            1       2        8
            1       3        1
            1       4        3
            1       6       28
            1       8        6"""
            data = [[int(x) for x in line.split()] for line in data.splitlines()[1:]]
            
            # get mapping {group_number: [list of pairs]}
            groups = defaultdict(list)
            for g, x, y in data:
                groups[g].append((x, y))
            
            # for each group, add pairs to union find structure and get groups
            for group, links in groups.items():
                union = UnionFind()
                for x, y in links:
                    union.union(x, y)
                print group, union.get_groups().values()
            

            Output is:

            0 [set([10, 4]), set([13, 7])]
            1 [set([1, 3, 4]), set([8, 2, 28, 6])]
            
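            The driver loop above uses Python 2 print statements; the class itself runs unchanged on Python 3. A condensed, self-contained Python 3 check using group 1's pairs from the example:

```python
from collections import defaultdict

class UnionFind:
    def __init__(self):
        self.leaders = defaultdict(lambda: None)

    def find(self, x):
        l = self.leaders[x]
        if l is not None:
            l = self.find(l)       # recurse to the leader, with path compression
            self.leaders[x] = l
            return l
        return x

    def union(self, x, y):
        lx, ly = self.find(x), self.find(y)
        if lx != ly:
            self.leaders[lx] = ly

    def get_groups(self):
        groups = defaultdict(set)
        for x in self.leaders:
            groups[self.find(x)].add(x)
        return groups

uf = UnionFind()
for x, y in [(2, 8), (3, 1), (4, 3), (6, 28), (8, 6)]:  # group 1 pairs
    uf.union(x, y)
print(sorted(map(sorted, uf.get_groups().values())))  # → [[1, 3, 4], [2, 6, 8, 28]]
```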
            qid & accept id: (32907015, 32907551) query: Alternatives to cartesian in Spark? soup:

            First, let's define some helpers:

            def swap(x):
                """Given a tuple (x1, x2) return (x2, 1)"""
                return (x[1], 1)
            
            def filter_source(x):
                """Check if s1 < s2 in (x, (s1, s2))"""
                return x[1][0] < x[1][1]
            
            def reshape(kv):
                """Reshape ((k1, k2), v) to get final result"""
                ((k1, k2), v) = kv
                return (k1, (k2, v))
            

            and create an example RDD:

            rdd = sc.parallelize([
                (1, [3, 10, 11]), (2, [3, 4, 10, 11]),
                (3, [1, 4]), (4, [2, 3, 10])])
            

            Finally you can do something like this:

            from operator import add
            
            flattened = rdd.flatMap(lambda kv: ((v, kv[0]) for v in kv[1])) # Flatten input
            flattened.first()
            # (1, 3) <- from (3, [1, 4])
            
            result = (flattened 
                .join(flattened) # Perform self join using value from input as key
                .filter(filter_source) # Remove pairs from the same source
                .map(swap)
                .reduceByKey(add)
                .map(reshape)) # Get final output
            
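            To see what the flatten/self-join/filter/count pipeline produces, the same logic can be sketched in plain Python on the example data (this illustrates the transformation only; it is not the distributed Spark code):

```python
from collections import defaultdict

rdd = {1: [3, 10, 11], 2: [3, 4, 10, 11], 3: [1, 4], 4: [2, 3, 10]}

# Flatten: group the (item, source) pairs by item.
by_item = defaultdict(list)
for source, items in rdd.items():
    for item in items:
        by_item[item].append(source)

# Self-join on item, keep pairs with s1 < s2, then count per source pair.
counts = defaultdict(int)
for sources in by_item.values():
    for s1 in sources:
        for s2 in sources:
            if s1 < s2:
                counts[(s1, s2)] += 1

print(dict(counts))  # → {(1, 2): 3, (1, 4): 2, (2, 4): 2, (2, 3): 1}
```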
            qid & accept id: (32910848, 32910930) query: A simple looping command In Python soup:

            You could put the code in a function, something like:

            def simple():
                a = int(input("What's one of the angles?"))
                b = int(input("What's the other angle in the triangle?"))
                c = a + b
                f = 180 - c  # the angles of a triangle sum to 180 degrees
                print(f)
            

            and then simply type:

            simple()
            

            each time to use it.
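            Separating the arithmetic from the input() calls makes the function easy to loop over and to test. A small sketch (the helper name third_angle is mine):

```python
def third_angle(a, b):
    # The angles of a triangle sum to 180 degrees.
    return 180 - (a + b)

def simple():
    a = int(input("What's one of the angles?"))
    b = int(input("What's the other angle in the triangle?"))
    print(third_angle(a, b))

print(third_angle(60, 70))  # → 50
```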

            qid & accept id: (32921049, 32921204) query: Pipe delimiter file, but no pipe inside data soup:

            Don't reinvent the value-separated file parsing wheel. Use the csv module to do the parsing and the writing for you.

            The csv module will add "..." quotes around values that contain the separator, so in principle you don't need to replace the | pipe symbols in the values. To replace the original file, write to a new (temporary) outputfile then move that back into place.

            import csv
            import os
            
            outputfile = inputfile + '.tmp'
            with open(inputfile, 'rb') as inf, open(outputfile, 'wb') as outf:
                reader = csv.reader(inf)
                writer = csv.writer(outf, delimiter='|')
                writer.writerows(reader)
            os.remove(inputfile)
            os.rename(outputfile, inputfile)
            

            For an input file containing:

            foo,bar|baz,spam
            

            this produces

            foo|"bar|baz"|spam
            

            Note that the middle column is wrapped in quotes.

            If you do need to replace the | characters in the values, you can do so as you copy the rows:

            outputfile = inputfile + '.tmp'
            with open(inputfile, 'rb') as inf, open(outputfile, 'wb') as outf:
                reader = csv.reader(inf)
                writer = csv.writer(outf, delimiter='|')
                for row in reader:
                    writer.writerow([col.replace('|', ' ') for col in row])
            os.remove(inputfile)
            os.rename(outputfile, inputfile)
            

            Now the output for my example becomes:

            foo|bar baz|spam
            
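            The snippets above open files in binary mode, which is the Python 2 convention; on Python 3 the csv module wants text mode with newline=''. A self-contained Python 3 sketch of the same round-trip, using in-memory buffers in place of real files:

```python
import csv
import io

# Stand-ins for the input and output files.
inf = io.StringIO('foo,bar|baz,spam\r\n')
outf = io.StringIO()

writer = csv.writer(outf, delimiter='|')
writer.writerows(csv.reader(inf))

print(outf.getvalue().strip())  # → foo|"bar|baz"|spam
```

            With real files, open both with `open(path, 'r', newline='')` / `open(path, 'w', newline='')`.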
            qid & accept id: (32935232, 32935278) query: Python: Apply function to values in nested dictionary soup:

            Visit all nested values recursively:

            import collections
            
            def map_nested_dicts(ob, func):
                if isinstance(ob, collections.Mapping):
                    return {k: map_nested_dicts(v, func) for k, v in ob.iteritems()}
                else:
                    return func(ob)
            
            map_nested_dicts(x, lambda v: v + 7)
            # Creates a new dict object:
            #    {'a': 8, 'b': {'c': 13, 'g': {'h': 10, 'i': 16}, 'd': 14}, 'e': {'f': 10}}
            

            In some cases it's desired to modify the original dict object (to avoid re-creating it):

            import collections
            
            def map_nested_dicts_modify(ob, func):
                for k, v in ob.iteritems():
                    if isinstance(v, collections.Mapping):
                        map_nested_dicts_modify(v, func)
                    else:
                        ob[k] = func(v)
            
            map_nested_dicts_modify(x, lambda v: v + 7)
            # x is now
            #    {'a': 8, 'b': {'c': 13, 'g': {'h': 10, 'i': 16}, 'd': 14}, 'e': {'f': 10}}
            

            If you're using Python 3:

            • replace dict.iteritems with dict.items

            • replace import collections with import collections.abc

            • replace collections.Mapping with collections.abc.Mapping

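            Putting those three replacements together, a runnable Python 3 version of the first function (the toy dict values here are mine):

```python
import collections.abc

def map_nested_dicts(ob, func):
    # Recurse into mappings; apply func to leaf values.
    if isinstance(ob, collections.abc.Mapping):
        return {k: map_nested_dicts(v, func) for k, v in ob.items()}
    return func(ob)

x = {'a': 1, 'b': {'c': 6, 'd': 7, 'g': {'h': 3, 'i': 9}}, 'e': {'f': 3}}
print(map_nested_dicts(x, lambda v: v + 7))
# → {'a': 8, 'b': {'c': 13, 'd': 14, 'g': {'h': 10, 'i': 16}}, 'e': {'f': 10}}
```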
            qid & accept id: (32935585, 32986926) query: Returning user to referrer in flask in smartest pythonic way soup:

            When rendering the form for the delete view, you can add a hidden form element named next:

            ...

            Then in your route:

            ...
            return redirect(request.form.get('next', '/'))
            

            Note: your redirect handling should take care to prevent the next parameter from being an absolute URL to an arbitrary site (see https://www.owasp.org/index.php/Open_redirect).
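            One common way to implement that guard is to reject any next value that carries a scheme or host, so only site-relative paths pass. A sketch using only the standard library (the helper name is_safe_redirect is hypothetical):

```python
from urllib.parse import urlparse

def is_safe_redirect(target):
    # A safe target is site-relative: no scheme and no network location.
    parts = urlparse(target)
    return parts.scheme == '' and parts.netloc == ''

print(is_safe_redirect('/items/'))                 # → True
print(is_safe_redirect('https://evil.example/x'))  # → False
print(is_safe_redirect('//evil.example/x'))        # → False
```

            In the route, fall back to a default when the check fails, e.g. `redirect(target if is_safe_redirect(target) else '/')`.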

            qid & accept id: (32940738, 32941001) query: Filtering in Django by a set of String soup:
            >>> from django.db.models import Q
            
            >>> values = ['1.01', '1.02']
            
            >>> query = Q()
            >>> for value in values:
            ...     query |= Q(name__startswith=value)
            
            >>> Inventary.objects.filter(query)
            

            It dynamically builds a query that'll fetch the objects whose name starts with 1.01 or 1.02:

            >>> Inventary.objects.filter(Q(name__startswith='1.01') | Q(name__startswith='1.02'))
            
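            Outside the ORM, the |= loop corresponds to OR-ing per-value startswith tests, which in plain Python reads naturally with any() (the sample names below are my own):

```python
values = ['1.01', '1.02']
names = ['1.01.5', '1.02.3', '2.01.1']

# Keep names that start with any of the prefixes — the plain-Python
# analogue of OR-chaining Q(name__startswith=value) objects.
matches = [n for n in names if any(n.startswith(v) for v in values)]
print(matches)  # → ['1.01.5', '1.02.3']
```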
            qid & accept id: (32946908, 32957906) query: How should I subtract two dataframes and in Pandas and diplay the required output? soup:

            Hope I understood the question correctly. After grouping both tables as you did:

            MvT101group = MvT101.groupby('Order',sort=True).sum()
            MvT102group = MvT102.groupby('Order',sort=True).sum()
            

            You can then append a suffix to the column names of each group:

            MvT101group.columns = MvT101group.columns.map(lambda x: str(x) + '_101')
            MvT102group.columns = MvT102group.columns.map(lambda x: str(x) + '_102')
            

            Then merge all 3 tables so that you will have all 3 columns in the main table:

            df = df.merge(MvT101group, left_on=['Order'], right_index=True, how='left')
            df = df.merge(MvT102group, left_on=['Order'], right_index=True, how='left')
            

            And then you can add the calculated column:

            df['calc'] = (df['Order_101']-df['Order_102']) / 100
            
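            The rename-then-merge pattern above can be sketched end to end on a tiny made-up dataset (the `Order`/`Qty` column names and values are assumptions for illustration, not the asker's data):

```python
import pandas as pd

# Hypothetical movement tables standing in for MvT101 / MvT102.
mvt101 = pd.DataFrame({"Order": ["A", "A", "B"], "Qty": [10, 20, 30]})
mvt102 = pd.DataFrame({"Order": ["A", "B"], "Qty": [5, 15]})

g101 = mvt101.groupby("Order", sort=True).sum()
g102 = mvt102.groupby("Order", sort=True).sum()

# Suffix the aggregated columns so they stay distinguishable after the merges.
g101.columns = [f"{c}_101" for c in g101.columns]
g102.columns = [f"{c}_102" for c in g102.columns]

df = pd.DataFrame({"Order": ["A", "B"]})
df = df.merge(g101, left_on="Order", right_index=True, how="left")
df = df.merge(g102, left_on="Order", right_index=True, how="left")
df["calc"] = (df["Qty_101"] - df["Qty_102"]) / 100
print(df["calc"].tolist())  # [0.25, 0.15]
```

            Note the merges join on the grouped frames' index (`right_index=True`), since `groupby(...).sum()` moves `Order` into the index.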
            qid & accept id: (32953680, 32956669) query: Get statistics from subgroups in pandas soup:

            soup wrap:

            Actually, just tweaking your answer a little and seeing how you used iloc pointed me to exactly what I needed. Posting it in case it's useful for someone:

            Instead of doing this, which would give you the statistic of just subgroup 1:

            results2=[]
            for item in results[1]:
                results2.append(item -1)
            sub = df.iloc[results2]
            sub['three'].mean()
            

            I just did this, which gives you the mean (or anything else you need) of every subgroup:

            for z in range(len(results)):
                sub =  df.iloc[results[z]]
                print sub['three'].mean()  
            
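            A minimal runnable sketch of the same loop, with a made-up single-column frame and made-up position lists standing in for `results` (both are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({"three": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})

# Hypothetical subgroups, each given as a list of integer row positions.
results = [[0, 1], [2, 3], [4, 5]]

# df.iloc accepts a list of positions, so each subgroup slices out directly.
means = [df.iloc[idx]["three"].mean() for idx in results]
print(means)  # [1.5, 3.5, 5.5]
```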
            qid & accept id: (32966627, 32972148) query: Matrix triple product with theano soup:
            soup wrap:
            np.einsum('nr,mr,lr->nml', A, B, C)
            

            is equivalent to

            np.dot(A[:, None, :] * B[None, :, :], C.T)
            

            which can be implemented in Theano as

            theano.dot(A[:, None, :] * B[None, :, :], C.T)
            
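            The claimed equivalence is easy to sanity-check numerically in NumPy (the shapes n=4, m=5, l=6, r=3 are arbitrary choices for the check):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))  # n x r
B = rng.standard_normal((5, 3))  # m x r
C = rng.standard_normal((6, 3))  # l x r

lhs = np.einsum('nr,mr,lr->nml', A, B, C)

# A[:, None, :] * B[None, :, :] has shape (n, m, r); dotting with C.T
# contracts the shared r axis, yielding (n, m, l).
rhs = np.dot(A[:, None, :] * B[None, :, :], C.T)

print(lhs.shape, np.allclose(lhs, rhs))
```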
            qid & accept id: (32975636, 32976882) query: How to Use both Scala and Python in a same Spark project? soup:

            soup wrap:

            You can indeed pipe out from Scala and Spark to a regular Python script.

            test.py

            #!/usr/bin/python
            
            import sys
            
            for line in sys.stdin:
              print "hello " + line
            

            spark-shell (scala)

            val data = List("john","paul","george","ringo")
            
            val dataRDD = sc.makeRDD(data)
            
            val scriptPath = "./test.py"
            
            val pipeRDD = dataRDD.pipe(scriptPath)
            
            pipeRDD.foreach(println)
            

            Output

            hello john

            hello ringo

            hello george

            hello paul

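            Conceptually, `RDD.pipe` feeds each partition's elements to the script's stdin, one per line, and collects its stdout lines back. That round trip can be sketched outside Spark with plain `subprocess` (the temp-file path and the two sample names are incidental to the illustration):

```python
import os
import subprocess
import sys
import tempfile

# The same test.py as above, Python 3 style.
script = "import sys\nfor line in sys.stdin:\n    print('hello ' + line.strip())\n"

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(script)
    path = f.name

try:
    # Analogue of dataRDD.pipe(scriptPath): lines in on stdin, lines out on stdout.
    out = subprocess.run(
        [sys.executable, path],
        input="john\npaul\n",
        capture_output=True,
        text=True,
        check=True,
    ).stdout
finally:
    os.remove(path)

print(out.splitlines())  # ['hello john', 'hello paul']
```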
            qid & accept id: (32981875, 32983951) query: How to add two Sparse Vectors in Spark using Python soup:

            soup wrap:

            Something like this should work:

            from pyspark.mllib.linalg import Vectors, SparseVector, DenseVector
            import numpy as np
            
            def add(v1, v2):
                """Add two sparse vectors
                >>> v1 = Vectors.sparse(3, {0: 1.0, 2: 1.0})
                >>> v2 = Vectors.sparse(3, {1: 1.0})
                >>> add(v1, v2)
                SparseVector(3, {0: 1.0, 1: 1.0, 2: 1.0})
                """
                assert isinstance(v1, SparseVector) and isinstance(v2, SparseVector)
                assert v1.size == v2.size 
                # Compute union of indices
                indices = set(v1.indices).union(set(v2.indices))
                # Not particularly efficient but we are limited by SPARK-10973
                # Create index: value dicts
                v1d = dict(zip(v1.indices, v1.values))
                v2d = dict(zip(v2.indices, v2.values))
                zero = np.float64(0)
                # Create dictionary index: (v1[index] + v2[index])
                values =  {i: v1d.get(i, zero) + v2d.get(i, zero)
                   for i in indices
                   if v1d.get(i, zero) + v2d.get(i, zero) != zero}
            
                return Vectors.sparse(v1.size, values)
            

            If you prefer a single pass and don't care about introduced zeros, you can modify the above code like this:

            from collections import defaultdict
            
            def add(v1, v2):
                assert isinstance(v1, SparseVector) and isinstance(v2, SparseVector)
                assert v1.size == v2.size
                values = defaultdict(float) # Dictionary with default value 0.0
                # Add values from v1
                for i in range(v1.indices.size):
                    values[v1.indices[i]] += v1.values[i]
                # Add values from v2
                for i in range(v2.indices.size):
                    values[v2.indices[i]] += v2.values[i]
                return Vectors.sparse(v1.size, dict(values))
            

            If you want, you can try to monkey-patch SparseVector:

            SparseVector.__add__ = add
            v1 = Vectors.sparse(5, {0: 1.0, 2: 3.0})
            v2 = Vectors.sparse(5, {0: -3.0, 2: -3.0, 4: 10})
            v1 + v2
            ## SparseVector(5, {0: -2.0, 4: 10.0})
            

            Alternatively, you should be able to use scipy.sparse:

            from scipy.sparse import csc_matrix
            from pyspark.mllib.regression import LabeledPoint
            
            m1 = csc_matrix((
               v1.values,
               (v1.indices, [0] * v1.numNonzeros())),
               shape=(v1.size, 1))
            
            m2 = csc_matrix((
               v2.values,
               (v2.indices, [0] * v2.numNonzeros())),
               shape=(v2.size, 1))
            
            LabeledPoint(0, m1 + m2)
            
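            The index-union logic in the first `add` is independent of Spark; here is a dependency-free sketch of the same idea using plain `{index: value}` dicts in place of `SparseVector` (the dict representation is an assumption for illustration):

```python
def add_sparse(size, v1, v2):
    """Add two sparse vectors given as {index: value} dicts, dropping zeros."""
    assert all(0 <= i < size for i in list(v1) + list(v2))
    # Union of the two index sets, mirroring the SparseVector version.
    indices = set(v1) | set(v2)
    values = {i: v1.get(i, 0.0) + v2.get(i, 0.0) for i in indices}
    # Drop entries that cancelled to zero so the result stays sparse.
    return {i: v for i, v in values.items() if v != 0.0}

result = add_sparse(5, {0: 1.0, 2: 3.0}, {0: -3.0, 2: -3.0, 4: 10.0})
print(result)  # {0: -2.0, 4: 10.0}
```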
            qid & accept id: (32998355, 32998468) query: Pandas: Dealing with Boolean in Pivot Table soup:

            soup wrap:

            IIUC you can just call apply and pass value_counts:

            In [13]:
            df.ix[:,:'q3'].apply(pd.Series.value_counts)
            
            Out[13]:
                   q1  q2  q3
            True    4   3   4
            False   2   3   2
            

            As @DSM has pointed out, if you have columns where every value is True (or every value is False), NaN will be inserted for the missing counts, in which case you can call fillna(0) like so:

            df.ix[:,:'q3'].apply(pd.Series.value_counts).fillna(0)
            
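            Note that `.ix` has since been removed from pandas; `.loc` does the same label-based slicing here. A small runnable sketch with made-up boolean columns (`q1`, `q2` are assumptions standing in for the asker's survey columns):

```python
import pandas as pd

df = pd.DataFrame({
    "q1": [True, True, False],
    "q2": [True, False, False],
})

# apply runs value_counts column-by-column; fillna(0) covers all-True /
# all-False columns where one of the two index labels would be missing.
counts = df.loc[:, :"q2"].apply(pd.Series.value_counts).fillna(0)
print(counts)
```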
            qid & accept id: (33003547, 33003827) query: How to filter through pandas pivot table soup:

            soup wrap:

            First, create a multi-indexed dataframe:

            df = pd.DataFrame({'i1': [1, 1, 1, 1], 'i2': [2, 2, 3, 3], 'i3': [4, 5, 4, 5], 'v1': [10] * 4, 'v2': [20] * 4}).set_index(['i1', 'i2', 'i3'])
            >>> df
                      v1  v2
            i1 i2 i3        
            1  2  4   10  20
                  5   10  20
               3  4   10  20
                  5   10  20
            

            For me, the easiest way to slice this type of dataframe is to use a combination of .loc and IndexSlice. So, to slice the above df where i2=3 and i3=5:

            >>> df.loc[pd.IndexSlice[:, 3, 5], :]
            
                      v1  v2
            i1 i2 i3        
            1  3  5   10  20
            

            The : inside IndexSlice signifies to select all rows of i1. The very last : inside the loc function signifies to select all columns in the dataframe (v1 and v2).

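            An equivalent selection (an alternative to the answer's approach, not part of it) uses `DataFrame.xs`, which cross-sections on named levels; `drop_level=False` keeps the full MultiIndex so the result matches the `IndexSlice` version:

```python
import pandas as pd

df = pd.DataFrame({
    "i1": [1, 1, 1, 1], "i2": [2, 2, 3, 3], "i3": [4, 5, 4, 5],
    "v1": [10] * 4, "v2": [20] * 4,
}).set_index(["i1", "i2", "i3"])

# Same rows as df.loc[pd.IndexSlice[:, 3, 5], :]
sub = df.xs((3, 5), level=("i2", "i3"), drop_level=False)
print(sub)
```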
            qid & accept id: (33010861, 33011094) query: Removing repetitive lists in a list of list soup:

            soup wrap:

            This is because list is not hashable. You need to convert the elements of your list into tuples (which are hashable) before applying set().

            >>> my_list = [[1, 2], [1, 2], [3, 4]]
            >>> result = [list(el) for el in set(tuple(el) for el in my_list)]
            [[1, 2], [3, 4]]
            

            Updated with your new data:

            >>> [list(list(y) for y in el) 
                    for el in set([tuple(tuple(x) for x in el) for el in my_list])]
            
            [[[26, 28, 80.0], [25, 40, 80.0]],
             [[10, 12, 80.0]],
             [[40, 42, 80.0], [40, 41, 80.0]],
             [[44, 45, 80.0]],
             [[5, 10, 80.0], [6, 9, 80.0], [5, 8, 80.0]],
             [[22, 24, 80.0]],
             [[14, 16, 80.0], [13, 20, 81.0]],
             [[2, 5, 71.1], [1, 3, 70.0]]]
            
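            One caveat with the set() approach: it does not preserve the original order. Since Python 3.7, `dict.fromkeys` on the tuple forms deduplicates while keeping first-seen order:

```python
my_list = [[1, 2], [1, 2], [3, 4]]

# dicts preserve insertion order, so duplicates collapse without reordering.
unique = [list(t) for t in dict.fromkeys(tuple(el) for el in my_list)]
print(unique)  # [[1, 2], [3, 4]]
```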
            qid & accept id: (33019600, 33019885) query: Python selenium and fuzzy matching soup:

            soup wrap:

            Have you tried using a regular expression, or even Python's built-in .find() method? Since you're using Selenium, you can find all the option elements, iterate over them, check each element's text, and compare it to your string.

            For example

            elem = browser.find_elements_by_tag_name("option") 
            for ele in elem:
              if ele.get_attribute("innerHTML").find('Red') > -1 and ele.get_attribute("innerHTML").find('wolly') > -1 and ele.get_attribute("innerHTML").find('small') > -1 and ele.get_attribute("innerHTML").find('UK') > -1:
                #TODO
            

            However that gets kind of long so I would use a regex, for example:

            import re
            elem = browser.find_elements_by_tag_name("option") 
            for ele in elem:
              m = re.search(r'(Red,.+wooly,.+small,.+UK)', ele.get_attribute("innerHTML"))
              if m:
                print m.group(1)
            

            If .get_attribute("innerHTML") doesn't get the inner text, try the .text attribute.
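            For genuinely fuzzy matching (the question's 'wolly' vs a page's 'wooly'), the standard library's difflib can rank candidates by similarity instead of requiring exact substrings. A sketch independent of Selenium, with made-up option strings:

```python
import difflib

# Hypothetical option texts scraped from the page.
options = [
    "Red, wooly, small, UK",
    "Blue, cotton, large, US",
    "Red, wooly, small, France",
]

target = "Red, wolly, small, UK"  # note the typo in 'wolly'

# get_close_matches returns the n best candidates above the cutoff ratio.
best = difflib.get_close_matches(target, options, n=1, cutoff=0.6)
print(best)
```

            The `cutoff` (0 to 1) controls how strict the match must be; tune it down if legitimate matches are being dropped.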

            qid & accept id: (33021916, 33041328) query: Transform QuadgramCollationFinder into PentagramCollationFinder soup:

            soup wrap:

            Building the patterns seems to be of some concern, so here is some code which builds all the legal i-patterns and n-patterns to be used.

            import collections
            
            def make_ngram_ipatterns(n):
                """Make all needed patterns used by *gramCollocationFinder up to n words"""
            
                i_patterns = []
            
                for i in xrange(1, n+1):
                    if i <= 2:
                        i_patterns.append('i' * i)
            
                    else:
                        for j in xrange(2**(i-2)):
                             bin_str = '{0:0{1}b}'.format(j, i-2)
                             ix_pattern = bin_str.replace('0', 'x').replace('1', 'i')
                             i_patterns.append('i{}i'.format(ix_pattern))
            
                return i_patterns
            
            def make_ngram_npatterns(n):
                """Make all needed n-patterings used by *gramCollocationFinder up to n words"""
                all_ipatterns = make_ngram_ipatterns(n)
            
                npatterns = []
            
                for ipattern in all_ipatterns:
                     i_order = sum(c == 'i' for c in ipattern)
                     i_length = len(ipattern)
                     for j in xrange(n - i_length+1):
                         npattern = 'n_{}{}{}'.format('x'* j,
                                                       ipattern ,
                                                       'x'* (n - i_length - j))
            
                         npatterns.append((i_order, ipattern, npattern))
            
                return sorted(npatterns)
            
            
            def main():
            
                n = 5
            
                all_ipatterns = make_ngram_ipatterns(n)
            
                print '\n'.join(make_ngram_ipatterns(n))
            
                for order, ipattern, npattern in make_ngram_npatterns(n):
                     wparams = ', '.join('w{}'.format(i+1)
                                            for i, c in enumerate(npattern[2:])
                                            if c == 'i'
                                        )
                     print('order: {1:2}   ipattern: {2:{0}s}   npattern: {3}'
                           ' ->  {3} = self.{2}({4})'.format(
                               n, order, ipattern, npattern, wparams))
            
            
            if __name__ == '__main__':
                main()
            

            Output for n=5 as it stands is:

            i
            ii
            ixi
            iii
            ixxi
            ixii
            iixi
            iiii
            ixxxi
            ixxii
            ixixi
            ixiii
            iixxi
            iixii
            iiixi
            iiiii
            order:  1   ipattern: i       npattern: n_ixxxx ->  n_ixxxx = self.i(w1)
            order:  1   ipattern: i       npattern: n_xixxx ->  n_xixxx = self.i(w2)
            order:  1   ipattern: i       npattern: n_xxixx ->  n_xxixx = self.i(w3)
            order:  1   ipattern: i       npattern: n_xxxix ->  n_xxxix = self.i(w4)
            order:  1   ipattern: i       npattern: n_xxxxi ->  n_xxxxi = self.i(w5)
            order:  2   ipattern: ii      npattern: n_iixxx ->  n_iixxx = self.ii(w1, w2)
            order:  2   ipattern: ii      npattern: n_xiixx ->  n_xiixx = self.ii(w2, w3)
            order:  2   ipattern: ii      npattern: n_xxiix ->  n_xxiix = self.ii(w3, w4)
            order:  2   ipattern: ii      npattern: n_xxxii ->  n_xxxii = self.ii(w4, w5)
            order:  2   ipattern: ixi     npattern: n_ixixx ->  n_ixixx = self.ixi(w1, w3)
            order:  2   ipattern: ixi     npattern: n_xixix ->  n_xixix = self.ixi(w2, w4)
            order:  2   ipattern: ixi     npattern: n_xxixi ->  n_xxixi = self.ixi(w3, w5)
            order:  2   ipattern: ixxi    npattern: n_ixxix ->  n_ixxix = self.ixxi(w1, w4)
            order:  2   ipattern: ixxi    npattern: n_xixxi ->  n_xixxi = self.ixxi(w2, w5)
            order:  2   ipattern: ixxxi   npattern: n_ixxxi ->  n_ixxxi = self.ixxxi(w1, w5)
            order:  3   ipattern: iii     npattern: n_iiixx ->  n_iiixx = self.iii(w1, w2, w3)
            order:  3   ipattern: iii     npattern: n_xiiix ->  n_xiiix = self.iii(w2, w3, w4)
            order:  3   ipattern: iii     npattern: n_xxiii ->  n_xxiii = self.iii(w3, w4, w5)
            order:  3   ipattern: iixi    npattern: n_iixix ->  n_iixix = self.iixi(w1, w2, w4)
            order:  3   ipattern: iixi    npattern: n_xiixi ->  n_xiixi = self.iixi(w2, w3, w5)
            order:  3   ipattern: iixxi   npattern: n_iixxi ->  n_iixxi = self.iixxi(w1, w2, w5)
            order:  3   ipattern: ixii    npattern: n_ixiix ->  n_ixiix = self.ixii(w1, w3, w4)
            order:  3   ipattern: ixii    npattern: n_xixii ->  n_xixii = self.ixii(w2, w4, w5)
            order:  3   ipattern: ixixi   npattern: n_ixixi ->  n_ixixi = self.ixixi(w1, w3, w5)
            order:  3   ipattern: ixxii   npattern: n_ixxii ->  n_ixxii = self.ixxii(w1, w4, w5)
            order:  4   ipattern: iiii    npattern: n_iiiix ->  n_iiiix = self.iiii(w1, w2, w3, w4)
            order:  4   ipattern: iiii    npattern: n_xiiii ->  n_xiiii = self.iiii(w2, w3, w4, w5)
            order:  4   ipattern: iiixi   npattern: n_iiixi ->  n_iiixi = self.iiixi(w1, w2, w3, w5)
            order:  4   ipattern: iixii   npattern: n_iixii ->  n_iixii = self.iixii(w1, w2, w4, w5)
            order:  4   ipattern: ixiii   npattern: n_ixiii ->  n_ixiii = self.ixiii(w1, w3, w4, w5)
            order:  5   ipattern: iiiii   npattern: n_iiiii ->  n_iiiii = self.iiiii(w1, w2, w3, w4, w5)
            

            Changing to a new dimension is now a matter of using and setting all i-patterns as a lower order class, replacing the n-patterns, and collating all n-patterns of same order into score_fn() sets.

            Edit: Completed the setting of the n-patterns with appropriate w#'s
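            For reference, the generator above is Python 2 (`xrange`, print statement); the i-pattern builder ports to Python 3 directly, as in this condensed sketch:

```python
def make_ngram_ipatterns(n):
    """All i/x patterns that start and end with 'i', up to length n."""
    patterns = []
    for i in range(1, n + 1):
        if i <= 2:
            patterns.append("i" * i)
        else:
            # Enumerate every i/x filling of the i-2 middle positions.
            for j in range(2 ** (i - 2)):
                middle = format(j, "0{}b".format(i - 2))
                middle = middle.replace("0", "x").replace("1", "i")
                patterns.append("i" + middle + "i")
    return patterns

pats = make_ngram_ipatterns(5)
print(len(pats))  # 16, matching the list in the output above
```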

            qid & accept id: (33037416, 33037470) query: Returning the value of an index in a python list based on other values soup:

            soup wrap:

            You can try this:

            >>> offset = 2
            >>> aString = raw_input("digit a letter: ")
            >>> aString
            'a'
            >>> chr(ord(aString)+offset)
            'c'
            


            If you want to iterate over an entire string, a simple way is using a for loop. I assume the input string is always lowercase.

            EDIT2: I improved the solution to handle the case where a letter like 'y' or 'z', shifted without "rotation", would become a non-alphabetic character, e.g.:

            # with only the offset added, this returns a non-alphabetic character
            >>> chr(ord('z')+2)
            '|'
            
            # the 'z' rotation returns the letter 'b'
            >>> letter = "z"
            >>> ord_letter = ord(letter)+offset
            >>> ord_letter_rotated = ((ord_letter - 97) % 26) + 97
            >>> chr(ord_letter_rotated)
            'b'
            

            The code solution:

            offset = 2
            aString = raw_input("digit the string to convert: ")
            #aString = "abz"
            newString = ""
            
            for letter in aString:
                ord_letter = ord(letter)+offset
                ord_letter_rotated = ((ord_letter - 97) % 26) + 97
                newString += chr(ord_letter_rotated)
            
            print newString
            

            The output of this code for the entire lowercase alphabet:

            cdefghijklmnopqrstuvwxyzab
            

            Note: you can also obtain the lowercase alphabet for free this way:

            >>> import string
            >>> string.lowercase
            'abcdefghijklmnopqrstuvwxyz'
            

            See the Wikipedia page to learn more about ROT13:

            https://en.wikipedia.org/wiki/ROT13
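            Note that `raw_input` and the print statement are Python 2. The same wrap-around shift in Python 3, packaged as a small helper:

```python
import string

def rotate(s, offset):
    """Shift lowercase letters by offset, wrapping from 'z' back to 'a'."""
    # ord('a') == 97; the % 26 keeps the result inside the alphabet.
    return "".join(chr((ord(c) - 97 + offset) % 26 + 97) for c in s)

shifted = rotate(string.ascii_lowercase, 2)
print(shifted)  # cdefghijklmnopqrstuvwxyzab
```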

            qid & accept id: (33042988, 33128660) query: How to add/remove said a curve to/from a plot in Python with Matplotlib soup:

            I found similar question. To update the curve the ydata (and xdata if different) need to be updated.

            \n

            1) assign a handle to the plot with the ydata of your curve

            \n
            self.h,=self.axes.plot(data,"-g")\n
            \n

            2) update the ydata with your handle

            \n
            self.h.set_ydata(newdata)\n
            \n soup wrap:

            I found a similar question. To update the curve, the ydata (and xdata, if different) need to be updated.

            1) assign a handle to the plot with the ydata of your curve

            self.h,=self.axes.plot(data,"-g")
            

            2) update the ydata with your handle

            self.h.set_ydata(newdata)
            
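A minimal self-contained sketch of the handle/set_ydata pattern (using the non-interactive Agg backend so it runs headless; the axes are rescaled afterwards so the new data stays in view):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
y = np.sin(np.linspace(0, 2 * np.pi, 50))
h, = ax.plot(y, "-g")      # 1) keep a handle to the Line2D

h.set_ydata(2 * y)         # 2) update the ydata through the handle
ax.relim()                 # recompute the data limits
ax.autoscale_view()        # rescale the axes to the new data
fig.canvas.draw()          # redraw the figure
```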
            qid & accept id: (33062288, 33062350) query: Recursive List containing Lists soup:

            One way to do this would be to use a list comprehension to create the sub-list instead of [choice((0, 1))]. Example -

            \n
            from random import choice\n\ndef Number_recursive(N,initialN=None):\n    initialN = initialN or N\n    if N < 0:\n        raise ValueError('N must be positive')\n    if N == 0:\n        return []\n    return [[choice((0, 1)) for _ in range(initialN)]] + Number_recursive(N-1,initialN)\n
            \n

            Demo -

            \n
            >>> from random import choice\n>>>\n>>> def Number_recursive(N,M=None):\n...     M = M or N\n...     if N < 0:\n...         raise ValueError('N must be positive')\n...     if N == 0:\n...         return []\n...     return [[choice((0, 1)) for _ in range(M)]] + Number_recursive(N-1,M)\n...\n>>> Number_recursive(4)\n[[0, 0, 1, 0], [0, 1, 1, 1], [1, 1, 0, 0], [1, 0, 1, 0]]\n
            \n soup wrap:

            One way to do this would be to use a list comprehension to create the sub-list instead of [choice((0, 1))]. Example -

            from random import choice
            
            def Number_recursive(N,initialN=None):
                initialN = initialN or N
                if N < 0:
                    raise ValueError('N must be positive')
                if N == 0:
                    return []
                return [[choice((0, 1)) for _ in range(initialN)]] + Number_recursive(N-1,initialN)
            

            Demo -

            >>> from random import choice
            >>>
            >>> def Number_recursive(N,M=None):
            ...     M = M or N
            ...     if N < 0:
            ...         raise ValueError('N must be positive')
            ...     if N == 0:
            ...         return []
            ...     return [[choice((0, 1)) for _ in range(M)]] + Number_recursive(N-1,M)
            ...
            >>> Number_recursive(4)
            [[0, 0, 1, 0], [0, 1, 1, 1], [1, 1, 0, 0], [1, 0, 1, 0]]
            
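For comparison, the recursion above can also be collapsed into a single nested list comprehension (a sketch; number_matrix is my name for it):

```python
from random import choice

def number_matrix(n):
    # n rows, each a list of n random 0/1 values
    return [[choice((0, 1)) for _ in range(n)] for _ in range(n)]

m = number_matrix(4)
print(m)  # e.g. [[0, 1, 1, 0], [1, 0, 0, 1], ...]
```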
            qid & accept id: (33069366, 33402742) query: Fast linear interpolation in Numpy / Scipy "along a path" soup:

            For a fixed point in time, you can utilize the following interpolation function:

            \n
            g(a) = cc[0]*abs(a-aa[0]) + cc[1]*abs(a-aa[1]) + cc[2]*abs(a-aa[2])\n
            \n

            where a is the hiker's altitude, aa is the vector with the 3 measurement altitudes, and cc is a vector with the coefficients. There are three things to note:

            \n
              \n
            1. For given temperatures (alltemps) corresponding to aa, determining cc can be done by solving a linear matrix equation using np.linalg.solve().
            2. \n
            3. g(a) is easy to vectorize for a (N,) dimensional a and (N, 3) dimensional cc (including np.linalg.solve() respectively).
            4. \n
            5. g(a) is called a first order univariate spline kernel (for three points). Using abs(a-aa[i])**(2*d-1) would change the spline order to d. This approach could be interpreted as a simplified version of a Gaussian Process in Machine Learning.
            6. \n
            \n

            So the code would be:

            \n
            import matplotlib.pyplot as plt\nimport numpy as np\nimport seaborn as sns\n\n# generate temperatures\nnp.random.seed(0)\nN, sigma = 1000, 5\ntrend = np.sin(4 / N * np.arange(N)) * 30\nalltemps = np.array([tmp0 + trend + sigma*np.random.randn(N)\n                     for tmp0 in [70, 50, 40]])\n\n# generate attitudes:\naltitudes = np.array([500, 1500, 4000]).astype(float)\nlocation = np.linspace(altitudes[0], altitudes[-1], N)\n\n\ndef doit():\n    """ do the interpolation, improved version for speed """\n    AA = np.vstack([np.abs(altitudes-a_i) for a_i in altitudes])\n    # This is slighty faster than np.linalg.solve(), because AA is small:\n    cc = np.dot(np.linalg.inv(AA), alltemps)\n\n    return (cc[0]*np.abs(location-altitudes[0]) +\n            cc[1]*np.abs(location-altitudes[1]) +\n            cc[2]*np.abs(location-altitudes[2]))\n\n\nt_loc = doit()  # call interpolator\n\n# do the plotting:\nfg, ax = plt.subplots(num=1)\nfor alt, t in zip(altitudes, alltemps):\n    ax.plot(t, label="%d feet" % alt, alpha=.5)\nax.plot(t_loc, label="Interpolation")\nax.legend(loc="best", title="Altitude:")\nax.set_xlabel("Time")\nax.set_ylabel("Temperature")\nfg.canvas.draw()\n
            \n

            Measuring the time gives:

            \n
            In [2]: %timeit doit()\n10000 loops, best of 3: 107 µs per loop\n
            \n

            Update: I replaced the original list comprehensions in doit() to improve speed by 30% (for N=1000).

            \n

            Furthermore, as requested for comparison, @moarningsun's benchmark code block on my machine:

            \n
            10 loops, best of 3: 110 ms per loop  \ninterp_checked\n10000 loops, best of 3: 83.9 µs per loop\nscipy_interpn\n1000 loops, best of 3: 678 µs per loop\nOutput allclose:\n[True, True, True]\n
            \n

            Note that N=1000 is a relatively small number. Using N=100000 produces the results:

            \n
            interp_checked\n100 loops, best of 3: 8.37 ms per loop\n\n%timeit doit()\n100 loops, best of 3: 5.31 ms per loop\n
            \n

            This shows that this approach scales better for large N than the interp_checked approach.

            \n soup wrap:

            For a fixed point in time, you can utilize the following interpolation function:

            g(a) = cc[0]*abs(a-aa[0]) + cc[1]*abs(a-aa[1]) + cc[2]*abs(a-aa[2])
            

            where a is the hiker's altitude, aa is the vector with the 3 measurement altitudes, and cc is a vector with the coefficients. There are three things to note:

            1. For given temperatures (alltemps) corresponding to aa, determining cc can be done by solving a linear matrix equation using np.linalg.solve().
            2. g(a) is easy to vectorize for a (N,) dimensional a and (N, 3) dimensional cc (including np.linalg.solve() respectively).
            3. g(a) is called a first order univariate spline kernel (for three points). Using abs(a-aa[i])**(2*d-1) would change the spline order to d. This approach could be interpreted as a simplified version of a Gaussian Process in Machine Learning.

            So the code would be:

            import matplotlib.pyplot as plt
            import numpy as np
            import seaborn as sns
            
            # generate temperatures
            np.random.seed(0)
            N, sigma = 1000, 5
            trend = np.sin(4 / N * np.arange(N)) * 30
            alltemps = np.array([tmp0 + trend + sigma*np.random.randn(N)
                                 for tmp0 in [70, 50, 40]])
            
            # generate altitudes:
            altitudes = np.array([500, 1500, 4000]).astype(float)
            location = np.linspace(altitudes[0], altitudes[-1], N)
            
            
            def doit():
                """ do the interpolation, improved version for speed """
                AA = np.vstack([np.abs(altitudes-a_i) for a_i in altitudes])
                # This is slightly faster than np.linalg.solve(), because AA is small:
                cc = np.dot(np.linalg.inv(AA), alltemps)
            
                return (cc[0]*np.abs(location-altitudes[0]) +
                        cc[1]*np.abs(location-altitudes[1]) +
                        cc[2]*np.abs(location-altitudes[2]))
            
            
            t_loc = doit()  # call interpolator
            
            # do the plotting:
            fg, ax = plt.subplots(num=1)
            for alt, t in zip(altitudes, alltemps):
                ax.plot(t, label="%d feet" % alt, alpha=.5)
            ax.plot(t_loc, label="Interpolation")
            ax.legend(loc="best", title="Altitude:")
            ax.set_xlabel("Time")
            ax.set_ylabel("Temperature")
            fg.canvas.draw()
            

            Measuring the time gives:

            In [2]: %timeit doit()
            10000 loops, best of 3: 107 µs per loop
            

            Update: I replaced the original list comprehensions in doit() to improve speed by 30% (for N=1000).

            Furthermore, as requested for comparison, @moarningsun's benchmark code block on my machine:

            10 loops, best of 3: 110 ms per loop  
            interp_checked
            10000 loops, best of 3: 83.9 µs per loop
            scipy_interpn
            1000 loops, best of 3: 678 µs per loop
            Output allclose:
            [True, True, True]
            

            Note that N=1000 is a relatively small number. Using N=100000 produces the results:

            interp_checked
            100 loops, best of 3: 8.37 ms per loop
            
            %timeit doit()
            100 loops, best of 3: 5.31 ms per loop
            

            This shows that this approach scales better for large N than the interp_checked approach.
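The kernel construction can be checked in isolation: solving for cc and then evaluating g(a) at the measurement altitudes must reproduce the input temperatures exactly (a sketch with made-up single-time-point temperatures, not the question's data):

```python
import numpy as np

# made-up measurement altitudes and temperatures at one point in time
aa = np.array([500.0, 1500.0, 4000.0])
temps = np.array([70.0, 50.0, 40.0])

AA = np.abs(aa[:, None] - aa[None, :])   # kernel matrix |a_i - a_j|
cc = np.linalg.solve(AA, temps)          # coefficients of g

def g(a):
    # g(a) = sum_i cc[i] * |a - aa[i]|, vectorized over a
    return (cc * np.abs(np.asarray(a, dtype=float)[..., None] - aa)).sum(axis=-1)

print(g(aa))   # recovers [70. 50. 40.] at the nodes
```

By construction g(aa) equals AA @ cc, which is exactly the temperature vector the system was solved for.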

            qid & accept id: (33099417, 33099902) query: Replace xml tag contents using python soup:

            UPDATE - XML PARSER IMPLEMENTATION: since replacing a specific tag requires modifying the regex, I'm providing a more general and safer alternative implementation based upon the ElementTree parser (as stated above by @stribizhev and @Saket Mittal).

            \n

            I had to add a root element (to make a valid XML doc, which requires a root element). I've also chosen to filter the location to edit by its tag (but it could be any field):

            \n
            #!/usr/bin/python\n# Alternative Implementation with ElementTree XML Parser\n\nxml = '''\\n\n    \n        Raja\n        \n            ABC\n            123\n            XYZ\n        \n        100\n        temp\n    \n    \n        GsusRecovery\n        \n            Torino\n            456\n            UVW\n        \n        120\n        perm\n    \n\n'''\n\nfrom xml.etree import ElementTree as ET\n# tree = ET.parse('input.xml')  # decomment to parse xml from file\ntree = ET.ElementTree(ET.fromstring(xml))\nroot = tree.getroot()\n\nfor location in root.iter('Location'):\n    if location.find('city').text == 'Torino':\n        location.set("isupdated", "1")\n        location.find('city').text = 'MyCity'\n        location.find('geocode').text = '10.12'\n        location.find('state').text = 'MyState'\n\nprint ET.tostring(root, encoding='utf8', method='xml')\n# tree.write('output.xml') # decomment if you want to write to file\n
            \n

            Online runnable version of the code here

            \n

            PREVIOUS REGEX IMPLEMENTATION

            \n

            This is a possible implementation using the lazy quantifier .*? and the dotall modifier (?s):

            \n
            #!/usr/bin/python\n\nimport re\n\nxml = '''\\n\nRaja\n\n     ABC\n     123\n     XYZ\n\n'''\n\nlocUpdate = '''\\n    \n         MyCity\n         10.12\n         MyState\n    '''\n\noutput = re.sub(r"(?s).*?", r"%s" % locUpdate, xml)\n\nprint output\n
            \n

            You can test the code online here

            \n

            Caveat: if there is more than one matching tag in the XML input, the regex replaces them all with locUpdate. In that case you have to use:

            \n
            # (note the last ``1`` at the end to limit the substitution only to the first occurrence)\noutput = re.sub(r"(?s).*?", r"%s" % locUpdate, xml, 1)\n
            \n soup wrap:

            UPDATE - XML PARSER IMPLEMENTATION: since replacing a specific tag requires modifying the regex, I'm providing a more general and safer alternative implementation based upon the ElementTree parser (as stated above by @stribizhev and @Saket Mittal).

            I had to add a root element (to make a valid XML doc, which requires a root element). I've also chosen to filter the location to edit by its tag (but it could be any field):

            #!/usr/bin/python
            # Alternative Implementation with ElementTree XML Parser
            
            xml = '''\
            
                
                    Raja
                    
                        ABC
                        123
                        XYZ
                    
                    100
                    temp
                
                
                    GsusRecovery
                    
                        Torino
                        456
                        UVW
                    
                    120
                    perm
                
            
            '''
            
            from xml.etree import ElementTree as ET
            # tree = ET.parse('input.xml')  # decomment to parse xml from file
            tree = ET.ElementTree(ET.fromstring(xml))
            root = tree.getroot()
            
            for location in root.iter('Location'):
                if location.find('city').text == 'Torino':
                    location.set("isupdated", "1")
                    location.find('city').text = 'MyCity'
                    location.find('geocode').text = '10.12'
                    location.find('state').text = 'MyState'
            
            print ET.tostring(root, encoding='utf8', method='xml')
            # tree.write('output.xml') # decomment if you want to write to file
            

            Online runnable version of the code here

            PREVIOUS REGEX IMPLEMENTATION

            This is a possible implementation using the lazy quantifier .*? and the dotall modifier (?s):

            #!/usr/bin/python
            
            import re
            
            xml = '''\
            
            Raja
            
                 ABC
                 123
                 XYZ
            
            '''
            
            locUpdate = '''\
                
                     MyCity
                     10.12
                     MyState
                '''
            
            output = re.sub(r"(?s).*?", r"%s" % locUpdate, xml)
            
            print output
            

            You can test the code online here

            Caveat: if there is more than one matching tag in the XML input, the regex replaces them all with locUpdate. In that case you have to use:

            # (note the last ``1`` at the end to limit the substitution only to the first occurrence)
            output = re.sub(r"(?s).*?", r"%s" % locUpdate, xml, 1)
            
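A minimal self-contained version of the ElementTree pattern, with hypothetical tag names (the question's real schema may differ), shows the in-place edit end to end:

```python
from xml.etree import ElementTree as ET

# hypothetical schema: Locations/Location/{city,geocode}
xml = """<Locations>
  <Location>
    <city>Torino</city>
    <geocode>456</geocode>
  </Location>
</Locations>"""

root = ET.fromstring(xml)
for location in root.iter('Location'):
    if location.find('city').text == 'Torino':
        location.set('isupdated', '1')          # flag the edited element
        location.find('city').text = 'MyCity'
        location.find('geocode').text = '10.12'

out = ET.tostring(root, encoding='unicode')
print(out)
```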
            qid & accept id: (33103802, 33106573) query: Re-Construct a png image from a GDK Pixbuf soup:

            If you want to send a Pixbuf over a socket you have to send all data, not just the pixels. The BytesIO object is not necessary as Numpy arrays have a tostring() method.

            \n

            It would be easier and make more sense to send a PNG instead of sending raw data and encoding it into a PNG image at the receiving end. Here a BytesIO object actually is necessary to avoid a temporary file. Sending side:

            \n
            screen = ScreenShot()\nimage = screen.get_screenshot()\npng_file = BytesIO()\nimage.save_to_callback(png_file.write)\ndata = png_file.getvalue()\n
            \n

            Then send data over the socket and on the receiving side simply save it:

            \n
            with open('result.png', 'wb') as png_file:\n    png_file.write(data)\n
            \n soup wrap:

            If you want to send a Pixbuf over a socket you have to send all data, not just the pixels. The BytesIO object is not necessary as Numpy arrays have a tostring() method.

            It would be easier and make more sense to send a PNG instead of sending raw data and encoding it into a PNG image at the receiving end. Here a BytesIO object actually is necessary to avoid a temporary file. Sending side:

            screen = ScreenShot()
            image = screen.get_screenshot()
            png_file = BytesIO()
            image.save_to_callback(png_file.write)
            data = png_file.getvalue()
            

            Then send data over the socket and on the receiving side simply save it:

            with open('result.png', 'wb') as png_file:
                png_file.write(data)
            
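The in-memory pattern on the sending side can be sketched without GDK: a BytesIO object stands in for a temporary file and collects whatever the callback writes:

```python
from io import BytesIO

buf = BytesIO()                       # in-memory stand-in for a temp file
# save_to_callback-style API: the producer calls our write() with chunks
for chunk in (b'\x89PNG\r\n\x1a\n', b'...encoded image data...'):
    buf.write(chunk)
data = buf.getvalue()                 # bytes ready to send over the socket
```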
            qid & accept id: (33105265, 33105810) query: Python Regex: Optional White Space Around Matching Group soup:
            text = 'user = bob'\na = re.match(r'(?P.*?) ?(?PNOT LIKE|LIKE|<=>|>=|<=|!=|<>|=|>|<) ?(?P.*)',text)\nprint a.group()\n
            \n

            Output:

            \n
            user = bob\n
            \n

            If you want spaces to be part of your second group, you could do the following.

            \n
            a = re.match(r'(?P.*?)(?P ?[NOT LIKE|LIKE|<=>|>=|<=|!=|<>|=|>|<] ?)(?P.*)',text)\n
            \n

            a.group(2)

            \n

            Output:

            \n
             = \n
            \n

            Since you mentioned whitespace (space, tab, etc.), you can replace the space with \s.

            \n soup wrap:
            text = 'user = bob'
            a = re.match(r'(?P.*?) ?(?PNOT LIKE|LIKE|<=>|>=|<=|!=|<>|=|>|<) ?(?P.*)',text)
            print a.group()
            

            Output:

            user = bob
            

            If you want spaces to be part of your second group, you could do the following.

            a = re.match(r'(?P.*?)(?P ?[NOT LIKE|LIKE|<=>|>=|<=|!=|<>|=|>|<] ?)(?P.*)',text)
            

            a.group(2)

            Output:

             = 
            

            Since you mentioned whitespace (space, tab, etc.), you can replace the space with \s.
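A sketch of that substitution, using \s* so any run of whitespace around the operator is tolerated (the group names lhs, op and rhs are my choice):

```python
import re

text = 'user =   bob'
# longer operators come first in the alternation so e.g. <= wins over <
pattern = r'(?P<lhs>.*?)\s*(?P<op>NOT LIKE|LIKE|<=>|>=|<=|!=|<>|=|>|<)\s*(?P<rhs>.*)'
m = re.match(pattern, text)
print(m.group('lhs'), m.group('op'), m.group('rhs'))   # user = bob
```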

            qid & accept id: (33139927, 33140223) query: Running program/function in background in Python soup:

            As I understand it, your goal is to start a background process in subprocess but not have the main program wait for it to finish. Here is an example program that does that:

            \n
            $ cat script.py\nimport subprocess\nsubprocess.Popen("sleep 3; echo 'Done!';", shell=True)\n
            \n

            Here is an example of that program in operation:

            \n
            $ python script.py\n$ \n$ \n$ Done!\n
            \n

            As you can see, the shell script continued to run after python exited.

            \n

            subprocess has many options and you will want to customize the subprocess call to your needs.

            \n

            In some cases, a child process that lives after its parent exits may leave a zombie process. For instructions on how to avoid that, see here.

            \n

            The alternative: making subprocess wait

            \n

            If you want the opposite to happen with python waiting for subprocess to complete, look at this program:

            \n
            $ cat script.py\nimport subprocess\np = subprocess.Popen("sleep 3; echo 'Done!';", shell=True)\np.wait()\n
            \n

            Here is an example of its output:

            \n
            $ python script.py\nDone!\n$ \n$ \n
            \n

            As you can see, because we called wait, python waited until subprocess completed before exiting.

            \n soup wrap:

            As I understand it, your goal is to start a background process in subprocess but not have the main program wait for it to finish. Here is an example program that does that:

            $ cat script.py
            import subprocess
            subprocess.Popen("sleep 3; echo 'Done!';", shell=True)
            

            Here is an example of that program in operation:

            $ python script.py
            $ 
            $ 
            $ Done!
            

            As you can see, the shell script continued to run after python exited.

            subprocess has many options and you will want to customize the subprocess call to your needs.

            In some cases, a child process that lives after its parent exits may leave a zombie process. For instructions on how to avoid that, see here.

            The alternative: making subprocess wait

            If you want the opposite to happen with python waiting for subprocess to complete, look at this program:

            $ cat script.py
            import subprocess
            p = subprocess.Popen("sleep 3; echo 'Done!';", shell=True)
            p.wait()
            

            Here is an example of its output:

            $ python script.py
            Done!
            $ 
            $ 
            

            As you can see, because we called wait, python waited until subprocess completed before exiting.
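Both behaviours can be sketched in pure Python without a shell, using sys.executable as the child interpreter:

```python
import subprocess
import sys

# start the child; the main program continues immediately
p = subprocess.Popen([sys.executable, "-c",
                      "import time; time.sleep(0.2); print('Done!')"])
print("main keeps running, child pid:", p.pid)

p.wait()                      # opt in to waiting, as in the second example
print("child exit code:", p.returncode)
```

Passing an argument list instead of shell=True avoids the shell entirely, which is generally safer when the command includes user-supplied data.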

            qid & accept id: (33140945, 33155399) query: Grammar rule extraction from parsed result soup:

            First to navigate a tree, see How to iterate through all nodes of a tree? and How to navigate a nltk.tree.Tree? :

            \n
            >>> from nltk.tree import Tree\n>>> bracket_parse = "(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))"\n>>> ptree = Tree.fromstring(bracket_parse)\n>>> ptree\nTree('S', [Tree('VP', [Tree('VB', ['get']), Tree('NP', [Tree('PRP', ['me'])]), Tree('ADVP', [Tree('RB', ['now'])])])])\n>>> for subtree in ptree.subtrees():\n...     print subtree\n... \n(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))\n(VP (VB get) (NP (PRP me)) (ADVP (RB now)))\n(VB get)\n(NP (PRP me))\n(PRP me)\n(ADVP (RB now))\n(RB now)\n
            \n

            And what you're looking for is https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L341:

            \n
            >>> ptree.productions()\n[S -> VP, VP -> VB NP ADVP, VB -> 'get', NP -> PRP, PRP -> 'me', ADVP -> RB, RB -> 'now']\n
            \n

            Note that Tree.productions() returns a list of Production objects, see https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L22 and https://github.com/nltk/nltk/blob/develop/nltk/grammar.py#L236.

            \n

            If you want a string form of the grammar rules, you can either do:

            \n
            >>> for rule in ptree.productions():\n...     print rule\n... \nS -> VP\nVP -> VB NP ADVP\nVB -> 'get'\nNP -> PRP\nPRP -> 'me'\nADVP -> RB\nRB -> 'now'\n
            \n

            Or

            \n
            >>> rules = [str(p) for p in ptree.productions()]\n>>> rules\n['S -> VP', 'VP -> VB NP ADVP', "VB -> 'get'", 'NP -> PRP', "PRP -> 'me'", 'ADVP -> RB', "RB -> 'now'"]\n
            \n soup wrap:

            First to navigate a tree, see How to iterate through all nodes of a tree? and How to navigate a nltk.tree.Tree? :

            >>> from nltk.tree import Tree
            >>> bracket_parse = "(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))"
            >>> ptree = Tree.fromstring(bracket_parse)
            >>> ptree
            Tree('S', [Tree('VP', [Tree('VB', ['get']), Tree('NP', [Tree('PRP', ['me'])]), Tree('ADVP', [Tree('RB', ['now'])])])])
            >>> for subtree in ptree.subtrees():
            ...     print subtree
            ... 
            (S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))
            (VP (VB get) (NP (PRP me)) (ADVP (RB now)))
            (VB get)
            (NP (PRP me))
            (PRP me)
            (ADVP (RB now))
            (RB now)
            

            And what you're looking for is https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L341:

            >>> ptree.productions()
            [S -> VP, VP -> VB NP ADVP, VB -> 'get', NP -> PRP, PRP -> 'me', ADVP -> RB, RB -> 'now']
            

            Note that Tree.productions() returns a list of Production objects, see https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L22 and https://github.com/nltk/nltk/blob/develop/nltk/grammar.py#L236.

            If you want a string form of the grammar rules, you can either do:

            >>> for rule in ptree.productions():
            ...     print rule
            ... 
            S -> VP
            VP -> VB NP ADVP
            VB -> 'get'
            NP -> PRP
            PRP -> 'me'
            ADVP -> RB
            RB -> 'now'
            

            Or

            >>> rules = [str(p) for p in ptree.productions()]
            >>> rules
            ['S -> VP', 'VP -> VB NP ADVP', "VB -> 'get'", 'NP -> PRP', "PRP -> 'me'", 'ADVP -> RB', "RB -> 'now'"]
            
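The idea behind productions() can be sketched without NLTK by walking a tree given as nested tuples (label, child, ...); the helper name productions below is mine, and it reproduces the rule list shown above:

```python
def productions(tree):
    # tree is (label, child1, child2, ...); leaves are plain strings
    label, *children = tree
    rhs = " ".join(f"'{c}'" if isinstance(c, str) else c[0] for c in children)
    rules = [f"{label} -> {rhs}"]
    for c in children:
        if not isinstance(c, str):      # recurse into non-terminal children
            rules.extend(productions(c))
    return rules

tree = ('S', ('VP', ('VB', 'get'), ('NP', ('PRP', 'me')),
              ('ADVP', ('RB', 'now'))))
print(productions(tree))
```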
            qid & accept id: (33144460, 33144865) query: Create a list property in Python soup:
            import collections\n\n\nclass PrivateList(collections.MutableSequence):\n    def __init__(self, initial=None):\n        self._list = initial or []\n\n    def __repr__(self):\n        return repr(self._list)\n\n    def __getitem__(self, item):\n        print("Accessed element {}".format(item))\n        return self._list[item]\n\n    def __setitem__(self, key, value):\n        print("Set element {} to {}".format(key, value))\n        self._list[key] = value\n\n    def __delitem__(self, key):\n        print("Deleting element {}".format(key))\n        del self._list[key]\n\n    def __len__(self):\n        print("Getting length")\n        return len(self._list)\n\n    def insert(self, index, item):\n        print("Inserting item {} at {}".format(item, index))\n        self._list.insert(index, item)\n\n\nclass Foo(object):\n    def __init__(self, a_list):\n        self.list = PrivateList(a_list)\n
            \n

            Then running this:

            \n
            foo = Foo([1,2,3])\nprint(foo.list)\nprint(foo.list[1])\nfoo.list[1] = 12\nprint(foo.list)\n
            \n

            Outputs:

            \n
            [1, 2, 3]\nAccessed element 1\n2\nSet element 1 to 12\n[1, 12, 3]\n
            \n soup wrap:
            import collections
            
            
            class PrivateList(collections.MutableSequence):
                def __init__(self, initial=None):
                    self._list = initial or []
            
                def __repr__(self):
                    return repr(self._list)
            
                def __getitem__(self, item):
                    print("Accessed element {}".format(item))
                    return self._list[item]
            
                def __setitem__(self, key, value):
                    print("Set element {} to {}".format(key, value))
                    self._list[key] = value
            
                def __delitem__(self, key):
                    print("Deleting element {}".format(key))
                    del self._list[key]
            
                def __len__(self):
                    print("Getting length")
                    return len(self._list)
            
                def insert(self, index, item):
                    print("Inserting item {} at {}".format(item, index))
                    self._list.insert(index, item)
            
            
            class Foo(object):
                def __init__(self, a_list):
                    self.list = PrivateList(a_list)
            

            Then running this:

            foo = Foo([1,2,3])
            print(foo.list)
            print(foo.list[1])
            foo.list[1] = 12
            print(foo.list)
            

            Outputs:

            [1, 2, 3]
            Accessed element 1
            2
            Set element 1 to 12
            [1, 12, 3]
            
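If read-only protection (rather than logging every access) is the goal, a plain property handing out a copy is a lighter-weight sketch:

```python
class Foo:
    def __init__(self, a_list):
        self._list = list(a_list)   # private backing storage

    @property
    def list(self):
        # hand out a copy so callers cannot mutate the internal list
        return list(self._list)

foo = Foo([1, 2, 3])
foo.list.append(4)      # mutates only the copy
print(foo.list)         # [1, 2, 3]
```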
            qid & accept id: (33201383, 33208413) query: Pygame- Sprite set position with mouseclick soup:

            You already store the current position of the Sprite in self.rect, so you don't need x_start_position and y_start_position.

            \n

            If you want to store the original starting position you used when creating the Sprite, you'll have to create a member in the initializer:

            \n
            #TODO: respect naming convention\nclass sprite_to_place(pygame.sprite.Sprite):\n    # you can use a single parameter instead of two\n    def __init__(self, pos):\n        pygame.sprite.Sprite.__init__(self)\n        self.image = pygame.image.load("a_picture.png")\n        # you can pass the position directly to get_rect to set it's position\n        self.rect = self.image.get_rect(topleft=pos)\n        # I don't know if you actually need this\n        self.start_pos = pos\n
            \n

            Then in update:

            \n
            def update(self): \n    # current position is self.rect.topleft\n    # starting position is self.start_pos\n    # to move the Sprite/Rect, you can also use the move functions\n    self.rect.move_ip(10, 20) # moves the Sprite 10px vertically and 20px horizontally\n
            \n soup wrap:

            You already store the current position of the Sprite in self.rect, so you don't need x_start_position and y_start_position.

            If you want to store the original starting position you used when creating the Sprite, you'll have to create a member in the initializer:

            #TODO: respect naming convention
            class sprite_to_place(pygame.sprite.Sprite):
                # you can use a single parameter instead of two
                def __init__(self, pos):
                    pygame.sprite.Sprite.__init__(self)
                    self.image = pygame.image.load("a_picture.png")
                    # you can pass the position directly to get_rect to set its position
                    self.rect = self.image.get_rect(topleft=pos)
                    # I don't know if you actually need this
                    self.start_pos = pos
            

            Then in update:

            def update(self): 
                # current position is self.rect.topleft
                # starting position is self.start_pos
                # to move the Sprite/Rect, you can also use the move functions
                self.rect.move_ip(10, 20) # moves the Sprite 10px horizontally and 20px vertically
            
            qid & accept id: (33203746, 33203873) query: Get header row in pandas dataframe soup:

            You can just do:

            \n
            In [6]:\n' '.join(df)\n\nOut[6]:\n'Col_A Col_B Col_C'\n
            \n

            This works because iterating over a df yields its column names, which are strings, so you can just join them with your separator.

            \n

            EDIT

            \n

            If you want to get exactly what your header was stored then you can do the following:

            \n
            In [8]:\ndf = pd.read_table(io.StringIO(t), skiprows=3, header=None, nrows=1)\ndf\n\nOut[8]:\n                        0\n0  Col_A    Col_B   Col_C\n\nIn [10]:\ndf.iloc[0][0]\n\nOut[10]:\n'Col_A    Col_B   Col_C'\n
            \n

            This doesn't specify a separator, so it will look for commas; since there are none, the entire header row is read as a single column value. You can then get just the row value by indexing it, as shown above.

            \n soup wrap:

            You can just do:

            In [6]:
            ' '.join(df)
            
            Out[6]:
            'Col_A Col_B Col_C'
            

            This works because iterating over a df yields its column names, which are strings, so you can just join them with your separator.
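That join can be verified on a throwaway frame:

```python
import pandas as pd

# iterating a DataFrame yields its column labels
df = pd.DataFrame(columns=['Col_A', 'Col_B', 'Col_C'])
header = ' '.join(df)
print(header)  # Col_A Col_B Col_C
```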

            EDIT

            If you want to get exactly what your header was stored then you can do the following:

            In [8]:
            df = pd.read_table(io.StringIO(t), skiprows=3, header=None, nrows=1)
            df
            
            Out[8]:
                                    0
            0  Col_A    Col_B   Col_C
            
            In [10]:
            df.iloc[0][0]
            
            Out[10]:
            'Col_A    Col_B   Col_C'
            

            Because this call doesn't specify a separator, pandas looks for commas; there are none, so the entire header row is read as a single column value. You can then get just that value by indexing it as shown above.

            qid & accept id: (33223682, 33223727) query: How to decode() with a subset of 'ascii'? soup:

            in python 2:

            \n
            def test_if_ascii(text):\n    if isinstance(text, unicode):\n        raise TypeError('hey man, dont feed me unicode plz')\n    return all(32 <= ord(c) <= 126 for c in text)\n
            \n

            In Python 3 it's almost the same, except unicode is called 'str' and byte strings are called 'bytes':

            \n
            def test_if_ascii(text):\n    if isinstance(text, str):\n        raise TypeError('hey man, dont feed me unicode plz')\n    return all(32 <= b <= 126 for b in text)\n
            \n soup wrap:

            in python 2:

            def test_if_ascii(text):
                if isinstance(text, unicode):
                    raise TypeError('hey man, dont feed me unicode plz')
                return all(32 <= ord(c) <= 126 for c in text)
            

            In Python 3 it's almost the same, except unicode is called 'str' and byte strings are called 'bytes':

            def test_if_ascii(text):
                if isinstance(text, str):
                    raise TypeError('hey man, dont feed me unicode plz')
                # bytes iterate as ints in Python 3, so compare them directly
                return all(32 <= b <= 126 for b in text)
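
            As a quick self-contained check (note that iterating over a bytes object in Python 3 yields integers, so no ord() is needed; is_printable_ascii is a hypothetical name for the same check):

```python
def is_printable_ascii(data):
    # bytes iterate as ints in Python 3, so compare them directly
    if isinstance(data, str):
        raise TypeError('expected bytes, not str')
    return all(32 <= b <= 126 for b in data)

assert is_printable_ascii(b'hello world')
assert not is_printable_ascii('caf\xe9'.encode('utf-8'))  # contains bytes > 126
```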
            
            qid & accept id: (33252350, 33256649) query: Sorting list of dictionaries with primary key from list of keywords and alphabetical order as secondary key soup:

            You can use the in-operator to find out whether a substring is contained in another string.

            \n
            \n

            For the Unicode and string types, x in y is true if and only if x is a substring of y. An equivalent test is y.find(x) != -1. [...] Empty strings are always considered to be a substring of any other string, so "" in "abc" will return True.

            \n
            \n

            You can use this to implement your keyword sorting key.

            \n

            You'd use the approach given in the other answer (pass a tuple as key) to implement the alphabetical sorting as a secondary key.

            \n

            Here's an example:

            \n
            import pprint\n\n# Define the keywords I want to see first\npreferred_projects = ['one', 'two', 'three']\n\n# example data\nAllMyProjectsFromaDatasource = [{ 'name': 'project two', 'id': 5, 'otherkey': 'othervalue'},\n                                { 'name': 'project three', 'id': 1, 'otherkey': 'othervalue'},\n                                { 'name': 'project one', 'id': 3, 'otherkey': 'othervalue'},\n                                { 'name': 'abc project', 'id': 6, 'otherkey': 'othervalue'},\n                                { 'name': 'one project', 'id': 9, 'otherkey': 'othervalue'}\n                               ]    \n\ndef keyfunc(x):\n    # keyword primary key\n    # (add index to list comprehension when keyword is in name)\n    preferred_key = [float(idx) \n                     for idx, i in enumerate(preferred_projects)\n                     if i in x['name']]\n    # found at least one match in preferred keywords, use first if any, else infinity\n    keyword_sortkey = preferred_key[0] if preferred_key else float('inf')\n    # return tuple to sort according to primary and secondary key\n    return keyword_sortkey, x['name']\n\nAllMyProjectsFromaDatasource.sort(key=keyfunc)\n\npprint.pprint(AllMyProjectsFromaDatasource)\n
            \n

            The output is:

            \n
            [{'id': 9, 'name': 'one project', 'otherkey': 'othervalue'},\n {'id': 3, 'name': 'project one', 'otherkey': 'othervalue'},\n {'id': 5, 'name': 'project two', 'otherkey': 'othervalue'},\n {'id': 1, 'name': 'project three', 'otherkey': 'othervalue'},\n {'id': 6, 'name': 'abc project', 'otherkey': 'othervalue'}]\n
            \n soup wrap:

            You can use the in-operator to find out whether a substring is contained in another string.

            For the Unicode and string types, x in y is true if and only if x is a substring of y. An equivalent test is y.find(x) != -1. [...] Empty strings are always considered to be a substring of any other string, so "" in "abc" will return True.

            You can use this to implement your keyword sorting key.

            You'd use the approach given in the other answer (pass a tuple as key) to implement the alphabetical sorting as a secondary key.
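
            A minimal sketch of the tuple-as-key idea in isolation (hypothetical data): the first tuple element is the primary key, the second breaks ties alphabetically.

```python
words = ['banana', 'fig', 'cherry', 'apple']

# primary key: word length; secondary key: alphabetical order
ordered = sorted(words, key=lambda w: (len(w), w))
print(ordered)  # ['fig', 'apple', 'banana', 'cherry']
```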

            Here's an example:

            import pprint
            
            # Define the keywords I want to see first
            preferred_projects = ['one', 'two', 'three']
            
            # example data
            AllMyProjectsFromaDatasource = [{ 'name': 'project two', 'id': 5, 'otherkey': 'othervalue'},
                                            { 'name': 'project three', 'id': 1, 'otherkey': 'othervalue'},
                                            { 'name': 'project one', 'id': 3, 'otherkey': 'othervalue'},
                                            { 'name': 'abc project', 'id': 6, 'otherkey': 'othervalue'},
                                            { 'name': 'one project', 'id': 9, 'otherkey': 'othervalue'}
                                           ]    
            
            def keyfunc(x):
                # keyword primary key
                # (add index to list comprehension when keyword is in name)
                preferred_key = [float(idx) 
                                 for idx, i in enumerate(preferred_projects)
                                 if i in x['name']]
                # found at least one match in preferred keywords, use first if any, else infinity
                keyword_sortkey = preferred_key[0] if preferred_key else float('inf')
                # return tuple to sort according to primary and secondary key
                return keyword_sortkey, x['name']
            
            AllMyProjectsFromaDatasource.sort(key=keyfunc)
            
            pprint.pprint(AllMyProjectsFromaDatasource)
            

            The output is:

            [{'id': 9, 'name': 'one project', 'otherkey': 'othervalue'},
             {'id': 3, 'name': 'project one', 'otherkey': 'othervalue'},
             {'id': 5, 'name': 'project two', 'otherkey': 'othervalue'},
             {'id': 1, 'name': 'project three', 'otherkey': 'othervalue'},
             {'id': 6, 'name': 'abc project', 'otherkey': 'othervalue'}]
            
            qid & accept id: (33270388, 33270435) query: Switching positions of two strings within a list soup:

            The fact that you 'pulled' the data from the list into variables x and y doesn't help at all, since those variables no longer have any connection to the items in the list. But why not swap them directly:

            \n
            original[0], original[9] = original[9], original[0]\n
            \n

            You can use the slicing operator in a similar manner to swap the inner parts of the list.

            \n

            But, there is no need to create a list from the original string. Instead, you can use the slicing operator to achieve the result you want. Note that you cannot swap the string elements as you did with lists, since in Python strings are immutable. However, you can do the following:

            \n
            >>> a = "1234567890"\n>>> a[9] + a[5:9] + a[1:5] + a[0]\n'0678923451'\n>>>\n
            \n soup wrap:

            The fact that you 'pulled' the data from the list into variables x and y doesn't help at all, since those variables no longer have any connection to the items in the list. But why not swap them directly:

            original[0], original[9] = original[9], original[0]
            

            You can use the slicing operator in a similar manner to swap the inner parts of the list.
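
            A sketch of the slicing variant: since the right-hand-side slices are copied before assignment, two equal-length inner slices can be swapped in one statement.

```python
original = list("1234567890")

# swap the end characters, then swap the two equal-length inner slices
original[0], original[9] = original[9], original[0]
original[1:5], original[5:9] = original[5:9], original[1:5]

print(''.join(original))  # 0678923451
```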

            But, there is no need to create a list from the original string. Instead, you can use the slicing operator to achieve the result you want. Note that you cannot swap the string elements as you did with lists, since in Python strings are immutable. However, you can do the following:

            >>> a = "1234567890"
            >>> a[9] + a[5:9] + a[1:5] + a[0]
            '0678923451'
            >>>
            
            qid & accept id: (33270413, 33270498) query: How to index a user input list in Python 2.x? soup:

            You could use an infinite loop with some sort of sentinel for the user to indicate "Okay no more." How about:

            \n
            cities = []\nwhile True:\n    city = raw_input("Enter a city you've been to (or press enter to exit): ")\n    if city == '':  # no input -- this is your sentinel\n        break  # leave the loop\n    else:\n        cities.append(city)\n
            \n

            Then you can prompt for the countries if you wanted to do that separately for some reason.

            \n
            countries = []\nfor idx, city in enumerate(cities):\n    country = raw_input("Where is " + city + " located? ")\n    countries.append(country)\n    # why did you need the index? enumerate is the way to go now....\n
            \n

            Maybe you need a dictionary then?

            \n
            cities_to_countries = dict(zip(cities, countries))\n
            \n soup wrap:

            You could use an infinite loop with some sort of sentinel for the user to indicate "Okay no more." How about:

            cities = []
            while True:
                city = raw_input("Enter a city you've been to (or press enter to exit): ")
                if city == '':  # no input -- this is your sentinel
                    break  # leave the loop
                else:
                    cities.append(city)
            

            Then you can prompt for the countries if you wanted to do that separately for some reason.

            countries = []
            for idx, city in enumerate(cities):
                country = raw_input("Where is " + city + " located? ")
                countries.append(country)
                # why did you need the index? enumerate is the way to go now....
            

            Maybe you need a dictionary then?

            cities_to_countries = dict(zip(cities, countries))
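
            A quick check of the zip-based mapping, with hypothetical entries:

```python
cities = ['Paris', 'Tokyo']
countries = ['France', 'Japan']

# pair each city with the country entered at the same index
cities_to_countries = dict(zip(cities, countries))
assert cities_to_countries['Paris'] == 'France'
assert cities_to_countries['Tokyo'] == 'Japan'
```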
            
            qid & accept id: (33288420, 33288656) query: Extracting URL parameters into Pandas DataFrame soup:

            You can use a dictionary comprehension to extract the data in the parameters per parameter. I'm not sure if you wanted the final values in list form. If not, it would be easy to extract it.

            \n
            >>> pd.DataFrame({p: [d.get(p) for d in params] \n                  for p in ['param1', 'param2', 'param3', 'param4']})\n     param1    param2    param3    param4\n0   [apple]  [tomato]  [carrot]      None\n1  [banana]      None  [potato]   [berry]\n2      None   [apple]  [tomato]  [carrot]\n
            \n

            or...

            \n
            >>> pd.DataFrame({p: [d[p][0] if p in d else None for d in params] \n                  for p in ['param1', 'param2', 'param3', 'param4']})\n   param1  param2  param3  param4\n0   apple  tomato  carrot    None\n1  banana    None  potato   berry\n2    None   apple  tomato  carrot\n
            \n soup wrap:

            You can use a dictionary comprehension to extract the data in the parameters per parameter. I'm not sure if you wanted the final values in list form. If not, it would be easy to extract it.

            >>> pd.DataFrame({p: [d.get(p) for d in params] 
                              for p in ['param1', 'param2', 'param3', 'param4']})
                 param1    param2    param3    param4
            0   [apple]  [tomato]  [carrot]      None
            1  [banana]      None  [potato]   [berry]
            2      None   [apple]  [tomato]  [carrot]
            

            or...

            >>> pd.DataFrame({p: [d[p][0] if p in d else None for d in params] 
                              for p in ['param1', 'param2', 'param3', 'param4']})
               param1  param2  param3  param4
            0   apple  tomato  carrot    None
            1  banana    None  potato   berry
            2    None   apple  tomato  carrot
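
            If the params list itself still needs to be built from raw URLs, the standard library can produce it; a sketch assuming Python 3's urllib and hypothetical input URLs (parse_qs returns each value as a list, matching the first variant above):

```python
from urllib.parse import urlparse, parse_qs

# hypothetical input URLs
urls = [
    'http://example.com/?param1=apple&param2=tomato&param3=carrot',
    'http://example.com/?param1=banana&param3=potato&param4=berry',
]

# one dict of parameter-name -> list-of-values per URL
params = [parse_qs(urlparse(u).query) for u in urls]
assert params[0]['param1'] == ['apple']
assert 'param2' not in params[1]
```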
            
            qid & accept id: (33296707, 33296969) query: which random am i looking for to achieve this: soup:

            Repeat the procedure a random number of times:

            \n
            [random.choice(['[', ']', '[]']) for _ in range(random.randint(1, 10))]\n
            \n

            will produce a list of between 1 and 10 random selections from the three available choices. Note that I used a list of the 3 strings, not just 2 characters here.

            \n

            Demo:

            \n
            >>> import random\n>>> [random.choice(['[', ']', '[]']) for _ in range(random.randint(1, 10))]\n[']', '[]']\n>>> [random.choice(['[', ']', '[]']) for _ in range(random.randint(1, 10))]\n['[', '[]', ']', '[]', '[]', '[]', '[]']\n
            \n

            If you need this as one string, use str.join() to concatenate the results:

            \n
            ''.join([random.choice(['[', ']', '[]']) for _ in range(random.randint(1, 10))])\n
            \n

            Note that this'll produce a string between 1 and 20 characters long, as you concatenate a random selection of up to 10 strings each 1 or 2 characters long. Use random.choice('[]') if you need a string up to 10 characters long instead.

            \n soup wrap:

            Repeat the procedure a random number of times:

            [random.choice(['[', ']', '[]']) for _ in range(random.randint(1, 10))]
            

            will produce a list of between 1 and 10 random selections from the three available choices. Note that I used a list of the 3 strings, not just 2 characters here.

            Demo:

            >>> import random
            >>> [random.choice(['[', ']', '[]']) for _ in range(random.randint(1, 10))]
            [']', '[]']
            >>> [random.choice(['[', ']', '[]']) for _ in range(random.randint(1, 10))]
            ['[', '[]', ']', '[]', '[]', '[]', '[]']
            

            If you need this as one string, use str.join() to concatenate the results:

            ''.join([random.choice(['[', ']', '[]']) for _ in range(random.randint(1, 10))])
            

            Note that this'll produce a string between 1 and 20 characters long, as you concatenate a random selection of up to 10 strings each 1 or 2 characters long. Use random.choice('[]') if you need a string up to 10 characters long instead.
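
            For comparison, a sketch of the single-character variant, which yields between 1 and 10 characters:

```python
import random

# each draw is a single '[' or ']' character
s = ''.join(random.choice('[]') for _ in range(random.randint(1, 10)))
print(s)
assert 1 <= len(s) <= 10
assert set(s) <= set('[]')
```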

            qid & accept id: (33304356, 33304778) query: Import a exported dict into python soup:

            If you'd like to save data such as a list, dict, or tuple to a file, and you want to be able to edit it or just keep it readable, use the json module like this:

            \n
            >>> import json\n>>> d = {'apple': 1, 'bear': 2}\n\n>>> print(d)\n{'bear': 2, 'apple': 1}\n\n>>> print(json.dumps(d))\n{"bear": 2, "apple": 1}  # these are json data\n>>> \n
            \n

            Now you can save this data to a file. If you want to load it back, use json.loads() like this:

            \n
            >>> json_data = '{"bear": 2, "apple": 1}'\n>>> d = json.loads(json_data)\n>>> d['bear']\n2\n>>>\n
            \n soup wrap:

            If you'd like to save data such as a list, dict, or tuple to a file, and you want to be able to edit it or just keep it readable, use the json module like this:

            >>> import json
            >>> d = {'apple': 1, 'bear': 2}
            
            >>> print(d)
            {'bear': 2, 'apple': 1}
            
            >>> print(json.dumps(d))
            {"bear": 2, "apple": 1}  # these are json data
            >>> 
            

            Now you can save this data to a file. If you want to load it back, use json.loads() like this:

            >>> json_data = '{"bear": 2, "apple": 1}'
            >>> d = json.loads(json_data)
            >>> d['bear']
            2
            >>>
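
            To actually round-trip through a file, json.dump() and json.load() take file objects directly; a minimal sketch using a temporary file:

```python
import json
import tempfile

d = {'apple': 1, 'bear': 2}

# write the dict as JSON, then read it back from the same file
with tempfile.TemporaryFile('w+') as f:
    json.dump(d, f)
    f.seek(0)
    loaded = json.load(f)

assert loaded == d
```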
            
            qid & accept id: (33319664, 33322307) query: Extracting text from HTML file using Python (Music Artist / Title) soup:

            Using a combination of BeautifulSoup, requests, and lxml:

            \n

            First, install the prerequisites:

            \n
            pip install beautifulsoup4\npip install requests\npip install lxml\n
            \n

            swr3.py:

            \n
            import requests, lxml\nfrom bs4 import BeautifulSoup\n\nparsedsongs = []\nresult = requests.get('http://www.swr3.de//-/id=47424/cf=42/did=65794/93avs/index.html?hour=5&date=2015-10-23')\nsoup = BeautifulSoup(result.content, "lxml")\ndetailbodys = soup.find_all('div', 'detail-body')\nfor detailbody in detailbodys:\n    title = detailbody.h4.string.encode('utf-8').strip()\n    if detailbody.h5:\n        artist = detailbody.h5.string.encode('utf-8').strip()\n    else:\n        artist = detailbody.span.string.encode('utf-8').strip()\n    parsedsongs.append({'artist': artist, 'title': title})\n\nfor entry in parsedsongs:\n    print 'Artist: {}\tTitle: {}'.format(entry['artist'], entry['title'])\n
            \n

            The output:

            \n
            (swr3)macbook:swr3 joeyoung$ python swr3.py\nArtist: Vaya Con Dios   Title: Nah neh nah\nArtist: Genesis Title: No son of mine\nArtist: Genesis Title: No son of mine\nArtist: Double You  Title: Please don't go\nArtist: Stereo MC's Title: Step it up\nArtist: Cranberries Title: Zombie\nArtist: La Bouche   Title: Sweet dreams\nArtist: Die Prinzen Title: Du mußt ein Schwein sein\nArtist: Bad Religion    Title: Punk rock song\nArtist: Bellini Title: Samba de Janeiro\nArtist: Dion, Celine; Bee Gees  Title: Immortality\nArtist: Jones, Tom; Mousse T.   Title: Sex bomb\nArtist: Yanai, Kate Title: Bacardi feeling (Summer dreamin')\nArtist: Heroes Del Silencio Title: Entre dos tierras\n
            \n soup wrap:

            Using a combination of BeautifulSoup, requests, and lxml:

            First, install the prerequisites:

            pip install beautifulsoup4
            pip install requests
            pip install lxml
            

            swr3.py:

            import requests, lxml
            from bs4 import BeautifulSoup
            
            parsedsongs = []
            result = requests.get('http://www.swr3.de//-/id=47424/cf=42/did=65794/93avs/index.html?hour=5&date=2015-10-23')
            soup = BeautifulSoup(result.content, "lxml")
            detailbodys = soup.find_all('div', 'detail-body')
            for detailbody in detailbodys:
                title = detailbody.h4.string.encode('utf-8').strip()
                if detailbody.h5:
                    artist = detailbody.h5.string.encode('utf-8').strip()
                else:
                    artist = detailbody.span.string.encode('utf-8').strip()
                parsedsongs.append({'artist': artist, 'title': title})
            
            for entry in parsedsongs:
                print 'Artist: {}\tTitle: {}'.format(entry['artist'], entry['title'])
            

            The output:

            (swr3)macbook:swr3 joeyoung$ python swr3.py
            Artist: Vaya Con Dios   Title: Nah neh nah
            Artist: Genesis Title: No son of mine
            Artist: Genesis Title: No son of mine
            Artist: Double You  Title: Please don't go
            Artist: Stereo MC's Title: Step it up
            Artist: Cranberries Title: Zombie
            Artist: La Bouche   Title: Sweet dreams
            Artist: Die Prinzen Title: Du mußt ein Schwein sein
            Artist: Bad Religion    Title: Punk rock song
            Artist: Bellini Title: Samba de Janeiro
            Artist: Dion, Celine; Bee Gees  Title: Immortality
            Artist: Jones, Tom; Mousse T.   Title: Sex bomb
            Artist: Yanai, Kate Title: Bacardi feeling (Summer dreamin')
            Artist: Heroes Del Silencio Title: Entre dos tierras
            
            qid & accept id: (33341000, 33341085) query: Extract Numbers and Size Information (KB, MB, etc) from a String in Python soup:

            This script:

            \n
            import re\n\n\ntest_string = '44.5MB\n12b\n6.5GB\n12pb'\n\nregex = re.compile(r'(\d+(?:\.\d+)?)\s*([kmgtp]?b)', re.IGNORECASE)\n\norder = ['b', 'kb', 'mb', 'gb', 'tb', 'pb']\n\nfor value, unit in regex.findall(test_string):\n    print(int(float(value) * (1024**order.index(unit.lower()))))\n
            \n

            Will print:

            \n
            46661632\n12\n6979321856\n13510798882111488\n
            \n

            These are the sizes it found, in bytes.

            \n soup wrap:

            This script:

            import re
            
            
            test_string = '44.5MB\n12b\n6.5GB\n12pb'
            
            regex = re.compile(r'(\d+(?:\.\d+)?)\s*([kmgtp]?b)', re.IGNORECASE)
            
            order = ['b', 'kb', 'mb', 'gb', 'tb', 'pb']
            
            for value, unit in regex.findall(test_string):
                print(int(float(value) * (1024**order.index(unit.lower()))))
            

            Will print:

            46661632
            12
            6979321856
            13510798882111488
            

            These are the sizes it found, in bytes.
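
            Going the other way (formatting a byte count back into a human-readable size) can be sketched with the same unit ladder; human_size is a hypothetical helper, not part of the answer:

```python
def human_size(n):
    # hypothetical helper: walk the binary-unit ladder used above
    for unit in ['b', 'kb', 'mb', 'gb', 'tb', 'pb']:
        if n < 1024:
            return '%g%s' % (n, unit)
        n /= 1024.0
    return '%g%s' % (n, 'eb')

print(human_size(46661632))  # 44.5mb
```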

            qid & accept id: (33347648, 33350443) query: How to process input in parallel with python, but without processes? soup:

            You can easily define Worker threads that work in parallel till a queue is empty.

            \n
            from threading import Thread\nfrom collections import deque\nimport time\n\n\n# Create a new class that inherits from Thread\nclass Worker(Thread):\n\n    def __init__(self, inqueue, outqueue, func):\n        '''\n        A worker that calls func on objects in inqueue and\n        pushes the result into outqueue\n\n        runs until inqueue is empty\n        '''\n\n        self.inqueue = inqueue\n        self.outqueue = outqueue\n        self.func = func\n        super().__init__()\n\n    # override the run method, this is starte when\n    # you call worker.start()\n    def run(self):\n        while self.inqueue:\n            data = self.inqueue.popleft()\n            print('start')\n            result = self.func(data)\n            self.outqueue.append(result)\n            print('finished')\n\n\ndef test(x):\n    time.sleep(x)\n    return 2 * x\n\n\nif __name__ == '__main__':\n    data = 12 * [1, ]\n    queue = deque(data)\n    result = deque()\n\n    # create 3 workers working on the same input\n    workers = [Worker(queue, result, test) for _ in range(3)]\n\n    # start the workers\n    for worker in workers:\n        worker.start()\n\n    # wait till all workers are finished\n    for worker in workers:\n        worker.join()\n\n    print(result)\n
            \n

            As expected, this runs in about 4 seconds: 12 one-second tasks shared among 3 workers.

            \n

            One could also write a simple Pool class to get rid of the noise in the main function:

            \n
            from threading import Thread\nfrom collections import deque\nimport time\n\n\nclass Pool():\n\n    def __init__(self, n_threads):\n        self.n_threads = n_threads\n\n    def map(self, func, data):\n        inqueue = deque(data)\n        result = deque()\n\n        workers = [Worker(inqueue, result, func) for i in range(self.n_threads)]\n\n        for worker in workers:\n            worker.start()\n\n        for worker in workers:\n            worker.join()\n\n        return list(result)\n\n\nclass Worker(Thread):\n\n    def __init__(self, inqueue, outqueue, func):\n        '''\n        A worker that calls func on objects in inqueue and\n        pushes the result into outqueue\n\n        runs until inqueue is empty\n        '''\n\n        self.inqueue = inqueue\n        self.outqueue = outqueue\n        self.func = func\n        super().__init__()\n\n    # override the run method, this is starte when\n    # you call worker.start()\n    def run(self):\n        while self.inqueue:\n            data = self.inqueue.popleft()\n            print('start')\n            result = self.func(data)\n            self.outqueue.append(result)\n            print('finished')\n\n\ndef test(x):\n    time.sleep(x)\n    return 2 * x\n\n\nif __name__ == '__main__':\n    data = 12 * [1, ]\n\n    pool = Pool(6)\n    result = pool.map(test, data)\n\n    print(result)\n
            \n soup wrap:

            You can easily define Worker threads that work in parallel till a queue is empty.

            from threading import Thread
            from collections import deque
            import time
            
            
            # Create a new class that inherits from Thread
            class Worker(Thread):
            
                def __init__(self, inqueue, outqueue, func):
                    '''
                    A worker that calls func on objects in inqueue and
                    pushes the result into outqueue
            
                    runs until inqueue is empty
                    '''
            
                    self.inqueue = inqueue
                    self.outqueue = outqueue
                    self.func = func
                    super().__init__()
            
                # override the run method; this is started when
                # you call worker.start()
                def run(self):
                    while True:
                        try:
                            # popleft() can race with other workers, so catch
                            # IndexError rather than checking emptiness first
                            data = self.inqueue.popleft()
                        except IndexError:
                            break
                        print('start')
                        result = self.func(data)
                        self.outqueue.append(result)
                        print('finished')
            
            
            def test(x):
                time.sleep(x)
                return 2 * x
            
            
            if __name__ == '__main__':
                data = 12 * [1, ]
                queue = deque(data)
                result = deque()
            
                # create 3 workers working on the same input
                workers = [Worker(queue, result, test) for _ in range(3)]
            
                # start the workers
                for worker in workers:
                    worker.start()
            
                # wait till all workers are finished
                for worker in workers:
                    worker.join()
            
                print(result)
            

            As expected, this runs in about 4 seconds: 12 one-second tasks shared among 3 workers.

            One could also write a simple Pool class to get rid of the noise in the main function:

            from threading import Thread
            from collections import deque
            import time
            
            
            class Pool():
            
                def __init__(self, n_threads):
                    self.n_threads = n_threads
            
                def map(self, func, data):
                    inqueue = deque(data)
                    result = deque()
            
                    workers = [Worker(inqueue, result, func) for i in range(self.n_threads)]
            
                    for worker in workers:
                        worker.start()
            
                    for worker in workers:
                        worker.join()
            
                    return list(result)
            
            
            class Worker(Thread):
            
                def __init__(self, inqueue, outqueue, func):
                    '''
                    A worker that calls func on objects in inqueue and
                    pushes the result into outqueue
            
                    runs until inqueue is empty
                    '''
            
                    self.inqueue = inqueue
                    self.outqueue = outqueue
                    self.func = func
                    super().__init__()
            
                # override the run method; this is started when
                # you call worker.start()
                def run(self):
                    while True:
                        try:
                            # popleft() can race with other workers, so catch
                            # IndexError rather than checking emptiness first
                            data = self.inqueue.popleft()
                        except IndexError:
                            break
                        print('start')
                        result = self.func(data)
                        self.outqueue.append(result)
                        print('finished')
            
            
            def test(x):
                time.sleep(x)
                return 2 * x
            
            
            if __name__ == '__main__':
                data = 12 * [1, ]
            
                pool = Pool(6)
                result = pool.map(test, data)
            
                print(result)
            
            qid & accept id: (33354950, 33354994) query: Python: Removing random whitespace from a string of numbers soup:

            this

            \n
            s = '20101002  100224   1    1044      45508  1001  1002  1003  1004  1005  1006'\nnew_s = ' '.join(s.split())\nprint(new_s)\n
            \n

            produces

            \n
            20101002 100224 1 1044 45508 1001 1002 1003 1004 1005 1006\n
            \n

            Basically, there are two steps involved:

            \n

            first we split the string into words with s.split(), which returns this list

            \n
            ['20101002', '100224', '1', '1044', '45508', '1001', '1002', '1003', '1004', '1005', '1006']\n
            \n

            then we pass the list to ' '.join, which joins all the elements of the list using the space character between them

            \n soup wrap:

            this

            s = '20101002  100224   1    1044      45508  1001  1002  1003  1004  1005  1006'
            new_s = ' '.join(s.split())
            print(new_s)
            

            produces

            20101002 100224 1 1044 45508 1001 1002 1003 1004 1005 1006
            

            Basically, there are two steps involved:

            first we split the string into words with s.split(), which returns this list

            ['20101002', '100224', '1', '1044', '45508', '1001', '1002', '1003', '1004', '1005', '1006']
            

            then we pass the list to ' '.join, which joins all the elements of the list using the space character between them.

            qid & accept id: (33371558, 33372143) query: Neat way of popping key, value PAIR from dictionary? soup:

            You can define your own dictionary object using Python's ABCs, which provide the infrastructure for defining abstract base classes, and then override the pop method based on your need:

            \n
            from collections import Mapping\n\nclass MyDict(Mapping):\n    def __init__(self, *args, **kwargs):\n        self.update(dict(*args, **kwargs))\n\n    def __setitem__(self, key, item): \n        self.__dict__[key] = item\n\n    def __getitem__(self, key): \n        return self.__dict__[key]\n\n    def __delitem__(self, key): \n        del self.__dict__[key]\n\n    def pop(self, k, d=None):\n        return k,self.__dict__.pop(k, d)\n\n    def update(self, *args, **kwargs):\n        return self.__dict__.update(*args, **kwargs)\n\n    def __iter__(self):\n        return iter(self.__dict__)\n\n    def __len__(self):\n        return len(self.__dict__)\n\n    def __repr__(self): \n        return repr(self.__dict__)\n
            \n

            Demo:

            \n
            d=MyDict()\n\nd['a']=1\nd['b']=5\nd['c']=8\n\nprint d\n{'a': 1, 'c': 8, 'b': 5}\n\nprint d.pop(min(d, key=d.get))\n('a', 1)\n\nprint d\n{'c': 8, 'b': 5}\n
            \n

            Note: as @chepner suggested in a comment, a better choice is to override popitem, which already returns a key/value pair.

            \n soup wrap:

            You can define your own dictionary object using Python's ABCs, which provide the infrastructure for defining abstract base classes, and then override the pop method based on your need:

            from collections import Mapping
            
            class MyDict(Mapping):
                def __init__(self, *args, **kwargs):
                    self.update(dict(*args, **kwargs))
            
                def __setitem__(self, key, item): 
                    self.__dict__[key] = item
            
                def __getitem__(self, key): 
                    return self.__dict__[key]
            
                def __delitem__(self, key): 
                    del self.__dict__[key]
            
                def pop(self, k, d=None):
                    return k,self.__dict__.pop(k, d)
            
                def update(self, *args, **kwargs):
                    return self.__dict__.update(*args, **kwargs)
            
                def __iter__(self):
                    return iter(self.__dict__)
            
                def __len__(self):
                    return len(self.__dict__)
            
                def __repr__(self): 
                    return repr(self.__dict__)
            

            Demo:

            d=MyDict()
            
            d['a']=1
            d['b']=5
            d['c']=8
            
            print d
            {'a': 1, 'c': 8, 'b': 5}
            
            print d.pop(min(d, key=d.get))
            ('a', 1)
            
            print d
            {'c': 8, 'b': 5}
            

            Note: As @chepner suggested in a comment, a better choice is to override popitem, which already returns a key/value pair.
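
            A rough sketch of that suggestion, using a plain dict subclass instead of the full ABC machinery (the class and method names here are made up for illustration):

```python
class PairDict(dict):
    def pop_pair(self, key, default=None):
        # return a (key, value) tuple, like popitem does
        return key, super(PairDict, self).pop(key, default)

d = PairDict(a=1, b=5, c=8)
pair = d.pop_pair(min(d, key=d.get))  # pop the key with the smallest value
print(pair)       # ('a', 1)
print(sorted(d))  # ['b', 'c']
```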

            qid & accept id: (33385238, 33385305) query: How to convert pandas single column data frame to series or numpy vector soup:

            You can simply index the series you want. Example -

            \n
            tdf['s1']\n
            \n

            Demo -

            \n
            In [24]: tdf =  pd.DataFrame({'s1' : [0,1,23.4,10,23]})\n\nIn [25]: tdf['s1']\nOut[25]:\n0     0.0\n1     1.0\n2    23.4\n3    10.0\n4    23.0\nName: s1, dtype: float64\n\nIn [26]: tdf['s1'].shape\nOut[26]: (5,)\n
            \n

            If you want the values in the series as a numpy array, you can use the .values accessor. Example -

            \n
            In [27]: tdf['s1'].values\nOut[27]: array([  0. ,   1. ,  23.4,  10. ,  23. ])\n
            \n soup wrap:

            You can simply index the series you want. Example -

            tdf['s1']
            

            Demo -

            In [24]: tdf =  pd.DataFrame({'s1' : [0,1,23.4,10,23]})
            
            In [25]: tdf['s1']
            Out[25]:
            0     0.0
            1     1.0
            2    23.4
            3    10.0
            4    23.0
            Name: s1, dtype: float64
            
            In [26]: tdf['s1'].shape
            Out[26]: (5,)
            

            If you want the values in the series as a numpy array, you can use the .values accessor. Example -

            In [27]: tdf['s1'].values
            Out[27]: array([  0. ,   1. ,  23.4,  10. ,  23. ])
            
            qid & accept id: (33401529, 33401588) query: Parse a file into a dictionary of arrays soup:

            You can use the dict.setdefault method to create the dictionary you expect:

            \n
            my_dict={}\nwith open('file.dat', 'rb') as csvfile:\n    dataReader=csv.reader(csvfile)\n    for name,item1,item2 in dataReader:\n         my_dict.setdefault(name,[]).append([item1,item2])\n
            \n

            If you are using Python 3.x, you can use unpacking assignment in your loop:

            \n
            for name,*items in dataReader:\n    my_dict.setdefault(name,[]).append(items)\n
            \n soup wrap:

            You can use the dict.setdefault method to create the dictionary you expect:

            my_dict={}
            with open('file.dat', 'rb') as csvfile:
                dataReader=csv.reader(csvfile)
                for name,item1,item2 in dataReader:
                     my_dict.setdefault(name,[]).append([item1,item2])
            

            If you are using Python 3.x, you can use unpacking assignment in your loop:

            for name,*items in dataReader:
                my_dict.setdefault(name,[]).append(items)
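
            A self-contained sketch of the same pattern with an in-memory file standing in for file.dat (the data is made up; note that in Python 3 a text file for csv should be opened with newline='' rather than in 'rb' mode):

```python
import csv
import io

# hypothetical stand-in for open('file.dat', newline='')
data = io.StringIO("apple,1,2\nbanana,3,4\napple,5,6\n")
my_dict = {}
for name, *items in csv.reader(data):
    my_dict.setdefault(name, []).append(items)
print(my_dict)  # {'apple': [['1', '2'], ['5', '6']], 'banana': [['3', '4']]}
```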
            
            qid & accept id: (33402355, 33456324) query: Finding groups of increasing numbers in a list soup:

            A couple of different ways using itertools and numpy:

            \n
            from itertools import groupby, tee, cycle\n\nx = [17, 17, 19, 20, 21, 22, 0, 1, 2, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 14, 14, 28, 29, 30, 31, 32, 33, 34, 35,\n     36, 1, 2, 3, 4,34,54]\n\n\ndef sequences(l):\n    x2 = cycle(l)\n    next(x2)\n    grps = groupby(l, key=lambda j: j + 1 == next(x2))\n    for k, v in grps:\n        if k:\n            yield tuple(v) + (next((next(grps)[1])),)\n\n\nprint(list(sequences(x)))\n\n[(19, 20, 21, 22), (0, 1, 2), (4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14), (28, 29, 30, 31, 32, 33, 34, 35, 36), (1, 2, 3, 4)]\n
            \n

            Or using python3 and yield from:

            \n
            def sequences(l):\n    x2 = cycle(l)\n    next(x2)\n    grps = groupby(l, key=lambda j: j + 1 == next(x2))\n    yield from (tuple(v) + (next((next(grps)[1])),) for k,v in grps if k)\n\nprint(list(sequences(x)))\n
            \n

            Using a variation of my answer here with numpy.split :

            \n
            out = [tuple(arr) for arr in np.split(x, np.where(np.diff(x) != 1)[0] + 1) if arr.size > 1]\n\nprint(out)\n\n[(19, 20, 21, 22), (0, 1, 2), (4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14), (28, 29, 30, 31, 32, 33, 34, 35, 36), (1, 2, 3, 4)]\n
            \n

            And similar to ekhumoro's answer:

            \n
            def sequences(x):\n    it = iter(x)\n    prev, temp = next(it), []\n    while prev is not None:\n        start = next(it, None)\n        if prev + 1 == start:\n            temp.append(prev)\n        elif temp:\n            yield tuple(temp + [prev])\n            temp = []\n        prev = start\n
            \n

            To get the length and the tuple:

            \n
            def sequences(l):\n    x2 = cycle(l)\n    next(x2)\n    grps = groupby(l, key=lambda j: j + 1 == next(x2))\n    for k, v in grps:\n        if k:\n            t = tuple(v) + (next(next(grps)[1]),)\n            yield t, len(t)\n\n\ndef sequences(l):\n    x2 = cycle(l)\n    next(x2)\n    grps = groupby(l, lambda j: j + 1 == next(x2))\n    yield from ((t, len(t)) for t in (tuple(v) + (next(next(grps)[1]),)\n                                      for k, v in grps if k))\n\n\n\ndef sequences(x):\n        it = iter(x)\n        prev, temp = next(it), []\n        while prev is not None:\n            start = next(it, None)\n            if prev + 1 == start:\n                temp.append(prev)\n            elif temp:\n                yield tuple(temp + [prev]), len(temp) + 1\n                temp = []\n            prev = start\n
            \n

            Output will be the same for all three:

            \n
            [((19, 20, 21, 22), 4), ((0, 1, 2), 3), ((4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14), 11)\n, ((28, 29, 30, 31, 32, 33, 34, 35, 36), 9), ((1, 2, 3, 4), 4)]\n
            \n soup wrap:

            A couple of different ways using itertools and numpy:

            from itertools import groupby, tee, cycle
            
            x = [17, 17, 19, 20, 21, 22, 0, 1, 2, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 14, 14, 28, 29, 30, 31, 32, 33, 34, 35,
                 36, 1, 2, 3, 4,34,54]
            
            
            def sequences(l):
                x2 = cycle(l)
                next(x2)
                grps = groupby(l, key=lambda j: j + 1 == next(x2))
                for k, v in grps:
                    if k:
                        yield tuple(v) + (next((next(grps)[1])),)
            
            
            print(list(sequences(x)))
            
            [(19, 20, 21, 22), (0, 1, 2), (4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14), (28, 29, 30, 31, 32, 33, 34, 35, 36), (1, 2, 3, 4)]
            

            Or using python3 and yield from:

            def sequences(l):
                x2 = cycle(l)
                next(x2)
                grps = groupby(l, key=lambda j: j + 1 == next(x2))
                yield from (tuple(v) + (next((next(grps)[1])),) for k,v in grps if k)
            
            print(list(sequences(x)))
            

            Using a variation of my answer here with numpy.split :

            out = [tuple(arr) for arr in np.split(x, np.where(np.diff(x) != 1)[0] + 1) if arr.size > 1]
            
            print(out)
            
            [(19, 20, 21, 22), (0, 1, 2), (4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14), (28, 29, 30, 31, 32, 33, 34, 35, 36), (1, 2, 3, 4)]
            

            And similar to ekhumoro's answer:

            def sequences(x):
                it = iter(x)
                prev, temp = next(it), []
                while prev is not None:
                    start = next(it, None)
                    if prev + 1 == start:
                        temp.append(prev)
                    elif temp:
                        yield tuple(temp + [prev])
                        temp = []
                    prev = start
            

            To get the length and the tuple:

            def sequences(l):
                x2 = cycle(l)
                next(x2)
                grps = groupby(l, key=lambda j: j + 1 == next(x2))
                for k, v in grps:
                    if k:
                        t = tuple(v) + (next(next(grps)[1]),)
                        yield t, len(t)
            
            
            def sequences(l):
                x2 = cycle(l)
                next(x2)
                grps = groupby(l, lambda j: j + 1 == next(x2))
                yield from ((t, len(t)) for t in (tuple(v) + (next(next(grps)[1]),)
                                                  for k, v in grps if k))
            
            
            
            def sequences(x):
                    it = iter(x)
                    prev, temp = next(it), []
                    while prev is not None:
                        start = next(it, None)
                        if prev + 1 == start:
                            temp.append(prev)
                        elif temp:
                            yield tuple(temp + [prev]), len(temp) + 1
                            temp = []
                        prev = start
            

            Output will be the same for all three:

            [((19, 20, 21, 22), 4), ((0, 1, 2), 3), ((4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14), 11)
            , ((28, 29, 30, 31, 32, 33, 34, 35, 36), 9), ((1, 2, 3, 4), 4)]
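
            A related stdlib-only recipe, not shown above, groups by the difference between each value and its index, which stays constant along a run of consecutive numbers (shown here on a shortened, made-up input):

```python
from itertools import groupby

def runs(seq):
    out = []
    # value - index is constant within a run of consecutive integers
    for _, grp in groupby(enumerate(seq), key=lambda p: p[1] - p[0]):
        run = [v for _, v in grp]
        if len(run) > 1:  # keep only genuine increasing runs
            out.append(tuple(run))
    return out

x = [17, 17, 19, 20, 21, 22, 0, 1, 2, 2, 4, 5, 6]
print(runs(x))  # [(19, 20, 21, 22), (0, 1, 2), (4, 5, 6)]
```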
            
            qid & accept id: (33404038, 33404170) query: Python pandas to get specified rows from a CSV file soup:

            When you use read_csv, you read the whole file; you can't read just part of it.

            \n

            When it comes down to the chunksize, you need to take those "chunks" that are listed under wow and concat().

            \n

            For example:

            \n
            chunks = pd.read_csv(data, chunksize = 100)\ndf = pd.concat(chunks, ignore_index=True)\n
            \n

            So now you have the full dataframe and you can do whatever analysis you need to do.

            \n

            It's also an iterable object, so instead of concatenating you can process each chunk as it is read (note that the reader can only be consumed once):

            \n
            for chunk in chunks:\n    #do something to each chunk\n
            \n soup wrap:

            When you use read_csv, you read the whole file; you can't read just part of it.

            When it comes down to the chunksize, you need to take those "chunks" that are listed under wow and concat().

            For example:

            chunks = pd.read_csv(data, chunksize = 100)
            df = pd.concat(chunks, ignore_index=True)
            

            So now you have the full dataframe and you can do whatever analysis you need to do.

            It's also an iterable object, so instead of concatenating you can process each chunk as it is read (note that the reader can only be consumed once):

            for chunk in chunks:
                #do something to each chunk
            
            qid & accept id: (33431816, 33432607) query: How can a Python module single file be installed using pip and PyPI? soup:

            Try using something like this for a single file. Following is the directory structure:

            \n
            .\n├── example.py\n├── LICENSE\n├── README.md\n└── setup.py\n\n0 directories, 4 files\n
            \n

            setup.py

            \n
            from setuptools import setup\n\nsetup(\n    name='example',\n    version='0.1.0',\n    py_modules=['example'],\n    install_requires=[\n        'exampledep',\n    ],\n    entry_points='''\n        [console_scripts]\n        example=example:example\n    ''',\n)\n
            \n

            The above worked for me. This is what the example file would look like.

            \n
            def example():\n    # Note: You can use sys.argv here\n    print "Hi! I'm a command written in python."\n
            \n

            This can also be imported like so:

            \n
            import example\nexample.example()\n# or\nfrom example import example\nexample()\n
            \n

            Hope this helps.

            \n

            Install Requires

            \n

            install_requires is used to define the dependencies of your module/application. For example, in this case the example module depends on exampledep, so when someone does pip install example, pip will also install exampledep, as it is listed in the dependencies.

            \n

            Entry Points

            \n

            This is usually a callable that the end user of the package might want to invoke, typically from the command line. You can look at this question or this doc for more details.

            \n soup wrap:

            Try using something like this for a single file. Following is the directory structure:

            .
            ├── example.py
            ├── LICENSE
            ├── README.md
            └── setup.py
            
            0 directories, 4 files
            

            setup.py

            from setuptools import setup
            
            setup(
                name='example',
                version='0.1.0',
                py_modules=['example'],
                install_requires=[
                    'exampledep',
                ],
                entry_points='''
                    [console_scripts]
                    example=example:example
                ''',
            )
            

            The above worked for me. This is what the example file would look like.

            def example():
                # Note: You can use sys.argv here
                print "Hi! I'm a command written in python."
            

            This can also be imported like so:

            import example
            example.example()
            # or
            from example import example
            example()
            

            Hope this helps.

            Install Requires

            install_requires is used to define the dependencies of your module/application. For example, in this case the example module depends on exampledep, so when someone does pip install example, pip will also install exampledep, as it is listed in the dependencies.

            Entry Points

            This is usually a callable that the end user of the package might want to invoke, typically from the command line. You can look at this question or this doc for more details.

            qid & accept id: (33466627, 33481344) query: Cron Job File Creation - Created File Permissions soup:

            It turns out that cron does not source any shell profiles (/etc/profile, ~/.bashrc), so the umask has to be set in the script that is being called by cron.

            \n

            When using user-level crontabs (crontab -e), the umask can be simply set as follows:

            \n
            0 * * * * umask 002; /path/to/script\n
            \n

            This will work even if it is a python script, as the default value of os.umask inherits from the shell's umask.

            \n

            However, when placing a python script in /etc/cron.hourly etc., there is no way to set the umask except in the python script itself:

            \n
            import os\nos.umask(002)\n
            \n soup wrap:

            It turns out that cron does not source any shell profiles (/etc/profile, ~/.bashrc), so the umask has to be set in the script that is being called by cron.

            When using user-level crontabs (crontab -e), the umask can be simply set as follows:

            0 * * * * umask 002; /path/to/script
            

            This will work even if it is a python script, as the default value of os.umask inherits from the shell's umask.

            However, when placing a python script in /etc/cron.hourly etc., there is no way to set the umask except in the python script itself:

            import os
            os.umask(002)
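
            Note that 002 is Python 2 octal syntax; in Python 3 the literal must be written 0o002. Also, os.umask returns the previous mask, which makes it easy to restore afterwards. A small sketch:

```python
import os

previous = os.umask(0o002)   # set the new umask, remembering the old one
# ... create files here; group-write permission is no longer masked out ...
os.umask(previous)           # restore the original umask
```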
            
            qid & accept id: (33477811, 33494157) query: Computing the Difference between two graphs 'edge wise' in networkx soup:

            Do you want to compute a graph (DIF) with the edges that are in your reference (R) graph, but not in your input graph (S)? \nOr do you want to calculate a graph with the edges that are different between R and S? I included both options, one is commented out.

            \n
            import networkx as nx\n\nS = nx.DiGraph()#S-sample graph\nS.add_nodes_from([0, 1, 2])\nS.add_edge(0, 2)\nS.add_edge(1, 2)\n\nR = nx.DiGraph()#R-reference graph\nR.add_nodes_from([0, 1, 2])\nR.add_edge(1, 2)\n\n\ndef difference(S, R):\n    DIF = nx.create_empty_copy(R)\n    DIF.name = "Difference of (%s and %s)" % (S.name, R.name)\n    if set(S) != set(R):\n        raise nx.NetworkXError("Node sets of graphs is not equal")\n\n    r_edges = set(R.edges_iter())\n    s_edges = set(S.edges_iter())\n\n    # I'm not sure what the goal is: the difference, or the edges that are in R but not in S\n    # In case it is the difference:\n    diff_edges = r_edges.symmetric_difference(s_edges)\n\n    # In case its the edges that are in R but not in S:\n    # diff_edges = r_edges - s_edges\n\n    DIF.add_edges_from(diff_edges)\n\n    return DIF\n\nprint(difference(S, R).edges())\n
            \n

            this version prints [(0, 2)]

            \n

            As @Joel noticed, in undirected graphs there is no guarantee (at least, I did not find one in the source or the documentation) that the order of the nodes within an edge will be consistent. If that is an issue, you could convert the tuples into frozensets first, so the order does not matter. You need frozensets, not sets or lists, because frozensets are hashable (and that is a requirement for members of a set)

            \n
            set([frozenset(x) for x in S.edges()])\n
            \n soup wrap:

            Do you want to compute a graph (DIF) with the edges that are in your reference (R) graph, but not in your input graph (S)? Or do you want to calculate a graph with the edges that are different between R and S? I included both options, one is commented out.

            import networkx as nx
            
            S = nx.DiGraph()#S-sample graph
            S.add_nodes_from([0, 1, 2])
            S.add_edge(0, 2)
            S.add_edge(1, 2)
            
            R = nx.DiGraph()#R-reference graph
            R.add_nodes_from([0, 1, 2])
            R.add_edge(1, 2)
            
            
            def difference(S, R):
                DIF = nx.create_empty_copy(R)
                DIF.name = "Difference of (%s and %s)" % (S.name, R.name)
                if set(S) != set(R):
                    raise nx.NetworkXError("Node sets of graphs is not equal")
            
                r_edges = set(R.edges_iter())
                s_edges = set(S.edges_iter())
            
                # I'm not sure what the goal is: the difference, or the edges that are in R but not in S
                # In case it is the difference:
                diff_edges = r_edges.symmetric_difference(s_edges)
            
                # In case its the edges that are in R but not in S:
                # diff_edges = r_edges - s_edges
            
                DIF.add_edges_from(diff_edges)
            
                return DIF
            
            print(difference(S, R).edges())
            

            this version prints [(0, 2)]

            As @Joel noticed, in undirected graphs there is no guarantee (at least, I did not find one in the source or the documentation) that the order of the nodes within an edge will be consistent. If that is an issue, you could convert the tuples into frozensets first, so the order does not matter. You need frozensets, not sets or lists, because frozensets are hashable (and that is a requirement for members of a set)

            set([frozenset(x) for x in S.edges()])
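
            To see why frozensets make edge comparison order-insensitive, here is a networkx-free sketch with plain tuples (edge values made up):

```python
# as tuples, (0, 2) and (2, 0) are different set members...
assert {(0, 2)} != {(2, 0)}

# ...but as frozensets they collapse to the same member
e1 = {frozenset(e) for e in [(0, 2), (2, 1)]}
e2 = {frozenset(e) for e in [(2, 0), (1, 2)]}
print(e1 == e2)  # True
```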
            
            qid & accept id: (33518124, 33518725) query: How to apply a function on every row on a dataframe? soup:

            As I don't know what PartMaster is, the following should work:

            \n
            def EOQ(D,p,ck,ch):\n    p,D = Partmaster\n    Q = math.sqrt((2*D*ck)/(ch*p))\n    return Q\nch=0.2\nck=5\ndf['Q'] = df.apply(lambda row: EOQ(row['D'], row['p'], ck, ch), axis=1)\ndf\n
            \n

            If all you're doing is calculating the square root of some result, then use the np.sqrt method; this is vectorised and will be significantly faster:

            \n
            In [80]:\ndf['Q'] = np.sqrt((2*df['D']*ck)/(ch*df['p']))\n\ndf\nOut[80]:\n    D   p          Q\n0  10  20   5.000000\n1  20  30   5.773503\n2  30  10  12.247449\n
            \n

            Timings

            \n

            For a 30k row df:

            \n
            In [92]:\n\nimport math\nch=0.2\nck=5\ndef EOQ(D,p,ck,ch):\n    Q = math.sqrt((2*D*ck)/(ch*p))\n    return Q\n\n%timeit np.sqrt((2*df['D']*ck)/(ch*df['p']))\n%timeit df.apply(lambda row: EOQ(row['D'], row['p'], ck, ch), axis=1)\n1000 loops, best of 3: 622 µs per loop\n1 loops, best of 3: 1.19 s per loop\n
            \n

            You can see that the np method is ~1900 X faster

            \n soup wrap:

            As I don't know what PartMaster is, the following should work:

            import math

            def EOQ(D,p,ck,ch):
                Q = math.sqrt((2*D*ck)/(ch*p))
                return Q
            ch=0.2
            ck=5
            df['Q'] = df.apply(lambda row: EOQ(row['D'], row['p'], ck, ch), axis=1)
            df
            

            If all you're doing is calculating the square root of some result, then use the np.sqrt method; this is vectorised and will be significantly faster:

            In [80]:
            df['Q'] = np.sqrt((2*df['D']*ck)/(ch*df['p']))
            
            df
            Out[80]:
                D   p          Q
            0  10  20   5.000000
            1  20  30   5.773503
            2  30  10  12.247449
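
            As a quick stdlib-only sanity check of the first row: for D=10, p=20, ck=5, ch=0.2 the formula gives sqrt((2*10*5)/(0.2*20)) = sqrt(25) = 5, matching the table above:

```python
import math

def eoq(D, p, ck, ch):
    # same formula as EOQ above, checked by hand on the first row
    return math.sqrt((2 * D * ck) / (ch * p))

print(eoq(10, 20, 5, 0.2))  # 5.0
```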
            

            Timings

            For a 30k row df:

            In [92]:
            
            import math
            ch=0.2
            ck=5
            def EOQ(D,p,ck,ch):
                Q = math.sqrt((2*D*ck)/(ch*p))
                return Q
            
            %timeit np.sqrt((2*df['D']*ck)/(ch*df['p']))
            %timeit df.apply(lambda row: EOQ(row['D'], row['p'], ck, ch), axis=1)
            1000 loops, best of 3: 622 µs per loop
            1 loops, best of 3: 1.19 s per loop
            

            You can see that the np method is ~1900 X faster

            qid & accept id: (33519408, 33519509) query: On using a string as an integer counter (aka index) in a for loop soup:

            You could use enumerate, which generates tuples that pair the index of an item with the item itself:

            \n
            for i, file in enumerate(os.listdir(directoryPath)):\n    if file.endswith(".csv"):\n       array1[i] = numpy.genfromtxt(file, delimiter=',')[:,2]\n
            \n

            Or you could store the Numpy arrays in a dictionary that is indexed directly by the associated file name:

            \n
            arrays = {}\nfor file in os.listdir(directoryPath):\n    if file.endswith(".csv"):\n       arrays[file] = numpy.genfromtxt(file, delimiter=',')[:,2]\n
            \n

            Or with an OrderedDict:

            \n
            from collections import OrderedDict\n\narrays = OrderedDict()\nfor file in os.listdir(directoryPath):\n    if file.endswith(".csv"):\n       arrays[file] = numpy.genfromtxt(file, delimiter=',')[:,2]\n
            \n soup wrap:

            You could use enumerate, which generates tuples that pair the index of an item with the item itself:

            for i, file in enumerate(os.listdir(directoryPath)):
                if file.endswith(".csv"):
                   array1[i] = numpy.genfromtxt(file, delimiter=',')[:,2]
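
            A small stdlib-only illustration of what enumerate yields here (file names made up). Note that i counts every directory entry, not just the .csv files, so non-CSV entries leave gaps in the indices actually assigned:

```python
files = ["a.csv", "notes.txt", "b.csv"]  # hypothetical os.listdir result
assigned = [i for i, f in enumerate(files) if f.endswith(".csv")]
print(assigned)  # [0, 2] -- index 1 is skipped
```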
            

            Or you could store the Numpy arrays in a dictionary that is indexed directly by the associated file name:

            arrays = {}
            for file in os.listdir(directoryPath):
                if file.endswith(".csv"):
                   arrays[file] = numpy.genfromtxt(file, delimiter=',')[:,2]
            

            Or with an OrderedDict:

            from collections import OrderedDict
            
            arrays = OrderedDict()
            for file in os.listdir(directoryPath):
                if file.endswith(".csv"):
                   arrays[file] = numpy.genfromtxt(file, delimiter=',')[:,2]
            
            qid & accept id: (33553826, 33553911) query: Given two lists of strings, find the total number of strings in the second list which contains any string in the first list as substring soup:

            This is a simple way, but I get 4:

            \n
            >>> sum(a in b for a in ListA for b in ListB)\n4\n
            \n

            Unless you want to be case-insensitive

            \n
            >>> sum(a.lower() in b.lower() for a in ListA for b in ListB)\n5\n
            \n

            As stated, though, your question is ambiguous: this method counts how many matches there are. If you want to count how many words in ListB have a match, you could do this:

            \n
            >>> len(set(b for a in ListA for b in ListB if a.lower() in b.lower()))\n5\n
            \n

            As an example of where it differs:

            \n
            >>> ListA = ['stop', 'kill']\n>>> ListB = ['stoppable', 'killable', 'stopkill']\n\n>>> sum(a.lower() in b.lower() for a in ListA for b in ListB)\n4\n>>> len(set(b for a in ListA for b in ListB if a.lower() in b.lower()))\n3\n
            \n soup wrap:

            This is a simple way, but I get 4:

            >>> sum(a in b for a in ListA for b in ListB)
            4
            

            Unless you want to be case-insensitive

            >>> sum(a.lower() in b.lower() for a in ListA for b in ListB)
            5
            

            As stated, though, your question is ambiguous: this method counts how many matches there are. If you want to count how many words in ListB have a match, you could do this:

            >>> len(set(b for a in ListA for b in ListB if a.lower() in b.lower()))
            5
            

            As an example of where it differs:

            >>> ListA = ['stop', 'kill']
            >>> ListB = ['stoppable', 'killable', 'stopkill']
            
            >>> sum(a.lower() in b.lower() for a in ListA for b in ListB)
            4
            >>> len(set(b for a in ListA for b in ListB if a.lower() in b.lower()))
            3
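
            The per-word count can equivalently be written with any(), which avoids collecting the matching pairs first; on the same lists as above:

```python
ListA = ['stop', 'kill']
ListB = ['stoppable', 'killable', 'stopkill']

# total number of (a, b) matches
total_matches = sum(a in b for a in ListA for b in ListB)
# number of words in ListB that match at least one entry of ListA
words_matched = sum(any(a in b for a in ListA) for b in ListB)
print(total_matches, words_matched)  # 4 3
```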
            
            qid & accept id: (33554230, 33576058) query: How to set an attribute to a vector in rpy2 soup:

            Use the slots attribute. What is described in the doc for S4 objects applies to attributes in general (http://rpy2.readthedocs.org/en/version_2.7.x/notebooks/s4class.html).

            \n

            Here it should work with:

            \n
            from rpy2.robjects.vectors import FloatVector, IntVector\npot = FloatVector((2.0, 3.2, 4, 5, 6, 7))\nts = IntVector((1,6,7,19,20,30))\npot.slots['times'] = ts\n
            \n

            For rpy2 < 2.7, you should use do_slot_assign :

            \n
            pot.do_slot_assign("times",ts)\n
            \n soup wrap:

            Use the slots attribute. What is described in the doc for S4 objects applies to attributes in general (http://rpy2.readthedocs.org/en/version_2.7.x/notebooks/s4class.html).

            Here it should work with:

            from rpy2.robjects.vectors import FloatVector, IntVector
            pot = FloatVector((2.0, 3.2, 4, 5, 6, 7))
            ts = IntVector((1,6,7,19,20,30))
            pot.slots['times'] = ts
            

            For rpy2 < 2.7, you should use do_slot_assign :

            pot.do_slot_assign("times",ts)
            
            qid & accept id: (33569374, 33570371) query: Take dot product of first and middle entry, second and middle+1 entries until middle-1 and last entry python/numpy soup:

            NB reading your question, I think you want your second nested loop to be for j in xrange(Nt-i): since xrange excludes the upper limit.

            \n

            I think you can do what you want with einsum:

            \n
            import numpy as np\nsummed = 0\n\ndim1 = 2  # this is 81 in your case\ndim2 = 4  # this is 990000 in your case\narray = np.random.random(size=(dim1, dim2, 3))\n\nNt = dim2\ni = Nt // 2\n\nfor k in xrange(dim1):\n    summed = 0\n    for j in xrange(dim2-i):\n        vec1 = array[k][j]\n        vec2 = array[k][j+i]\n        summed += np.dot(vec1,vec2)\n    print summed\n\nprint '='*70\n\nfor k in xrange(dim1):\n    summed = np.einsum('ij,ij', array[k][:Nt//2], array[k][Nt//2:])\n    print summed\n
            \n

            e.g.

            \n
            2.0480375425\n1.89065215839\n======================================================================\n2.0480375425\n1.89065215839\n
            \n

            Doubtless you can even remove the outer loop as well (though in your case it probably won't speed things up much):

            \n
            np.einsum('kij,kij->k', array[:,:Nt//2,:], array[:,Nt//2:,:])\n
            \n

            gives

            \n
            [ 2.0480375425  1.89065215839]\n
            \n soup wrap:

            NB reading your question, I think you want your second nested loop to be for j in xrange(Nt-i): since xrange excludes the upper limit.

            I think you can do what you want with einsum:

            import numpy as np
            summed = 0
            
            dim1 = 2  # this is 81 in your case
            dim2 = 4  # this is 990000 in your case
            array = np.random.random(size=(dim1, dim2, 3))
            
            Nt = dim2
            i = Nt // 2
            
            for k in xrange(dim1):
                summed = 0
                for j in xrange(dim2-i):
                    vec1 = array[k][j]
                    vec2 = array[k][j+i]
                    summed += np.dot(vec1,vec2)
                print summed
            
            print '='*70
            
            for k in xrange(dim1):
                summed = np.einsum('ij,ij', array[k][:Nt//2], array[k][Nt//2:])
                print summed
            

            e.g.

            2.0480375425
            1.89065215839
            ======================================================================
            2.0480375425
            1.89065215839
            

            Doubtless you can even remove the outer loop as well (though in your case it probably won't speed things up much):

            np.einsum('kij,kij->k', array[:,:Nt//2,:], array[:,Nt//2:,:])
            

            gives

            [ 2.0480375425  1.89065215839]
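
            For intuition, here is the same first-half/second-half pairing written with the stdlib only, on a tiny hand-checkable array (values made up): row 0 pairs with row 2 (dot product 4.0) and row 1 pairs with row 3 (dot product 3.0).

```python
rows = [[1.0, 0.0, 2.0],
        [0.5, 1.0, 0.0],
        [2.0, 1.0, 1.0],   # paired with row 0
        [0.0, 3.0, 1.0]]   # paired with row 1
half = len(rows) // 2
total = sum(sum(u * v for u, v in zip(rows[j], rows[j + half]))
            for j in range(len(rows) - half))
print(total)  # 7.0
```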
            
            qid & accept id: (33575575, 33575591) query: Modify all rows in table soup:

            soup wrap:

            Use pure SQL and one UPDATE:

            UPDATE companies 
            SET ico = REPLACE(ico, ' ', '');
            

            There is no need for using loop.

            You can update only the records that contain at least one space:

            UPDATE companies 
            SET ico = REPLACE(ico, ' ', '')
            WHERE ico LIKE '% %';
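If you want to try the statement locally, here is a quick sketch using Python's built-in sqlite3 (my own setup — SQLite supports REPLACE() with the same semantics):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (ico TEXT)")
conn.executemany("INSERT INTO companies VALUES (?)",
                 [("12 34 56",), ("7890",)])

# One UPDATE rewrites every row; no loop needed.
conn.execute("UPDATE companies SET ico = REPLACE(ico, ' ', '')")

icos = [row[0] for row in conn.execute("SELECT ico FROM companies")]
print(icos)   # ['123456', '7890']
```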
            
            qid & accept id: (33577686, 33577785) query: calling linux shell in python with writing output in a text file soup:

            soup wrap:

            Use e.g. subprocess.call like this:

            subprocess.call(["ls",  "-lrt"], stdout=open("foo.txt",'w'))
            

            The signature of the function

            subprocess.call(args, *, stdin=None, stdout=None, stderr=None, shell=False)
            

            is rather self-explanatory; stdin, stdout and stderr expect file objects like those returned by open()
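A small variation worth knowing (my own sketch, not from the original answer): opening the file in a with block closes the handle deterministically instead of leaving it to the garbage collector. The command below is a portable stand-in for ["ls", "-lrt"]; any argv list is passed the same way:

```python
import subprocess
import sys

# The with block guarantees out.txt is flushed and closed when we leave it.
with open("out.txt", "w") as outfile:
    subprocess.call([sys.executable, "-c", "print('hello')"], stdout=outfile)

with open("out.txt") as f:
    result = f.read().strip()
print(result)   # hello
```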

            qid & accept id: (33578676, 33614534) query: Assign differing values to list generator results soup:

            soup wrap:

            There are a couple of different ways of assigning additional values to the different generators. The easiest would be to have a dictionary keyed by the generator or an iterable of the same length containing the values. Both approaches are shown here:

            Iterable

            v = (item for item in propadd if item[0]==row1[8] and harversine(custx,custy,item[2],item[3])<1500)
            k = (item for item in custadd if item[0]==row1[4])
            m = (item for item in numlist if re.search(r"^[0-9]+(?=\s)",row1[0]) is not None and item[0]==re.search(r"^[0-9]+(?=\s)",row1[0]).group())
            extraValues = ('value 1', 'value 2', 'value 3')
            for ind, gen in enumerate((v, k, m)):
                l = list(gen) 
                if len(l) == 1:
                    row1[1] = l[0][1]
                    row1[2] = l[0][2]
                    row1[3] = extraValues[ind]
                    break
            

            Dictionary

            v = (item for item in propadd if item[0]==row1[8] and harversine(custx,custy,item[2],item[3])<1500)
            k = (item for item in custadd if item[0]==row1[4])
            m = (item for item in numlist if re.search(r"^[0-9]+(?=\s)",row1[0]) is not None and item[0]==re.search(r"^[0-9]+(?=\s)",row1[0]).group())
            extraValues = {v: 'value 1',
                           k: 'value 2',
                           m: 'value 3'}
            for gen in (v, k, m):
                l = list(gen) 
                if len(l) == 1:
                    row1[1] = l[0][1]
                    row1[2] = l[0][2]
                    row1[3] = extraValues[gen]
                    break
            

            You could also have some complex scenario where the extra value could be generated by some function other than a dictionary lookup or tuple index.
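As a self-contained illustration of the iterable variant (toy generators and labels of my own, not the original data), the dispatch pattern looks like this: exactly one match in a generator selects both the matched item and the label at the same position:

```python
data = [1, 2, 3, 4, 5, 6]

v = (x for x in data if x % 5 == 0)   # matches only 5
k = (x for x in data if x > 100)      # matches nothing
m = (x for x in data if x % 2 == 0)   # matches 2, 4, 6

labels = ('value 1', 'value 2', 'value 3')
result = None
for ind, gen in enumerate((v, k, m)):
    matches = list(gen)
    if len(matches) == 1:              # exactly one hit -> take its label
        result = (matches[0], labels[ind])
        break

print(result)   # (5, 'value 1')
```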

            qid & accept id: (33587404, 33587421) query: How to count how many positions away an element is in a list? soup:

            soup wrap:

            Use the builtin functions max and list.index

            >>> list1=[34,5,1,7,5,3,8,512,8,43]
            >>> max_ele = max(list1)
            >>> print(list1.index(max_ele))
            7
            

            This can be done in a single line as

            print(list1.index(max(list1)))
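As a side note (my addition, not from the original answer), the same argmax can be found in a single pass with enumerate, which avoids scanning the list twice; like index(max(...)), it returns the first maximal element on ties:

```python
list1 = [34, 5, 1, 7, 5, 3, 8, 512, 8, 43]

# Single pass: take the max over (index, value) pairs, compared by value.
idx, val = max(enumerate(list1), key=lambda pair: pair[1])
print(idx)   # 7
```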
            
            qid & accept id: (33595664, 33597277) query: Retreiving data from a nested deep.copy dictionary made by list comprehension in Python soup:

            soup wrap:

            I suppose that keygrid is some kind of matrix in which each element is a key of the dictionary you are making copies of, am I right? Then with stockgrid[row][col] == keygrid[row][col] what you want to do is check whether the key from keygrid is in stockgrid and, if it is, add 1 to that key's value.

            If that is what you are asking, the answer would be:

            for row in range(HEIGHT):
                for col in range(WIDTH):
                    if keygrid[row][col] in stockgrid[row][col]:
                       stockgrid[row][col][keygrid[row][col]]+=1
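Here is a tiny runnable version of that loop with toy grids (my own example data, just to show the expected shapes — a grid of keys and a grid of count dictionaries):

```python
HEIGHT, WIDTH = 1, 2
keygrid = [["a", "z"]]                      # keys to look up per cell
stockgrid = [[{"a": 0}, {"b": 5}]]          # per-cell dicts of counts

for row in range(HEIGHT):
    for col in range(WIDTH):
        # Only increment when the key actually exists in that cell's dict.
        if keygrid[row][col] in stockgrid[row][col]:
            stockgrid[row][col][keygrid[row][col]] += 1

print(stockgrid)   # [[{'a': 1}, {'b': 5}]]
```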
            

            If inside keygrid you have a list of keys you can do this:

            for row in range(HEIGHT):
                for col in range(WIDTH):
                    # Now we select the common keys between the stockgrid and the keygrid
                    common_keys=[key for key in keygrid[row][col] if key in stockgrid[row][col]]
                    # Add one in the common_keys
                    for key in common_keys:
                        stockgrid[row][col][key]+=1
            
            qid & accept id: (33607716, 33752257) query: Print two report in Odoo8 soup:

            soup wrap:

            To do this you need to handle js code.

            In your form view add the following code:

            Then we need to generate the requested reports from js using python code without using:

            return {
                    'type': 'ir.actions.report.xml',
                    'report_name': 'my_report',
                    'datas': datas,
                    'nodestroy': True
                }
            

            For that your method `print_reports' should be:

            def print_reports(self, cr, uid, ids, context):
                """DO NOT EDIT !"""
            

            This is needed to catch button click event from js code.

            In your js script /modulename/static/js/script.js do this:

            openerp.MODULENAME=function(instance)
            {
            
                var QWEB=instance.web.qweb,_t=instance.web._t;
                instance.web.DataSet.include({
                    call_button:function(method, args){
                        var id = args[0];
                        if(String(method)=='print_reports'){
                            //get_reports should be created in modele_name class
                        new instance.web.Model('modele_name',this.get_context()).call('get_reports',[id],{context:this.get_context()}).done(function(reports){
                                for(var b=0; b

            We call the python method get_reports from js, and the result is reports as a base64 string.
            The get_reports method generates and formats the reports before sending them to js.
            Do it as below:

            def get_reports(self, cr, uid, ids, context):
                #Get datas used in the reports from modele
            
                datas = {
                    'ids': ids,
                    'model': 'modele_name',
                    'form': {
                        'key': value,
                         ...
                    }
                }
                pdf1 = self.pool.get('ir.actions.report.xml').render_report(cr,
                                                                            uid,
                                                                            ids,
                                                                            "report_name1",
                                                                            datas,
                                                                            context=None)
            
            
                pdf2 = self.pool.get('ir.actions.report.xml').render_report(cr,
                                                                            uid,
                                                                            ids,
                                                                            "report_name2",
                                                                            datas,
                                                                            context=None)
                ...
            
                #We send 'report name' to name the downloaded report ('reports[b+1]+'.pdf')
                return pdf1[0].encode('base64'), 'report_name1', pdf2[0].encode(
                    'base64'), 'report_name2',...
            

            values can be used in RML code like this: datas['form'][key].

            To download them, I used the download.js script. There are many ways to do that, but I found download.js the easiest.
            The content of download.js:

            //download.js v3.0, by dandavis; 2008-2014. [CCBY2] see     http://danml.com/download.html for tests/usage
            // v1 landed a FF+Chrome compat way of downloading strings to     local un-named files, upgraded to use a hidden frame and     optional mime
            // v2 added named files via a[download], msSaveBlob, IE (10+)     support, and window.URL support for larger+faster saves     than dataURLs
            // v3 added dataURL and Blob Input, bind-toggle arity, and     legacy dataURL fallback was improved with force-download mime and     base64 support
            
            // data can be a string, Blob, File, or dataURL
            
            
            
            
            function download(data, strFileName, strMimeType) {
                var self = window, // this script is only for browsers     anyway...
                    u = "application/octet-stream", // this default mime     also triggers iframe downloads
                    m = strMimeType || u, 
                    x = data,
                    D = document,
                    a = D.createElement("a"),
                    z = function(a){return String(a);},
            
            
                    B = self.Blob || self.MozBlob || self.WebKitBlob || z,
                    BB = self.MSBlobBuilder || self.WebKitBlobBuilder ||     self.BlobBuilder,
                    fn = strFileName || "download",
                    blob, 
                    b,
                    ua,
                    fr;
            
                //if(typeof B.bind === 'function' ){ B=B.bind(self); }
            
                if(String(this)==="true"){ //reverse arguments, allowing     download.bind(true, "text/xml", "export.xml") to act as a callback
                    x=[x, m];
                    m=x[0];
                    x=x[1]; 
                }
            
            
            
                //go ahead and download dataURLs right away
                if(String(x).match(/^data\:[\w+\-]+\/[\w+\-]+[,;]/)){
                    return navigator.msSaveBlob ?  // IE10 can't do a[download], only Blobs:
                        navigator.msSaveBlob(d2b(x), fn) : 
                        saver(x) ; // everyone else can save dataURLs un-processed
                }//end if dataURL passed?
            
                try{
            
                    blob = x instanceof B ? 
                        x : 
                        new B([x], {type: m}) ;
                }catch(y){
                    if(BB){
                        b = new BB();
                        b.append([x]);
                        blob = b.getBlob(m); // the blob
                    }
            
                }
            
            
            
                function d2b(u) {
                    var p= u.split(/[:;,]/),
                    t= p[1],
                    dec= p[2] == "base64" ? atob : decodeURIComponent,
                    bin= dec(p.pop()),
                    mx= bin.length,
                    i= 0,
                    uia= new Uint8Array(mx);
            
                    for(i;i

            To be able to call the download method in your script you must load the download.js file; to do that, modify MODULENAME_view.xml to add a new line

            
            
                
                    
                
             
            
            qid & accept id: (33613596, 33614106) query: Pandas dataframe - transform column values into individual columns soup:

            soup wrap:

            This is where I like to use multi-level indexes and stack/unstack.

            So here, I'd do:

            from io import StringIO
            import pandas
            
            datacsv = StringIO("""\
            XY UV  BC   Val
            y  u    c    11
            y  u    b    22
            y  v    c    33
            y  v    b    44
            x  u    c    111
            x  u    b    222
            x  v    c    333
            x  v    b    444
            """)
            df = pandas.read_csv(datacsv, sep=r'\s+')
            df.set_index(['XY', 'UV', 'BC']).unstack(level='BC')
            

            Which gives us:

                   Val     
            BC       b    c
            XY UV          
            x  u   222  111
               v   444  333
            y  u    22   11
               v    44   33
            

            So we have MultiIndexes on both the rows and columns. Assuming you don't want that, I would just do:

            xtab = (df.set_index(['XY', 'UV', 'BC'])
                      .unstack(level='BC')['Val']
                      .reset_index())
            

            And that'll give you:

            BC XY UV    b    c
            0   x  u  222  111
            1   x  v  444  333
            2   y  u   22   11
            3   y  v   44   33
            
            qid & accept id: (33705180, 33705734) query: how to exclude the non numerical integers from a data frame in Python soup:

            soup wrap:

            We could use ._get_numeric_data()

            import pandas as pd #import the pandas library
            #creating a small dataset for testing
            df1 = pd.DataFrame({'PassengerId' :  [1, 2, 3], 
                    'Name' : ['Abbing, Mr. Anthony', 'Ann, C', 'John, H'], 
                    'Fare' : [7.25, 71.28, 7.92]})
            #extract only the numeric column types
            df2 = df1._get_numeric_data()
            print(df2)
            

            Or another option is select_dtypes()

            df3 = df1.select_dtypes(include = ['int64', 'float64'])
            print(df3)
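A slightly more general variant (my addition, not in the original answer) selects on the abstract numpy type, so narrower numeric dtypes such as int32 or float32 are caught too:

```python
import pandas as pd
import numpy as np

df1 = pd.DataFrame({'PassengerId': [1, 2, 3],
                    'Name': ['Abbing, Mr. Anthony', 'Ann, C', 'John, H'],
                    'Fare': [7.25, 71.28, 7.92]})

# np.number matches every numeric dtype, not just the 64-bit ones.
df3 = df1.select_dtypes(include=[np.number])
print(list(df3.columns))   # ['PassengerId', 'Fare']
```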
            
            qid & accept id: (33706780, 33706923) query: How to read folder structure and assign it to datastructure? soup:

            soup wrap:

            I assume you'd rather get this kind of structure:

            files = {folder1: [file1, file2], folder2: [file3], ...}
            

            The following code will do the trick:

            import os
            
            rootDir = '.'
            files = {}
            for dirName, subdirList, fileList in os.walk(rootDir):
                files[dirName] = fileList
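A quick self-contained way to check the result (a sketch of my own using a temporary directory with one subfolder and two files):

```python
import os
import tempfile

# Build a tiny tree: root/folder1/{file1, file2}
root = tempfile.mkdtemp()
sub = os.path.join(root, "folder1")
os.mkdir(sub)
for name in ("file1", "file2"):
    open(os.path.join(sub, name), "w").close()

files = {}
for dirName, subdirList, fileList in os.walk(root):
    files[dirName] = fileList

print(sorted(files[sub]))   # ['file1', 'file2']
```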
            
            qid & accept id: (33711511, 33711533) query: Rotate photo via python soup:

            soup wrap:

            It looks like it returns a new image, so you'll want something like

            from PIL import Image
            
            abs_img_src = 'test.png'
            
            pill_img = Image.open(abs_img_src)
            pill_img.show()
            
            rotated_img = pill_img.rotate(90)
            rotated_img.show()
            

            If we let our 'test.png' be the Python logo

            Python Logo

            pill_img.show()
            

            will output

            Python Logo using pillow

            rotated_img = pill_img.rotate(90)
            rotated_img.show()
            

            will result in

            Python Logo rotated with pillow

            Just to double-check, let's call pill_img.show() again after the pill_img.rotate(90):


            Sure enough, we get the result we expected -- i.e., rotate does not mutate the original image, but rather returns a new rotated Image.
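One detail worth knowing (my addition, not in the original answer): rotate() keeps the original canvas size by default, so non-right angles can clip corners; expand=True resizes the canvas to fit instead. A synthetic image stands in for 'test.png' here:

```python
from PIL import Image

# A 40x20 synthetic image instead of loading a file from disk.
img = Image.new("RGB", (40, 20), "white")

# expand=True grows the output canvas; for 90 degrees it swaps the dims.
rotated = img.rotate(90, expand=True)
print(img.size, rotated.size)   # (40, 20) (20, 40)
```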

            qid & accept id: (33729888, 33730392) query: Merge every Every 6 dictionary into single dictionary of List soup:

            soup wrap:

            Notes:

            • Use a python generator function to iterate over the list
            • Keep temporary variables to count items and store intermediate data
            • Use the dictionary update and get methods

            Code:

            lst=[
            {'field_id': u'36908'},{'field_name': u'Code'},{'field_value': u'900321'},
            {'field_id': u'36909'},{'field_name': u'Description'}, {'field_value': u'TIG 2.4MM TUNGSTEN (EACH ROD)'},
            {'field_id': u'36910'}, {'field_name': u'Quantity'}, {'field_value': u'2'},
            {'field_id': u'36911'}, {'field_name': u'Price'}, {'field_value': u'21.00'},
            {'field_id': u'36912'}, {'field_name': u'Line Total'}, {'field_value': u'42.00'},
            {'field_id': u'36908'}, {'field_name': u'Code'}, {'field_value': u'92.01.15.08'},
            {'field_id': u'36909'}, {'field_name': u'Description'}, {'field_value': u'BINZEL .8MM MIG TIPS MB15'},
            {'field_id': u'36910'}, {'field_name': u'Quantity'}, {'field_value': u'6'},
            {'field_id': u'36911'}, {'field_name': u'Price'}, {'field_value': u'2.60'},
            {'field_id': u'36912'}, {'field_name': u'Line Total'}, {'field_value': u'15.60'}]
            
            new_lst=[] # List to save output
            dic={} # Temporary dictionary to create output dictionary 
            count=0 # Count variable to count the list element
            def iterating_list(lst): # Function to iterate over list
                for value in lst:
                    yield value
            iterating=iterating_list(lst)
            
            for value in iterating:
                if value.get('field_name'): # If `field_name` matches in the given lists 
                #By default get method return `None` when there is no given key
                    dic.update({value.get('field_name'):next(iterating).get('field_value')})
                    count+=1
                if count==5: # Resetting when count reaches to 5 
                    count=0
                    new_lst.append(dic)
                    dic={}
            print new_lst
            

            Output:

            [{u'Price': u'21.00', u'Code': u'900321', u'Description': u'TIG 2.4MM TUNGSTEN (EACH ROD)', u'Line Total': u'42.00', u'Quantity': u'2'},
            {u'Price': u'2.60', u'Code': u'92.01.15.08', u'Description': u'BINZEL .8MM MIG TIPS MB15', u'Line Total': u'15.60', u'Quantity': u'6'}]
            
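            A more compact variant of the same grouping (a sketch of my own, not part of the original answer) walks the flat list three dicts at a time with a single shared iterator and zip:

```python
# Hypothetical two-field sample; the real data has five fields per record
lst = [
    {'field_id': '36908'}, {'field_name': 'Code'}, {'field_value': '900321'},
    {'field_id': '36909'}, {'field_name': 'Quantity'}, {'field_value': '2'},
]

FIELDS_PER_RECORD = 2  # would be 5 for the data in the answer

it = iter(lst)
# zip over the same iterator yields (id, name, value) triples
pairs = [(name['field_name'], value['field_value'])
         for _id, name, value in zip(it, it, it)]
records = [dict(pairs[i:i + FIELDS_PER_RECORD])
           for i in range(0, len(pairs), FIELDS_PER_RECORD)]
print(records)
```

            This avoids the explicit count variable entirely: the record boundary falls out of slicing the (name, value) pairs.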
            qid & accept id: (33733189, 33737362) query: Stiff ODE-solver soup:

            soup wrap:

            I'm seeing something similar; with the 'vode' solver, changing methods between 'adams' and 'bdf' doesn't change the number of steps by very much. (By the way, there is no point in using order=15; the maximum order of the 'bdf' method of the 'vode' solver is 5 (and the maximum order of the 'adams' solver is 12). If you leave the argument out, it should use the maximum by default.)

            odeint is a wrapper of LSODA. ode also provides a wrapper of LSODA: change 'vode' to 'lsoda'. Unfortunately the 'lsoda' solver ignores the step=True argument of the integrate method.

            The 'lsoda' solver does much better than 'vode' with method='bdf'. You can get an upper bound on the number of steps that were used by initializing tvals = [], and in func, do tvals.append(t). When the solver completes, set tvals = np.unique(tvals). The length of tvals tells you the number of time values at which your function was evaluated. This is not exactly what you want, but it does show a huge difference between using the 'lsoda' solver and the 'vode' solver with method 'bdf'. The number of steps used by the 'lsoda' solver is on the same order as you quoted for matlab in your comment. (I used mu=10000, tf = 10.)

            Update: It turns out that, at least for a stiff problem, it makes a huge difference for the 'vode' solver if you provide a function to compute the Jacobian matrix.

            The script below runs the 'vode' solver with both methods, and it runs the 'lsoda' solver. In each case, it runs the solver with and without the Jacobian function. Here's the output it generates:

            vode   adams    jac=None  len(tvals) = 517992
            vode   adams    jac=jac   len(tvals) = 195
            vode   bdf      jac=None  len(tvals) = 516284
            vode   bdf      jac=jac   len(tvals) = 55
            lsoda           jac=None  len(tvals) = 49
            lsoda           jac=jac   len(tvals) = 49
            

            The script:

            from __future__ import print_function
            
            import numpy as np
            from scipy.integrate import ode
            
            
            def func(t, u, mu):
                tvals.append(t)
                u1 = u[1]
                u2 = mu*(1 - u[0]*u[0])*u[1] - u[0]
                return np.array([u1, u2])
            
            
            def jac(t, u, mu):
                j = np.empty((2, 2))
                j[0, 0] = 0.0
                j[0, 1] = 1.0
                j[1, 0] = -mu*2*u[0]*u[1] - 1
                j[1, 1] = mu*(1 - u[0]*u[0])
                return j
            
            
            mu = 10000.0
            u0 = [2, 0]
            t0 = 0.0
            tf = 10
            
            for name, kwargs in [('vode', dict(method='adams')),
                                 ('vode', dict(method='bdf')),
                                 ('lsoda', {})]:
                for j in [None, jac]:
                    solver = ode(func, jac=j)
                    solver.set_integrator(name, atol=1e-8, rtol=1e-6, **kwargs)
                    solver.set_f_params(mu)
                    solver.set_jac_params(mu)
                    solver.set_initial_value(u0, t0)
            
                    tvals = []
                    i = 0
                    while solver.successful() and solver.t < tf:
                        solver.integrate(tf, step=True)
                        i += 1
            
                    print("%-6s %-8s jac=%-5s " %
                          (name, kwargs.get('method', ''), j.func_name if j else None),
                          end='')
            
                    tvals = np.unique(tvals)
                    print("len(tvals) =", len(tvals))
            
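            On newer SciPy versions the same experiment can be sketched with solve_ivp instead of the legacy ode class; this is my adaptation (with a smaller mu so it runs quickly), not part of the original answer. Passing jac helps the stiff LSODA method just as it helps 'vode' above:

```python
import numpy as np
from scipy.integrate import solve_ivp

mu = 1000.0  # Van der Pol stiffness parameter (the answer uses 10000)

def func(t, u):
    return [u[1], mu * (1 - u[0]**2) * u[1] - u[0]]

def jac(t, u):
    # Same Jacobian as in the script above
    return np.array([[0.0, 1.0],
                     [-2 * mu * u[0] * u[1] - 1, mu * (1 - u[0]**2)]])

sol = solve_ivp(func, (0.0, 10.0), [2.0, 0.0], method='LSODA',
                jac=jac, rtol=1e-6, atol=1e-8)
print(sol.success, sol.nfev)
```

            sol.nfev gives the number of right-hand-side evaluations directly, so the tvals bookkeeping trick is no longer needed.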
            qid & accept id: (33759539, 33759639) query: For loop syntax in Python without using range() or xrange() soup:

            soup wrap:

            This is what list slicing is for: you can take the part of your list from the i'th element onward with

            lst[i:]
            

            Furthermore, to get both the index and the value you need the enumerate function, which turns the list into a sequence of (index, value) pairs,

            thus

            for ind, i in enumerate(lst):
                for j in lst[ind+1: ]: 
                    #Do Something
            
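            The enumerate-plus-slice pattern above visits every unordered pair exactly once; itertools.combinations does the same thing in one call (a standard-library alternative, not part of the original answer):

```python
from itertools import combinations

lst = ['a', 'b', 'c']

# Each pair appears once, mirroring the nested loop over lst[ind+1:]
pairs = list(combinations(lst, 2))
print(pairs)
```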
            qid & accept id: (33795081, 33804233) query: Python Tkinter GUI Frame: How to call a class method from inside a function of another class? soup:

            soup wrap:

            show_frame is a method on the controller. You simply need to save a reference to the controller, and call it from anywhere. This is the entire purpose of the controller class - to control access to the other windows.

            The first step is to modify your classes to save a reference to the controller:

            class Login(tk.Frame):
                def __init__(self, parent, controller):
                    self.controller = controller
                    ...
            
            class WelcomePage(tk.Frame):
                def __init__(self, parent, controller):
                    self.controller = controller
                    ...
            

            Now, you can call show_frame wherever you want:

            if actNum == act_num and pinNum == pin_num:
                ...
                self.controller.show_frame(WelcomePage)
                ...
            

            For more information on the controller see this answer: https://stackoverflow.com/a/32865334/7432

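            Stripped of tkinter, the controller pattern is just passing one object through to every frame so any frame can ask it to switch pages; a minimal non-GUI sketch (class and method names here are made up for illustration):

```python
class Controller:
    def __init__(self):
        self.frames = {}
        self.current = None

    def register(self, cls):
        # Each frame receives the controller so it can switch pages later
        self.frames[cls] = cls(self)

    def show_frame(self, cls):
        self.current = self.frames[cls]

class Login:
    def __init__(self, controller):
        self.controller = controller

    def on_success(self):
        # Equivalent of the PIN-check branch in the question
        self.controller.show_frame(WelcomePage)

class WelcomePage:
    def __init__(self, controller):
        self.controller = controller

ctrl = Controller()
ctrl.register(Login)
ctrl.register(WelcomePage)
ctrl.frames[Login].on_success()
```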
            qid & accept id: (33800210, 33800280) query: numpy get values in array of arrays of arrays for array of indices soup:

            soup wrap:

            Basically, you are selecting the second axis elements with indices_array corresponding to each position along the first axis for all the elements along the third axis. As such, you can do -

            list_arr[np.arange(list_arr.shape[0]),indices_array,:]
            

            Sample run -

            In [16]: list_arr
            Out[16]: 
            array([[[ 1,  2,  3],
                    [ 4,  5,  6],
                    [ 7,  8,  9]],
            
                   [[10, 20, 30],
                    [40, 50, 60],
                    [70, 80, 90]],
            
                   [[15, 25, 35],
                    [45, 55, 65],
                    [75, 85, 95]]])
            
            In [17]: indices_array
            Out[17]: array([1, 0, 2])
            
            In [18]: list_arr[np.arange(list_arr.shape[0]),indices_array,:]
            Out[18]: 
            array([[ 4,  5,  6],
                   [10, 20, 30],
                   [75, 85, 95]])
            
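            To see that this fancy indexing matches a plain per-row loop, a quick check on a small hypothetical array:

```python
import numpy as np

list_arr = np.arange(27).reshape(3, 3, 3)
indices_array = np.array([1, 0, 2])

# Row i of the result is list_arr[i, indices_array[i], :]
fancy = list_arr[np.arange(list_arr.shape[0]), indices_array, :]
loop = np.array([list_arr[i, indices_array[i], :]
                 for i in range(list_arr.shape[0])])
print(fancy)
```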
            qid & accept id: (33811240, 33811370) query: Python — Randomly fill 2D array with set number of 1's soup:

            soup wrap:

            Either shuffle a list of 16 1s and 48 0s:

            board = [1]*16 + 48*[0]
            random.shuffle(board)
            board = [board[i:i+8] for i in xrange(0, 64, 8)]
            

            or fill the board with 0s and pick a random sample of 16 positions to put 1s in:

            board = [[0]*8 for i in xrange(8)]
            for pos in random.sample(xrange(64), 16):
                board[pos//8][pos%8] = 1
            
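            A Python 3 port of the sampling approach (xrange has become range), with a quick sanity check on the number of 1s:

```python
import random

board = [[0] * 8 for _ in range(8)]
for pos in random.sample(range(64), 16):
    board[pos // 8][pos % 8] = 1

# random.sample picks without replacement, so exactly 16 cells are set
ones = sum(sum(row) for row in board)
print(ones)
```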
            qid & accept id: (33857555, 33970685) query: Integrating a vector field (a numpy array) using scipy.integrate soup:

            soup wrap:

            I was going to suggest matplotlib.pyplot.streamplot which supports the keyword argument start_points as of version 1.5.0, however it's not practical and also very inaccurate.

            Your code examples are a bit confusing to me: if you have vx, vy vector field coordinates, then you should have two meshes: x and y. Using these you can indeed use scipy.interpolate.griddata to obtain a smooth vector field for integration, however that seemed to eat up too much memory when I tried to do that. Here's a similar solution based on scipy.interpolate.interp2d:

            import numpy as np
            import matplotlib.pyplot as plt
            import scipy.interpolate as interp
            import scipy.integrate as integrate
            
            #dummy input from the streamplot demo
            y, x = np.mgrid[-3:3:100j, -3:3:100j]
            vx = -1 - x**2 + y
            vy = 1 + x - y**2
            
            #dfun = lambda x,y: [interp.griddata((x,y),vx,np.array([[x,y]])), interp.griddata((x,y),vy,np.array([[x,y]]))]
            dfunx = interp.interp2d(x[:],y[:],vx[:])
            dfuny = interp.interp2d(x[:],y[:],vy[:])
            dfun = lambda xy,t: [dfunx(xy[0],xy[1])[0], dfuny(xy[0],xy[1])[0]]
            
            p0 = (0.5,0.5)
            dt = 0.01
            t0 = 0
            t1 = 1
            t = np.arange(t0,t1+dt,dt)
            
            streamline=integrate.odeint(dfun,p0,t)
            
            #plot it
            plt.figure()
            plt.plot(streamline[:,0],streamline[:,1])
            plt.axis('equal')
            mymask = (streamline[:,0].min()*0.9<=x) & (x<=streamline[:,0].max()*1.1) & (streamline[:,1].min()*0.9<=y) & (y<=streamline[:,1].max()*1.1)
            plt.quiver(x[mymask],y[mymask],vx[mymask],vy[mymask])
            plt.show()
            

            Note that I made the integration mesh more dense for additional precision, but it didn't change much in this case.

            Result:

            output

            Update

            After some notes in comments I revisited my original griddata-based approach. The reason for this was that while interp2d computes an interpolant for the entire data grid, griddata only computes the interpolating value at the points given to it, so in case of a few points the latter should be much faster.

            I fixed the bugs in my earlier griddata attempt and came up with

            xyarr = np.array(zip(x.flatten(),y.flatten()))
            dfun = lambda p,t: [interp.griddata(xyarr,vx.flatten(),np.array([p]))[0], interp.griddata(xyarr,vy.flatten(),np.array([p]))[0]]
            

            which is compatible with odeint. It computes the interpolated values for each p point given to it by odeint. This solution doesn't consume excessive memory, however it takes much much longer to run with the above parameters. This is probably due to a lot of evaluations of dfun in odeint, much more than what would be evident from the 100 time points given to it as input.

            However, the resulting streamline is much smoother than the one obtained with interp2d, even though both methods used the default linear interpolation method:

            improved result

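            On current SciPy, interp2d is deprecated and later removed; RegularGridInterpolator is the usual replacement for gridded data. A sketch of the odeint right-hand side under the same dummy field (my adaptation, not from the original answer):

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Same dummy field as above, but built from explicit 1-D axes
xs = np.linspace(-3, 3, 100)
ys = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(xs, ys, indexing='ij')
vx = -1 - X**2 + Y
vy = 1 + X - Y**2

fvx = RegularGridInterpolator((xs, ys), vx)
fvy = RegularGridInterpolator((xs, ys), vy)

def dfun(xy, t):
    # odeint-compatible right-hand side: interpolate both components at xy
    p = np.atleast_2d(xy)
    return [fvx(p)[0], fvy(p)[0]]

print(dfun([0.0, 0.0], 0.0))
```

            At (0, 0) the exact field is (-1, 1), so the interpolated values should be close to that.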
            qid & accept id: (33869234, 33869742) query: Creating a subplot instead of separate plots soup:

            soup wrap:

            Did you try

            fig, axes = plt.subplots(2)
            
            plt.subplot2grid((1,5), (0,0), colspan=3)
            # here plot something
            
            plt.subplot2grid((1,5), (0,3), colspan=2)
            # here plot something
            
            plt.show()
            

            for example

            import matplotlib.pyplot as plt
            
            fig, axes = plt.subplots(2)
            
            plt.subplot2grid((1,5), (0,0), colspan=3)
            plt.plot([1,2,3]) # plot something
            
            plt.subplot2grid((1,5), (0,3), colspan=2)
            plt.plot([1,2,1]) # plot something
            
            plt.show()
            

            EDIT:

            import pandas as pd
            import numpy as np
            
            def plot_bar(corr_df):
            
                dfstacked = corr_df.stack().order()
                dfstacked.plot(kind='bar', rot=60)
            
            def plot_heatmap(corr_df):
            
                corr_df = corr_df.fillna(value=0)
                plt.pcolormesh(corr_df.values, cmap=plt.cm.Blues)
                plt.yticks(np.arange(0.5, len(corr_df.index), 1), corr_df.index)
                plt.xticks(np.arange(0.5, len(corr_df.columns), 1), corr_df.columns)
            
            
            df = pd.DataFrame(range(10))
            
            fig, axes = plt.subplots(2)
            
            plt.subplot2grid((1,5), (0,0), colspan=3)
            plot_bar(df)
            
            plt.subplot2grid((1,5), (0,3), colspan=2)
            plot_heatmap(df)
            
            plt.show()
            

            http://i.imgur.com/wSiT1UP.png

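            A self-contained version of the same grid layout that runs headless (the backend choice is mine, not from the answer):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display needed
import matplotlib.pyplot as plt

fig = plt.figure()

# 1 row, 5 columns: first axes spans 3 columns, second spans 2
ax1 = plt.subplot2grid((1, 5), (0, 0), colspan=3)
ax1.plot([1, 2, 3])
ax2 = plt.subplot2grid((1, 5), (0, 3), colspan=2)
ax2.plot([1, 2, 1])
```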
            qid & accept id: (33875238, 33876021) query: Export Pandas data frame with text column containg utf-8 text and URLs to Excel soup:

            soup wrap:

            I don't think it is currently possible to pass XlsxWriter constructor options via the Pandas API, but you can work around the strings_to_urls issue as follows:

            import pandas as pd
            
            df = pd.DataFrame({'Data': ['http://python.org']})
            
            # Create a Pandas Excel writer using XlsxWriter as the engine.
            writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter')
            
            # Don't convert url-like strings to urls.
            writer.book.strings_to_urls = False
            
            # Convert the dataframe to an XlsxWriter Excel object.
            df.to_excel(writer, sheet_name='Sheet1')
            
            # Close the Pandas Excel writer and output the Excel file.
            writer.save()
            

            Update: In recent versions of Pandas you can pass XlsxWriter constructor options to ExcelWriter() directly and you do not need to set writer.book.strings_to_urls indirectly:

            writer = pd.ExcelWriter('pandas_simple.xlsx', 
                                    engine='xlsxwriter', 
                                    options={'strings_to_urls': False})
            
            qid & accept id: (33907941, 33909260) query: python, zip:Discard extra elements soup:

            soup wrap:

            You could step through and slice off the chunks that you want to join.

            The following function will either print the data by tab-separated columns or write it to a passed file:

            import sys
            
            def write_cols(data,n,f = sys.stdout):
                for chunk in (data[i:i+n] for i in range(0,len(data),n)):
                    print('\t'.join(chunk), file = f)
            

            For example, if data = ['a','b','c','d','e','f','g','h'] then:

            >>> write_cols(data,3)
            a   b   c
            d   e   f
            g   h
            

            and

            >>> f = open("testfile.txt",'w')
            >>> write_cols(data,3,f)
            >>> f.close()
            

            will send the same output to that file.

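            Since the question title mentions zip discarding extra elements: zip(*[iter(data)]*n) drops the incomplete last chunk, while itertools.zip_longest keeps it padded. A standard-library sketch (not part of the original answer):

```python
from itertools import zip_longest

data = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
n = 3

# zip over a shared iterator yields only full chunks, dropping 'g' and 'h'
short = list(zip(*[iter(data)] * n))

# zip_longest pads the last chunk with None; strip the padding afterwards
rows = [[c for c in chunk if c is not None]
        for chunk in zip_longest(*[iter(data)] * n)]
print(rows)
```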
            qid & accept id: (33927174, 33934981) query: Django reverse url to onetoonefield on success soup:

            soup wrap:

            When you create a one to one field to your user model,

            class Profile(models.Model):
                user = models.OneToOneField(User)
            

            you can access the user from the profile

            profile.user
            

            and you can also access the profile from the user

            user.profile
            

            In your view, self.object is the user, so self.object.profile.id will give you the profile id.

            qid & accept id: (33928554, 33930620) query: two dimensional array for encryption in python soup:

            I took the liberty of leaving all of your integers as strings because you showed a tendency to concatenate them together.


            If you really want a 2D array, which is transposable, the least painful (and quite powerful) solution would be to use a numpy array. You will have to install numpy if you haven't done so already, but since you didn't specify any restriction for extension libraries, I assumed this solution to be acceptable.


            Array indexing starts with 0 and only accepts integer numbers. There are ways to have an array with custom indices, such as 23, 54, 34, 75 instead of 0, 1, 2, 3. If you decide to subclass numpy.array and override a lot of its methods, it may get more convoluted than expected.


            Instead, I offer the solution of creating a wrapper class which can handle your custom indexing, but not slicing (which might be a nonsense thing in your case anyway). Under the hood, when you request for '6523', it will split it up into 2-digit strings, '65' and '23' and then look for their position in the row/column lists. In this case, it would be (0, 0). Now you can use this index for the array to fetch the desired element. Finding the indices for an element works in the reverse way. At no point do we directly interact with the array structure, hence no need to override any of its methods.


            Code

            import numpy as np
            
            class CustomIndexTable:
                def __init__(self, rows, columns, elements):
                    self.rows = rows
                    self.columns = columns
                    self.data = np.array(elements, dtype=object)
                    self.data = self.data.reshape((len(rows), len(columns)))
            
                def __getitem__(self, index):
                    x, y = index[:2], index[2:]
                    return self.data[self.rows.index(x), self.columns.index(y)]
            
                def __setitem__(self, index, element):
                    x, y = index[:2], index[2:]
                    self.data[self.rows.index(x), self.columns.index(y)] = element
            
                def _where(self, element):
                    # np.where returns index arrays; take the first (unique) match
                    x, y = np.where(self.data == element)
                    return self.rows[int(x[0])] + self.columns[int(y[0])]
            
                def transpose(self):
                    self.rows, self.columns = self.columns, self.rows
                    self.data = self.data.T
            
                def where(self, sequence):
                    elements = []
                    start = 0
                    for end in xrange(1, len(sequence)+1):
                        if sequence[start:end] in self.data:
                            elements.append(sequence[start:end])
                            start = end
                    return ''.join(self._where(e) for e in elements)
            
            def input_matrix_data(text):
                return raw_input(text).split()
            
            col_indices = input_matrix_data("Column indices: ")
            row_indices = input_matrix_data("Row indices: ")
            data = input_matrix_data("All data, sorted by row: ")
            
            table = CustomIndexTable(row_indices, col_indices, data)
            

            Comments

            \n

            You don't want the user to repeat the table indices when you input a new element, such as repeating 65 for both (65, 23) and (65, 54). You can simply ask the user to input the column and row indices once and we'll construct the individual table coordinates later. For the data, have the user input it all at once like reading lines in a book, i.e., line by line from left to right. For all inputs, the user should separate individual members with a space. For example, when inputting the column indices, he should write

            \n
            23 54 34 75\n
            \n

            and for the data

            \n
            AM h 9 C 56 in 13 ok\n
            \n

            Once we have the data in a 1D list, we can put them in an array and reshape it to 2D with the specified number of columns per row.

            \n

            The structure of the class makes a few assumptions required for functionality, which are implicit from your question.

            \n
              \n
            • All row/column labels are 2-digit integers (in string format for convenience).
            • \n
            • There are no two or more table/row/column elements with the same name, since list.index() or numpy.where() might not behave as you expect them to in that case. This assumption makes sense since the use of your table seems to be for encryption/decryption and as such, each element should uniquely map to another.
            • \n
            • When searching for the indices of a sequence of elements, it assumes no element in the table is a prefix of another one, i.e., '9' and '97'.
            • \n
            \n

            Usage

            \n

            Once you have constructed your table, you can view the data (don't edit the array directly!),

            \n
            >>> table.data\narray([['AM', 'h', '9', 'C'],\n       ['56', 'in', '13', 'ok']], dtype=object)\n
            \n

            access a specific element,

            \n
            >>> table['7834']\n'13'\n
            \n

            set a new value for an element,

            \n
            >>> table['7834']  = 'B'\n>>> table['7834']\n'B'\n
            \n

            find where an element resides,

            \n
            >>> table.where('9')   # this should work equally well for '9C'\n'6534'\n
            \n

            or permanently transpose the array.

            \n
            >>> table.transpose()\n>>> table.where('9')\n'3465'\n
            \n

            Finally, you can add more methods to this class as they serve your needs, e.g. adding/deleting a whole row of elements after the table has been created.

            \n soup wrap:

            I took the liberty of leaving all of your integers as strings because you showed the tendency of concatenating them together.

            If you really want a 2D array, which is transposable, the least painful (and quite powerful) solution would be to use a numpy array. You will have to install numpy if you haven't done so already, but since you didn't specify any restriction for extension libraries, I assumed this solution to be acceptable.

            Array indexing starts with 0 and only accepts integers. There are ways to have an array with custom indices, such as 23, 54, 34, 75 instead of 0, 1, 2, 3. If you decide to subclass numpy.ndarray and override a lot of its methods, it may get more convoluted than expected.

            Instead, I offer the solution of creating a wrapper class which can handle your custom indexing, but not slicing (which might be a nonsense thing in your case anyway). Under the hood, when you request '6523', it will be split into the 2-digit strings '65' and '23', whose positions are then looked up in the row/column lists. In this case, that would be (0, 0). Now you can use this index on the array to fetch the desired element. Finding the indices for an element works in the reverse way. At no point do we directly interact with the array structure, hence no need to override any of its methods.

            Code

            import numpy as np
            
            class CustomIndexTable:
                def __init__(self, rows, columns, elements):
                    self.rows = rows
                    self.columns = columns
                    self.data = np.array(elements, dtype=object)
                    self.data = self.data.reshape((len(rows), len(columns)))
            
                def __getitem__(self, index):
                    x, y = index[:2], index[2:]
                    return self.data[self.rows.index(x),self.columns.index(y)]
            
                def __setitem__(self, index, element):
                    x, y = index[:2], index[2:]
                    self.data[self.rows.index(x),self.columns.index(y)] = element
            
                def _where(self, element):
                    x, y = np.where(self.data == element)
                    return self.rows[x] + self.columns[y]
            
                def transpose(self):
                    self.rows, self.columns = self.columns, self.rows
                    self.data = self.data.T
            
                def where(self, sequence):
                    elements = []
                    start = 0
                    for end in xrange(1, len(sequence)+1):
                        if sequence[start:end] in self.data:
                            elements.append(sequence[start:end])
                            start = end
                    return ''.join(self._where(e) for e in elements)
            
            def input_matrix_data(text):
                return raw_input(text).split()
            
            col_indices = input_matrix_data("Column indices: ")
            row_indices = input_matrix_data("Row indices: ")
            data = input_matrix_data("All data, sorted by row: ")
            
            table = CustomIndexTable(row_indices, col_indices, data)
            

            Comments

            You don't want the user to repeat the table indices when you input a new element, such as repeating 65 for both (65, 23) and (65, 54). You can simply ask the user to input the column and row indices once and we'll construct the individual table coordinates later. For the data, have the user input it all at once like reading lines in a book, i.e., line by line from left to right. For all inputs, the user should separate individual members with a space. For example, when inputting the column indices, he should write

            23 54 34 75
            

            and for the data

            AM h 9 C 56 in 13 ok
            

            Once we have the data in a 1D list, we can put them in an array and reshape it to 2D with the specified number of columns per row.

            The structure of the class makes a few assumptions required for functionality, which are implicit from your question.

            • All row/column labels are 2-digit integers (in string format for convenience).
            • There are no two or more table/row/column elements with the same name, since list.index() or numpy.where() might not behave as you expect them to in that case. This assumption makes sense since the use of your table seems to be for encryption/decryption and as such, each element should uniquely map to another.
            • When searching for the indices of a sequence of elements, it assumes no element in the table is a prefix of another one, i.e., '9' and '97'.

            Usage

            Once you have constructed your table, you can view the data (don't edit the array directly!),

            >>> table.data
            array([['AM', 'h', '9', 'C'],
                   ['56', 'in', '13', 'ok']], dtype=object)
            

            access a specific element,

            >>> table['7834']
            '13'
            

            set a new value for an element,

            >>> table['7834']  = 'B'
            >>> table['7834']
            'B'
            

            find where an element resides,

            >>> table.where('9')   # this should work equally well for '9C'
            '6534'
            

            or permanently transpose the array.

            >>> table.transpose()
            >>> table.where('9')
            '3465'
            

            Finally, you can add more methods to this class as they serve your needs, e.g. adding/deleting a whole row of elements after the table has been created.
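            For anyone on Python 3 (where xrange and raw_input no longer exist), the same wrapper idea can be sketched without numpy; the class below is a hypothetical port for illustration, not the original answer's code:

```python
class CustomIndexTable:
    """Python 3 sketch of the wrapper idea above, using nested lists."""
    def __init__(self, rows, columns, elements):
        self.rows = rows
        self.columns = columns
        # reshape the flat element list into one list per row
        n = len(columns)
        self.data = [elements[i:i + n] for i in range(0, len(elements), n)]

    def __getitem__(self, index):
        # split a 4-digit key like '7834' into row '78' and column '34'
        x, y = index[:2], index[2:]
        return self.data[self.rows.index(x)][self.columns.index(y)]

    def __setitem__(self, index, element):
        x, y = index[:2], index[2:]
        self.data[self.rows.index(x)][self.columns.index(y)] = element

    def where(self, element):
        # reverse lookup: element -> concatenated row+column labels
        for i, row in enumerate(self.data):
            if element in row:
                return self.rows[i] + self.columns[row.index(element)]
        raise KeyError(element)

table = CustomIndexTable(['65', '78'], ['23', '54', '34', '75'],
                         ['AM', 'h', '9', 'C', '56', 'in', '13', 'ok'])
print(table['7834'])     # '13'
print(table.where('9'))  # '6534'
```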

            qid & accept id: (33948731, 33949022) query: Modifying a cooldown decorator to work for methods instead of functions soup:

            You can override your class's __get__ method to make it a descriptor. The __get__ method will be called when someone gets the decorated method from within its containing object, and is passed the containing object, which you will then be able to pass to the original method. It returns an object which implements the functionality you need.

            \n
            def __get__(self, obj, objtype):\n    return Wrapper(self, obj)\n
            \n

            The Wrapper object implements __call__, and any properties you want, so move those implementations into that object. It would look like:

            \n
            class Wrapper:\n    def __init__(self, cdfunc, obj):\n        self.cdfunc = cdfunc\n        self.obj = obj\n    def __call__(self, *args, **kwargs):\n        #do stuff...\n        self.cdfunc._func(self.obj, *args, **kwargs)\n    @property\n    def remaining(self):\n        #...get needed things from self.cdfunc\n
            \n soup wrap:

            You can override your class's __get__ method to make it a descriptor. The __get__ method will be called when someone gets the decorated method from within its containing object, and is passed the containing object, which you will then be able to pass to the original method. It returns an object which implements the functionality you need.

            def __get__(self, obj, objtype):
                return Wrapper(self, obj)
            

            The Wrapper object implements __call__, and any properties you want, so move those implementations into that object. It would look like:

            class Wrapper:
                def __init__(self, cdfunc, obj):
                    self.cdfunc = cdfunc
                    self.obj = obj
                def __call__(self, *args, **kwargs):
                    #do stuff...
                    self.cdfunc._func(self.obj, *args, **kwargs)
                @property
                def remaining(self):
                    #...get needed things from self.cdfunc
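            To see how those fragments fit together, here is a self-contained sketch of one possible cooldown decorator built this way; the names CooldownFunc and Spell, and the use of time.monotonic, are assumptions for illustration, not code from the question:

```python
import time

class CooldownFunc:
    """Hypothetical decorator: rejects calls until `cooldown` seconds elapse."""
    def __init__(self, cooldown):
        self.cooldown = cooldown

    def __call__(self, func):
        # applied as @CooldownFunc(...): capture the method, return the descriptor
        self._func = func
        self._last_call = 0.0
        return self

    def __get__(self, obj, objtype=None):
        # attribute access on an instance returns a bound wrapper
        return Wrapper(self, obj)

class Wrapper:
    def __init__(self, cdfunc, obj):
        self.cdfunc = cdfunc
        self.obj = obj

    def __call__(self, *args, **kwargs):
        if self.remaining > 0:
            raise RuntimeError("still cooling down")
        self.cdfunc._last_call = time.monotonic()
        return self.cdfunc._func(self.obj, *args, **kwargs)

    @property
    def remaining(self):
        elapsed = time.monotonic() - self.cdfunc._last_call
        return max(0.0, self.cdfunc.cooldown - elapsed)

class Spell:
    @CooldownFunc(cooldown=5)
    def cast(self):
        return "fired"
```

            Note that the state lives on the decorator object, so the cooldown is shared by all Spell instances; per-instance cooldowns would need the timestamps stored keyed by obj.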
            
            qid & accept id: (33970546, 33970597) query: How to find defined sequence in the list? soup:

            You can use regular expressions

            \n
            >>> import re\n>>> l = ['A','B','A','A','B','B','A']\n>>> pat = re.compile(r'BAA')\n>>> sequences = pat.findall(''.join(l))\n>>> sequences\n['BAA']\n>>> len(sequences)\n1\n>>> \n
            \n

            But the best way to do this is using a generator function:

            \n
            >>> def find_sequences(sequences, events):\n...     i = 0\n...     events_len = len(events)\n...     sequences_len = len(sequences)\n...     while i < sequences_len:\n...             if sequences[i:i+events_len] == events: \n...                 yield True\n...             i = i + 1\n... \n>>> list(find_sequences(lst, events))\n>>> sum(find_sequences(['AB', 'A', 'BA', 'A', 'BA'], ['A', 'BA']))\n2\n
            \n soup wrap:

            You can use regular expressions

            >>> import re
            >>> l = ['A','B','A','A','B','B','A']
            >>> pat = re.compile(r'BAA')
            >>> sequences = pat.findall(''.join(l))
            >>> sequences
            ['BAA']
            >>> len(sequences)
            1
            >>> 
            

            But a more robust way is a generator function (the join-based regex approach can miscount when list elements are longer than one character, since joining erases the element boundaries):

            >>> def find_sequences(sequences, events):
            ...     i = 0
            ...     events_len = len(events)
            ...     sequences_len = len(sequences)
            ...     while i < sequences_len:
            ...             if sequences[i:i+events_len] == events: 
            ...                 yield True
            ...             i = i + 1
            ... 
            >>> sum(find_sequences(['AB', 'A', 'BA', 'A', 'BA'], ['A', 'BA']))
            2
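            If you only need the count, the sliding-window comparison can be condensed into a single sum over slices; this helper is a sketch, not part of the original answer:

```python
def count_sequence(lst, events):
    """Count occurrences of the sublist `events` in `lst` (overlaps allowed)."""
    n = len(events)
    return sum(lst[i:i + n] == events for i in range(len(lst) - n + 1))

print(count_sequence(['A', 'B', 'A', 'A', 'B', 'B', 'A'], ['B', 'A', 'A']))  # 1
print(count_sequence(['AB', 'A', 'BA', 'A', 'BA'], ['A', 'BA']))             # 2
```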
            
            qid & accept id: (33972303, 33973304) query: Splitting a dataframe based on column values soup:

            First, you can create group numbers by comparing the value column to zero and then taking a cumulative sum of these boolean values.

            \n
            df['group_no'] = (df.val == 0).cumsum()\n>>> df.head(6)\n      EndDate       val  group_no\n0  2007-10-31  0.000000         1\n1  2007-11-30 -0.033845         1\n2  2007-12-31 -0.033630         1\n3  2008-01-31 -0.009449         1\n4  2008-02-29  0.000000         2\n5  2008-03-31 -0.057450         2\n
            \n

            Next, you can use a dictionary comprehension together with loc to select the relevant group_no dataframe. To get the last group number, I get the last value using iat for location based indexing.

            \n
            d = {i: df.loc[df.group_no == i, ['EndDate', 'val']] \n     for i in range(1, df.group_no.iat[-1])}\n\n>>> d\n{1:       EndDate       val\n 0  2007-10-31  0.000000\n 1  2007-11-30 -0.033845\n 2  2007-12-31 -0.033630\n 3  2008-01-31 -0.009449, \n 2:       EndDate       val\n 4  2008-02-29  0.000000\n 5  2008-03-31 -0.057450\n 6  2008-04-30 -0.038694, \n 3:       EndDate       val\n 7  2008-05-31  0.000000\n 8  2008-06-30 -0.036245\n 9  2008-07-31 -0.005286}\n
            \n

            EDIT \nAs suggested by @DSM, using groupby appears to be about 6x faster based on a sample dataframe with 15k rows.

            \n
            d = {n: df2.ix[rows] \n     for n, rows in enumerate(df2.groupby('group_no').groups)}\n
            \n soup wrap:

            First, you can create group numbers by comparing the value column to zero and then taking a cumulative sum of these boolean values.

            df['group_no'] = (df.val == 0).cumsum()
            >>> df.head(6)
                  EndDate       val  group_no
            0  2007-10-31  0.000000         1
            1  2007-11-30 -0.033845         1
            2  2007-12-31 -0.033630         1
            3  2008-01-31 -0.009449         1
            4  2008-02-29  0.000000         2
            5  2008-03-31 -0.057450         2
            

            Next, you can use a dictionary comprehension together with loc to select the relevant group_no dataframe. To get the last group number, I get the last value using iat for location based indexing.

            d = {i: df.loc[df.group_no == i, ['EndDate', 'val']] 
                 for i in range(1, df.group_no.iat[-1] + 1)}
            
            >>> d
            {1:       EndDate       val
             0  2007-10-31  0.000000
             1  2007-11-30 -0.033845
             2  2007-12-31 -0.033630
             3  2008-01-31 -0.009449, 
             2:       EndDate       val
             4  2008-02-29  0.000000
             5  2008-03-31 -0.057450
             6  2008-04-30 -0.038694, 
             3:       EndDate       val
             7  2008-05-31  0.000000
             8  2008-06-30 -0.036245
             9  2008-07-31 -0.005286}
            

            EDIT As suggested by @DSM, using groupby appears to be about 6x faster based on a sample dataframe with 15k rows.

            d = {n: df2.ix[rows]
                 for n, rows in df2.groupby('group_no').groups.items()}
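            A runnable sketch of the whole pipeline, assuming pandas is installed; iterating the GroupBy object directly yields (key, sub-DataFrame) pairs, and the sample data here is abbreviated from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'EndDate': ['2007-10-31', '2007-11-30', '2008-02-29', '2008-03-31'],
    'val':     [0.0,          -0.033845,    0.0,          -0.057450],
})
# rows where val == 0 start a new group; the cumulative sum numbers the groups
df['group_no'] = (df.val == 0).cumsum()

# one sub-DataFrame per group number
d = {key: grp[['EndDate', 'val']] for key, grp in df.groupby('group_no')}
print(sorted(d))             # [1, 2]
print(len(d[1]), len(d[2]))  # 2 2
```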
            
            qid & accept id: (33978569, 33978598) query: greedy regex split python every nth line soup:

            This is easy to do without regular expressions:

            \n
            >>> s = 'Four score and seven years ago.'\n>>> ss = s + 5*' '; [ss[i:i+6] for i in range(0, len(s) - 1, 6)]\n['Four s', 'core a', 'nd sev', 'en yea', 'rs ago', '.     ']\n
            \n

            This provides the blank spaces at the end that you asked for.

            \n

            Alternatively, if you must use regular expressions:

            \n
            >>> import re\n>>> re.findall('.{6}', ss)\n['Four s', 'core a', 'nd sev', 'en yea', 'rs ago', '.     ']\n
            \n

            The key in both cases is creating the string ss which has enough blank space at the end.

            \n soup wrap:

            This is easy to do without regular expressions:

            >>> s = 'Four score and seven years ago.'
            >>> ss = s + 5*' '; [ss[i:i+6] for i in range(0, len(s), 6)]
            ['Four s', 'core a', 'nd sev', 'en yea', 'rs ago', '.     ']
            

            This provides the blank spaces at the end that you asked for.

            Alternatively, if you must use regular expressions:

            >>> import re
            >>> re.findall('.{6}', ss)
            ['Four s', 'core a', 'nd sev', 'en yea', 'rs ago', '.     ']
            

            The key in both cases is creating the string ss which has enough blank space at the end.
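            If you prefer to pad by exactly the right amount instead of always appending five spaces, the target length can be computed with ceiling division; the helper name below is made up for illustration:

```python
def chunk_padded(s, n):
    """Split s into n-character chunks, right-padding the last chunk with spaces."""
    padded = s.ljust(-(-len(s) // n) * n)  # round length up to a multiple of n
    return [padded[i:i + n] for i in range(0, len(padded), n)]

print(chunk_padded('Four score and seven years ago.', 6))
# ['Four s', 'core a', 'nd sev', 'en yea', 'rs ago', '.     ']
```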

            qid & accept id: (34008175, 34027011) query: PyQt change element in .ui file soup:

            You can access elements from .ui designs by their names. E.g. there is a design for main window with one button:

            \n
            \n\n MainWindow\n \n  ...\n  \n  ...\n \n\n
            \n

            You init widget object with it:

            \n
            class MainWindow(QMainWindow):\n    def __init__(self):\n        QMainWindow.__init__(self)\n        uic.loadUi('window.ui', self)\n\nmain_window = MainWindow()\n
            \n

            Then from your method you can get access to that button:

            \n
            @pyqtSlot()\ndef click_my_btn(self, sender):\n    main_window.btn.hide()\n
            \n soup wrap:

            You can access elements from .ui designs by their names. E.g. there is a design for main window with one button:

            
            
            <ui version="4.0">
             <class>MainWindow</class>
             <widget class="QMainWindow" name="MainWindow">
              ...
              <widget class="QPushButton" name="btn">
               ...
              </widget>
             </widget>
            </ui>
            
            

            You init widget object with it:

            class MainWindow(QMainWindow):
                def __init__(self):
                    QMainWindow.__init__(self)
                    uic.loadUi('window.ui', self)
            
            main_window = MainWindow()
            

            Then from your method you can get access to that button:

            @pyqtSlot()
            def click_my_btn(self, sender):
                main_window.btn.hide()
            
            qid & accept id: (34014527, 34014827) query: Python Create Combinations from Multiple Data Frames soup:

            You could use expand_grid which is mentioned in docs cookbook:

            \n
            def expand_grid(data_dict):\n  rows = itertools.product(*data_dict.values())\n  return pd.DataFrame.from_records(rows, columns=data_dict.keys())\n\nexpand_grid({'val_1': [0.00789, 0.01448, 0.03157], 'val_2' : [0.5, 1.0]})\n\nIn [107]: expand_grid({'val_1': [0.00789, 0.01448, 0.03157], 'val_2' : [0.5, 1.0]})\nOut[107]:\n     val_1  val_2\n0  0.00789    0.5\n1  0.00789    1.0\n2  0.01448    0.5\n3  0.01448    1.0\n4  0.03157    0.5\n5  0.03157    1.0\n
            \n

            EDIT

            \n

            For existing dataframes you first will need to create one dictionary from your dataframes. You could combine to one with one of the answers to that question. Example for your case:

            \n
            expand_grid(dict(var_1.to_dict('list'), **var_2.to_dict('list')))\n\nIn [122]: expand_grid(dict(var_1.to_dict('list'), **var_2.to_dict('list')))\nOut[122]:\n     val_1  val_2\n0  0.00789    0.5\n1  0.00789    1.0\n2  0.01448    0.5\n3  0.01448    1.0\n4  0.03157    0.5\n5  0.03157    1.0\n
            \n soup wrap:

            You could use expand_grid which is mentioned in docs cookbook:

            import itertools
            import pandas as pd
            
            def expand_grid(data_dict):
                rows = itertools.product(*data_dict.values())
                return pd.DataFrame.from_records(rows, columns=data_dict.keys())
            
            expand_grid({'val_1': [0.00789, 0.01448, 0.03157], 'val_2' : [0.5, 1.0]})
            
            In [107]: expand_grid({'val_1': [0.00789, 0.01448, 0.03157], 'val_2' : [0.5, 1.0]})
            Out[107]:
                 val_1  val_2
            0  0.00789    0.5
            1  0.00789    1.0
            2  0.01448    0.5
            3  0.01448    1.0
            4  0.03157    0.5
            5  0.03157    1.0
            

            EDIT

            For existing dataframes you first will need to create one dictionary from your dataframes. You could combine to one with one of the answers to that question. Example for your case:

            expand_grid(dict(var_1.to_dict('list'), **var_2.to_dict('list')))
            
            In [122]: expand_grid(dict(var_1.to_dict('list'), **var_2.to_dict('list')))
            Out[122]:
                 val_1  val_2
            0  0.00789    0.5
            1  0.00789    1.0
            2  0.01448    0.5
            3  0.01448    1.0
            4  0.03157    0.5
            5  0.03157    1.0
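            If you only need the raw combinations and not a DataFrame, the itertools.product step works on its own; a minimal stdlib sketch:

```python
import itertools

data = {'val_1': [0.00789, 0.01448, 0.03157], 'val_2': [0.5, 1.0]}
# Cartesian product of the value lists, in dict insertion order
rows = list(itertools.product(*data.values()))
print(len(rows))  # 6
print(rows[0])    # (0.00789, 0.5)
```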
            
            qid & accept id: (34016799, 34018005) query: Python, using BeautifulSoup parsing values from a table soup:

            The following will extract your two columns using the span tag inside the li elements:

            \n
            html = """\n\n\n\n    \n\n
            \n
              \n
            • 15:00:1911.7505392?
            • \n
            • 14:56:5511.75017?
            • \n
            • 14:56:5211.750479?
            • \n
            • 14:56:4911.7406?
            • \n
            • 14:56:4611.740333?
            • \n
            • 14:56:4311.74021?
            • \n
            • 14:56:4011.74015?
            • \n
            • 14:56:3711.74035?
            • \n
            • 14:56:3411.75011?
            • \n
            • 14:56:3111.7403?
            • \n
            • 14:56:2811.74024?
            • \n
            • 14:56:2211.750291?
            • \n
            • 14:56:1911.740198?
            • \n
            • 14:56:1611.73015?
            • \n
            \n
            """\n\nsoup = BeautifulSoup(html)\n\ncol_3 = []\ncol_4 = []\n\nfor li in soup.find_all('table')[0].find_all("li"):\n cols = li.find_all("span")\n col_3.append(cols[2].text)\n col_4.append(cols[3].text)\n\nprint col_3 \nprint col_4\n
            \n

            This would give you the following output:

            \n
            [u'5392', u'17', u'479', u'6', u'333', u'21', u'15', u'35', u'11', u'3', u'24', u'291', u'198', u'15']\n[u'?', u'?', u'?', u'?', u'?', u'?', u'?', u'?', u'?', u'?', u'?', u'?', u'?', u'?']\n
            \n soup wrap:

            The following will extract your two columns using the span tag inside the li elements:

            html = """
            
            • 15:00:1911.7505392?
            • 14:56:5511.75017?
            • 14:56:5211.750479?
            • 14:56:4911.7406?
            • 14:56:4611.740333?
            • 14:56:4311.74021?
            • 14:56:4011.74015?
            • 14:56:3711.74035?
            • 14:56:3411.75011?
            • 14:56:3111.7403?
            • 14:56:2811.74024?
            • 14:56:2211.750291?
            • 14:56:1911.740198?
            • 14:56:1611.73015?
            """ soup = BeautifulSoup(html) col_3 = [] col_4 = [] for li in soup.find_all('table')[0].find_all("li"): cols = li.find_all("span") col_3.append(cols[2].text) col_4.append(cols[3].text) print col_3 print col_4

            This would give you the following output:

            [u'5392', u'17', u'479', u'6', u'333', u'21', u'15', u'35', u'11', u'3', u'24', u'291', u'198', u'15']
            [u'?', u'?', u'?', u'?', u'?', u'?', u'?', u'?', u'?', u'?', u'?', u'?', u'?', u'?']
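            If installing BeautifulSoup is not an option, a rough stdlib-only equivalent can be built on html.parser; the class name and the tiny HTML snippet below are invented for illustration:

```python
from html.parser import HTMLParser

class SpanCollector(HTMLParser):
    """Collect the text of every <span> inside each <li>, one list per <li>."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self.in_span = False

    def handle_starttag(self, tag, attrs):
        if tag == 'li':
            self.rows.append([])   # start a new row of span texts
        elif tag == 'span':
            self.in_span = True

    def handle_endtag(self, tag):
        if tag == 'span':
            self.in_span = False

    def handle_data(self, data):
        if self.in_span and self.rows:
            self.rows[-1].append(data)

html = ("<table><li><span>14:56:55</span><span>11.750</span>"
        "<span>17</span><span>?</span></li></table>")
p = SpanCollector()
p.feed(html)
print(p.rows)  # [['14:56:55', '11.750', '17', '?']]
```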
            
            qid & accept id: (34030902, 34032463) query: Writing NetCDF time variable from start of year soup:

            You ask how to calculate:

            \n
            \n

            variable of seconds from the start of the year.

            \n
            \n

            You can do that using:

            \n
            import datetime\n\nnow = datetime.datetime.now()\nyearStart = datetime.datetime(now.year, 1, 1)\ndiffTime = now - yearStart\nyearSeconds = int(diffTime.total_seconds())\n
            \n

            You go on to ask about creating:

            \n
            \n

            temperature variable that has values recorded every 6 hours for a year (1459 elements).

            \n
            \n

            You could convert the year seconds to your sample index using:

            \n
            index = yearSeconds / (60*60*6) # / 60 sec/min * 60 min/hour * 6 hours\n
            \n

            But then you want:

            \n
            \n

            time ranging from the start of the year and stepping every 6 hours

            \n
            \n

            So maybe you need to go the other way. You have the index (sample number) and want the corresponding date. You can calculate that using:

            \n
            sampleDateTime = yearStart + datetime.timedelta(0, index * 60 * 60 * 6)\n
            \n

            Be sure you have set the correct yearStart for your data.

            \n soup wrap:

            You ask how to calculate:

            variable of seconds from the start of the year.

            You can do that using:

            import datetime
            
            now = datetime.datetime.now()
            yearStart = datetime.datetime(now.year, 1, 1)
            diffTime = now - yearStart
            yearSeconds = int(diffTime.total_seconds())
            

            You go on to ask about creating:

            temperature variable that has values recorded every 6 hours for a year (1459 elements).

            You could convert the year seconds to your sample index using:

            index = yearSeconds // (60*60*6)  # divide by 60 sec/min * 60 min/hour * 6 hours
            

            But then you want:

            time ranging from the start of the year and stepping every 6 hours

            So maybe you need to go the other way. You have the index (sample number) and want the corresponding date. You can calculate that using:

            sampleDateTime = yearStart + datetime.timedelta(0, index * 60 * 60 * 6)
            

            Be sure you have set the correct yearStart for your data.
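            Putting the pieces together with a fixed timestamp, so the arithmetic is reproducible (a sketch, not part of the original answer):

```python
import datetime

# a fixed "now" instead of datetime.datetime.now(), for reproducibility
now = datetime.datetime(2015, 12, 1, 12, 0, 0)
yearStart = datetime.datetime(now.year, 1, 1)

yearSeconds = int((now - yearStart).total_seconds())
index = yearSeconds // (60 * 60 * 6)  # sample number in 6-hour steps
sampleDateTime = yearStart + datetime.timedelta(seconds=index * 60 * 60 * 6)

print(index)           # 1338
print(sampleDateTime)  # 2015-12-01 12:00:00
```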

            qid & accept id: (34057756, 34255161) query: How to combine SQLAlchemy's @hybrid_property decorator with Werkzeug's cached_property decorator? soup:

            This is a bit tricky to get right, since both cached_property and hybrid_property expect to wrap a method and to return a property. You end up extending either one of them or both.

            \n

            The nicest thing I could come up is this. It basically inlines the logic of cached_property into hybrid_property's __get__. Note that it caches the property values for the instances, but not for the class.

            \n
            from sqlalchemy.ext.hybrid import hybrid_property\n\n_missing = object()   # sentinel object for missing values\n\n\nclass cached_hybrid_property(hybrid_property):\n    def __get__(self, instance, owner):\n        if instance is None:\n            # getting the property for the class\n            return self.expr(owner)\n        else:\n            # getting the property for an instance\n            name = self.fget.__name__\n            value = instance.__dict__.get(name, _missing)\n            if value is _missing:\n                value = self.fget(instance)\n                instance.__dict__[name] = value\n            return value\n\n\nclass Example(object):\n    @cached_hybrid_property\n    def foo(self):\n        return "expensive calculations"\n
            \n
            \n

            At first I thought you could simply use functools.lru_cache instead of cached_property. Then I realized that you likely want an instance-specific cache instead of global cache indexed by the instance, which is what lru_cache provides. There's no standard library utility for caching method calls per instance.

            \n

            To illustrate the problem with lru_cache, consider this simplistic version of caching:

            \n
            CACHE = {}\n\nclass Example(object):\n    @property\n    def foo(self):\n        if self not in CACHE:\n            CACHE[self] = ...  # do the actual computation\n        return CACHE[self]\n
            \n

            This will store the cached values of foo for every Example instance your program generates - in other words, it can leak memory. lru_cache is a bit smarter, since it limits the size of the cache, but then you might end up re-computing some of the values you needed if they go out of the cache. A better solution is to attach the cached values to instances of Example they belong to, like done by cached_property.

            \n soup wrap:

            This is a bit tricky to get right, since both cached_property and hybrid_property expect to wrap a method and to return a property. You end up extending either one of them or both.

            The nicest thing I could come up with is this. It basically inlines the logic of cached_property into hybrid_property's __get__. Note that it caches the property values for the instances, but not for the class.

            from sqlalchemy.ext.hybrid import hybrid_property
            
            _missing = object()   # sentinel object for missing values
            
            
            class cached_hybrid_property(hybrid_property):
                def __get__(self, instance, owner):
                    if instance is None:
                        # getting the property for the class
                        return self.expr(owner)
                    else:
                        # getting the property for an instance
                        name = self.fget.__name__
                        value = instance.__dict__.get(name, _missing)
                        if value is _missing:
                            value = self.fget(instance)
                            instance.__dict__[name] = value
                        return value
            
            
            class Example(object):
                @cached_hybrid_property
                def foo(self):
                    return "expensive calculations"
            

            At first I thought you could simply use functools.lru_cache instead of cached_property. Then I realized that you likely want a cache attached to each instance, whereas lru_cache gives you a single global cache keyed by the instance. There's no standard library utility for caching method calls per instance.

            To illustrate the problem with lru_cache, consider this simplistic version of caching:

            CACHE = {}
            
            class Example(object):
                @property
                def foo(self):
                    if self not in CACHE:
                        CACHE[self] = ...  # do the actual computation
                    return CACHE[self]
            

            This will store the cached values of foo for every Example instance your program generates - in other words, it can leak memory. lru_cache is a bit smarter, since it limits the size of the cache, but then you might end up re-computing some of the values you needed if they go out of the cache. A better solution is to attach the cached values to instances of Example they belong to, like done by cached_property.
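            For reference, the per-instance caching half of the combination can be sketched on its own; this is a simplified stand-in for Werkzeug's cached_property, relying on the fact that a non-data descriptor is shadowed by the instance __dict__:

```python
class cached_property:
    """Compute the value once per instance, then serve it from __dict__."""
    def __init__(self, func):
        self.func = func
        self.__doc__ = func.__doc__

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        # subsequent lookups find the stored value in the instance dict
        # before this (non-data) descriptor is ever consulted again
        value = self.func(instance)
        instance.__dict__[self.func.__name__] = value
        return value

class Example:
    def __init__(self):
        self.calls = 0

    @cached_property
    def foo(self):
        self.calls += 1
        return "expensive calculations"

e = Example()
print(e.foo, e.foo)  # the property body runs only once
print(e.calls)       # 1
```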

            qid & accept id: (34066053, 34066433) query: From list of dictionaries to np array of arrays and vice-versa soup:

            You can do this easily with pandas:

            \n
            import pandas as pd\nlistOfDicts = [{"key1":10, "key3":19},\n               {"key1":20, "key2":25, "key3":29},\n               {"key1":30, "key2":35, "key3":39},\n               {"key1":40, "key2":45, "key3":49}]\n\ndf = pd.DataFrame(listOfDicts)\nvals = df.values\nvals\n\narray([[10, nan, 19],\n       [20, 25,  29],\n       [30, 35,  39],\n       [40, 45,  49]])\n
            \n

            To convert the NumPy array back into a list of dictionaries you can use:

            \n
            df2 = pd.DataFrame(vals, columns=df.columns)\ndf2.to_dict(orient='records')\n\n[{'key1': 10.0, 'key2': nan, 'key3': 19.0},\n {'key1': 20.0, 'key2': 25.0, 'key3': 29.0},\n {'key1': 30.0, 'key2': 35.0, 'key3': 39.0},\n {'key1': 40.0, 'key2': 45.0, 'key3': 49.0}]\n
            \n soup wrap:

            You can do this easily with pandas:

            import pandas as pd
            listOfDicts = [{"key1":10, "key3":19},
                           {"key1":20, "key2":25, "key3":29},
                           {"key1":30, "key2":35, "key3":39},
                           {"key1":40, "key2":45, "key3":49}]
            
            df = pd.DataFrame(listOfDicts)
            vals = df.values
            vals
            
            array([[10, nan, 19],
                   [20, 25,  29],
                   [30, 35,  39],
                   [40, 45,  49]])
            

            To convert the NumPy array back into a list of dictionaries you can use:

            df2 = pd.DataFrame(vals, columns=df.columns)
            df2.to_dict(orient='records')
            
            [{'key1': 10.0, 'key2': nan, 'key3': 19.0},
             {'key1': 20.0, 'key2': 25.0, 'key3': 29.0},
             {'key1': 30.0, 'key2': 35.0, 'key3': 39.0},
             {'key1': 40.0, 'key2': 45.0, 'key3': 49.0}]
            
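            If pandas is only being pulled in for this conversion, the same round trip can be sketched with NumPy alone; the column order here (the sorted union of all keys) is my own choice:

```python
import numpy as np

list_of_dicts = [{"key1": 10, "key3": 19},
                 {"key1": 20, "key2": 25, "key3": 29}]

# Stable column order: sorted union of all keys seen in any dict
cols = sorted({k for d in list_of_dicts for k in d})

# Missing keys become nan, which forces a float array, as with pandas
arr = np.array([[d.get(c, np.nan) for c in cols] for d in list_of_dicts])

# And back again: one dict per row
back = [dict(zip(cols, row)) for row in arr]
print(arr)
print(back)
```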
            qid & accept id: (34069642, 34069931) query: Pythons 'with'-statement: correctly nest/derive classes with __enter__/__exit__ soup:

            Ideally, use contextlib.contextmanager. For the case of deriving:

            \n
            import contextlib\n\nclass context_mixin:\n    def __enter__(self):\n         self.__context = self.context()\n         return self.__context.__enter__()\n    def __exit__(self, *args):\n         return self.__context.__exit__(*args)\n\nclass class_a(context_mixin):\n    @contextlib.contextmanager\n    def context(self):\n         print('class_a enter')\n         try:\n             yield self\n         finally:\n             print('class_a exit')\n\nclass class_b(class_a):\n    @contextlib.contextmanager\n    def context(self):\n        with super().context():\n            print('class_b enter')\n            try:\n                yield self\n            finally:\n                print('class_b exit')\n
            \n

            In Python 2, super() needs to be super(class_b, self).

            \n

            There is a change in behaviour compared with your code: this code exits b before exiting a, meaning that the scopes nest. You've written your code to do them in the other order, although that's easy enough to change. Often it makes no difference, but when it does matter you usually want things to nest. So for an (admittedly-contrived) example, if class_a represents an open file, and class_b represents some file format, then the exit path for class_a will close the file, while the exit path for class_b will write any buffered changes that have yet to be committed. Clearly b should happen first!

            \n

            For the case of holding another object:

            \n
            class class_b(context_mixin):\n    def __init__(self):\n        self.a = class_a()\n    @contextlib.contextmanager\n    def context(self):\n        with self.a:\n            print('class_b enter')\n            try:\n                yield self\n            finally:\n                print('class_b exit')\n
            \n soup wrap:

            Ideally, use contextlib.contextmanager. For the case of deriving:

            import contextlib
            
            class context_mixin:
                def __enter__(self):
                     self.__context = self.context()
                     return self.__context.__enter__()
                def __exit__(self, *args):
                     return self.__context.__exit__(*args)
            
            class class_a(context_mixin):
                @contextlib.contextmanager
                def context(self):
                     print('class_a enter')
                     try:
                         yield self
                     finally:
                         print('class_a exit')
            
            class class_b(class_a):
                @contextlib.contextmanager
                def context(self):
                    with super().context():
                        print('class_b enter')
                        try:
                            yield self
                        finally:
                            print('class_b exit')
            

            In Python 2, super() needs to be super(class_b, self).

            There is a change in behaviour compared with your code: this code exits b before exiting a, meaning that the scopes nest. You've written your code to do them in the other order, although that's easy enough to change. Often it makes no difference, but when it does matter you usually want things to nest. So for an (admittedly-contrived) example, if class_a represents an open file, and class_b represents some file format, then the exit path for class_a will close the file, while the exit path for class_b will write any buffered changes that have yet to be committed. Clearly b should happen first!

            For the case of holding another object:

            class class_b(context_mixin):
                def __init__(self):
                    self.a = class_a()
                @contextlib.contextmanager
                def context(self):
                    with self.a:
                        print('class_b enter')
                        try:
                            yield self
                        finally:
                            print('class_b exit')
            
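            To make the nesting order concrete, the classes above can be exercised like this (reproduced in full so the snippet runs on its own):

```python
import contextlib

class context_mixin:
    def __enter__(self):
        self.__context = self.context()
        return self.__context.__enter__()
    def __exit__(self, *args):
        return self.__context.__exit__(*args)

class class_a(context_mixin):
    @contextlib.contextmanager
    def context(self):
        print('class_a enter')
        try:
            yield self
        finally:
            print('class_a exit')

class class_b(class_a):
    @contextlib.contextmanager
    def context(self):
        with super().context():
            print('class_b enter')
            try:
                yield self
            finally:
                print('class_b exit')

with class_b():
    print('body')
# class_a enter
# class_b enter
# body
# class_b exit
# class_a exit
```

            Note that b's exit fires before a's, i.e. the scopes nest as described.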
            qid & accept id: (34073053, 34073483) query: Python: separate list of values into x number of sections, and give each value in x a variable soup:

            Use some index trickery:

            \n
            >>> nums = [1, 2, 3, 4, 5, 6, 7, 8]\n>>> colors = ['red', 'green', 'orange', 'blue']\n>>> chunks = 4\n>>> for i,num in enumerate(nums):\n    print("%s:%s"%(num,colors[i*chunks//len(nums)%len(colors)]))\n1:red\n2:red\n3:green\n4:green\n5:orange\n6:orange\n7:blue\n8:blue\n
            \n

            The major part of this is colors[i*chunks//len(nums)%len(colors)], which can be broken down like this:

            \n
            colors[i*chunks//len(nums)%len(colors)]\n       ^                              index of num in nums\n        ^      ^                      multiply by chunks then later dividing by len is the\n                                      same as dividing by len/chunks\n               ^                      explicit integer divide is important for indexing\n                          ^           ensures that there is no index error if \n                                      chunks>len(colors) (check example)\n
            \n

            high value of chunks example:

            \n
            >>> chunks = 7\n>>> for i,num in enumerate(nums):\n    print("%s:%s"%(num,colors[i*chunks//len(nums)%len(colors)]))\n\n\n1:red\n2:red\n3:green\n4:orange\n5:blue\n6:red\n7:green\n8:orange\n
            \n soup wrap:

            Use some index trickery:

            >>> nums = [1, 2, 3, 4, 5, 6, 7, 8]
            >>> colors = ['red', 'green', 'orange', 'blue']
            >>> chunks = 4
            >>> for i,num in enumerate(nums):
                print("%s:%s"%(num,colors[i*chunks//len(nums)%len(colors)]))
            1:red
            2:red
            3:green
            4:green
            5:orange
            6:orange
            7:blue
            8:blue
            

            The major part of this is colors[i*chunks//len(nums)%len(colors)], which can be broken down like this:

            colors[i*chunks//len(nums)%len(colors)]
                   ^                              index of num in nums
                    ^      ^                      multiply by chunks then later dividing by len is the
                                                  same as dividing by len/chunks
                           ^                      explicit integer divide is important for indexing
                                      ^           ensures that there is no index error if 
                                                  chunks>len(colors) (check example)
            

            high value of chunks example:

            >>> chunks = 7
            >>> for i,num in enumerate(nums):
                print("%s:%s"%(num,colors[i*chunks//len(nums)%len(colors)]))
            
            
            1:red
            2:red
            3:green
            4:orange
            5:blue
            6:red
            7:green
            8:orange
            
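            The same arithmetic can be wrapped in a small helper (the function name is my own) so the mapping is reusable and easy to test:

```python
nums = [1, 2, 3, 4, 5, 6, 7, 8]
colors = ['red', 'green', 'orange', 'blue']

def color_for(i, n_items, colors, chunks):
    # i*chunks//n_items buckets index i into `chunks` roughly equal groups;
    # % len(colors) wraps around when chunks > len(colors)
    return colors[i * chunks // n_items % len(colors)]

print([color_for(i, len(nums), colors, 4) for i in range(len(nums))])
# ['red', 'red', 'green', 'green', 'orange', 'orange', 'blue', 'blue']
```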
            qid & accept id: (34079643, 34079899) query: Changing an input from an integer to a string back to an integer soup:

            Let's say you have the current time in string format:

            \n
            timeString = "10:59:16"\n
            \n

            You can use the split method to split this string at every instance of a colon (":"). This returns a list with 3 elements.

            \n
            timeList = timeString.split(":")\nprint(timeList) -> ["10","59","16"]\n
            \n

            You can store these elements and do whatever calculations you choose with them.

            \n
            hours = int(timeList[0]) -> 10\nminutes = int(timeList[1]) -> 59\nseconds = int(timeList[2]) -> 16\n
            \n

            Once you have finished your calculations, or adjusted the variables, you can combine them back into a string by concatenating.

            \n
            timeString = str(hours) + ":" + str(minutes) + ":" + str(seconds)\nprint(timeString) -> "10:59:16"\n
            \n

            I hope this helps. Good luck Cameron!

            \n soup wrap:

            Let's say you have the current time in string format:

            timeString = "10:59:16"
            

            You can use the split method to split this string at every instance of a colon (":"). This returns a list with 3 elements.

            timeList = timeString.split(":")
            print(timeList) -> ["10","59","16"]
            

            You can store these elements and do whatever calculations you choose with them.

            hours = int(timeList[0]) -> 10
            minutes = int(timeList[1]) -> 59
            seconds = int(timeList[2]) -> 16
            

            Once you have finished your calculations, or adjusted the variables, you can combine them back into a string by concatenating.

            timeString = str(hours) + ":" + str(minutes) + ":" + str(seconds)
            print(timeString) -> "10:59:16"
            
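            One caveat worth noting: plain str() concatenation drops leading zeros, so 9 minutes would come back as "10:9:16". A zero-padded format string keeps the HH:MM:SS shape:

```python
hours, minutes, seconds = 10, 9, 16

# Plain concatenation loses the leading zero on single-digit fields:
print(str(hours) + ":" + str(minutes) + ":" + str(seconds))  # 10:9:16

# Zero-padded formatting restores the usual HH:MM:SS shape:
print("%02d:%02d:%02d" % (hours, minutes, seconds))          # 10:09:16
```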

            I hope this helps. Good luck Cameron!

            qid & accept id: (34082065, 34083317) query: Convert a pandas dataframe in a transactional data format to a list - Python soup:

            The solution which you mentioned in a comment, from an answer to a related question:

            \n
            df.groupby(['id'])['purchased_item'].apply(list).values.tolist()\n\nIn [434]: df.groupby(['id'])['purchased_item'].apply(list).values.tolist()\nOut[434]:\n[['apple', 'banana', 'carrot'],\n ['banana'],\n ['apple'],\n ['apple', 'carrot', 'diet_coke'],\n ['banana', 'carrot'],\n ['banana', 'carrot']]\n
            \n

            EDIT

            \n

            Some performance tests to compare with @Colonel Beauvel's solution:

            \n
            In [472]: %timeit [gr['purchased_item'].tolist() for n, gr in df.groupby('id')]\n100 loops, best of 3: 2.1 ms per loop\n\nIn [473]: %timeit df.groupby(['id'])['purchased_item'].apply(list).values.tolist()\n1000 loops, best of 3: 1.36 ms per loop\n
            \n soup wrap:

            The solution which you mentioned in a comment, from an answer to a related question:

            df.groupby(['id'])['purchased_item'].apply(list).values.tolist()
            
            In [434]: df.groupby(['id'])['purchased_item'].apply(list).values.tolist()
            Out[434]:
            [['apple', 'banana', 'carrot'],
             ['banana'],
             ['apple'],
             ['apple', 'carrot', 'diet_coke'],
             ['banana', 'carrot'],
             ['banana', 'carrot']]
            

            EDIT

            Some performance tests to compare with @Colonel Beauvel's solution:

            In [472]: %timeit [gr['purchased_item'].tolist() for n, gr in df.groupby('id')]
            100 loops, best of 3: 2.1 ms per loop
            
            In [473]: %timeit df.groupby(['id'])['purchased_item'].apply(list).values.tolist()
            1000 loops, best of 3: 1.36 ms per loop
            
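            The input frame is not shown above; a minimal reconstruction (the id values are my own guess, only the grouping structure matters) reproduces the Out[434] list:

```python
import pandas as pd

# Hypothetical transactional data: one row per (id, purchased_item)
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 3, 4, 4, 4, 5, 5, 6, 6],
    'purchased_item': ['apple', 'banana', 'carrot',
                       'banana',
                       'apple',
                       'apple', 'carrot', 'diet_coke',
                       'banana', 'carrot',
                       'banana', 'carrot'],
})

# groupby sorts by id; each group's items keep their original order
print(df.groupby(['id'])['purchased_item'].apply(list).values.tolist())
```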
            qid & accept id: (34114554, 34115203) query: Faster alternative to for loop in for loop soup:

            Currently you are checking each key against every other key for a total of O(n^2) comparisons. The insight is that we only need to check against a very small fraction of the other keys.

            \n

            Suppose the alphabet from which the characters of each key are drawn has k distinct values. For example, if your keys are simple ASCII strings consisting of a-z and 0-9, then k = 26 + 10 = 36.

            \n

            Given any key, we can generate all possible keys which are one character away: there are roughly 127 * k such strings (for 127-character keys). Whereas before you were comparing each key to ~150,000 other keys, now we only need to compare against 127 * k, which is 4572 for the case where k = 36. This reduces overall time complexity from O(n^2) to O(n * k), where k is a constant independent of n. This is where the real speedup is when you scale up n.

            \n

            Here's some code to generate all possible neighbors of a key:

            \n
            def generate_neighbors(key, alphabet):\n    for i in range(len(key)):\n        left, right = key[:i], key[i+1:]\n        for char in alphabet:\n            if char != key[i]:\n                yield left + char + right\n
            \n

            So, for example:

            \n
            >>> set(generate_neighbors('ab', {'a', 'b', 'c', 'd'}))\n{'aa', 'ac', 'ad', 'bb', 'cb', 'db'}\n
            \n

            Now we compute the neighborhoods of each key:

            \n
            def compute_neighborhoods(data, alphabet):\n    keyset = set(data.keys())\n    for key in data:\n        possible_neighbors = set(generate_neighbors(key, alphabet))\n        neighbors = possible_neighbors & keyset\n\n        identifier = data[key][0]\n\n        for neighbor in neighbors:\n            data[neighbor][1].append(identifier)\n
            \n

            Now an example. Suppose

            \n
            data = {\n '0a': [4, []],\n '1f': [9, []],\n '27': [3, []],\n '32': [8, []],\n '3f': [6, []],\n '47': [1, []],\n '7c': [2, []],\n 'a1': [0, []],\n 'c8': [7, []],\n 'e2': [5, []]\n}\n
            \n

            Then:

            \n
            >>> alphabet = set('abcdef01234567890')\n>>> compute_neighborhoods(data, alphabet)\n>>> data\n{'0a': [4, []],\n '1f': [9, [6]],\n '27': [3, [1]],\n '32': [8, [5, 6]],\n '3f': [6, [8, 9]],\n '47': [1, [3]],\n '7c': [2, []],\n 'a1': [0, []],\n 'c8': [7, []],\n 'e2': [5, [8]]}\n
            \n

            There are a few more optimizations that I haven't implemented here. First, you say that the keys mostly differ on their later characters, and that they differ at 11 positions, at most. This means that we can be smarter about computing the intersection possible_neighbors & keyset and in generating the neighborhood. First, we amend generate_neighbors to modify the trailing characters of the key first. Then, instead of generating the whole set of neighbors at once, we generate them one at a time and check for inclusion in the data dictionary. We keep track of how many we find, and if we find 11 we break.

            \n

            The reason I have not implemented this in my answer is that I'm not certain it would result in a significant speedup, and it might in fact be slower, since it means replacing an optimized Python builtin (set intersection) with a pure-Python loop.

            \n soup wrap:

            Currently you are checking each key against every other key for a total of O(n^2) comparisons. The insight is that we only need to check against a very small fraction of the other keys.

            Suppose the alphabet from which the characters of each key are drawn has k distinct values. For example, if your keys are simple ASCII strings consisting of a-z and 0-9, then k = 26 + 10 = 36.

            Given any key, we can generate all possible keys which are one character away: there are roughly 127 * k such strings (for 127-character keys). Whereas before you were comparing each key to ~150,000 other keys, now we only need to compare against 127 * k, which is 4572 for the case where k = 36. This reduces overall time complexity from O(n^2) to O(n * k), where k is a constant independent of n. This is where the real speedup is when you scale up n.

            Here's some code to generate all possible neighbors of a key:

            def generate_neighbors(key, alphabet):
                for i in range(len(key)):
                    left, right = key[:i], key[i+1:]
                    for char in alphabet:
                        if char != key[i]:
                            yield left + char + right
            

            So, for example:

            >>> set(generate_neighbors('ab', {'a', 'b', 'c', 'd'}))
            {'aa', 'ac', 'ad', 'bb', 'cb', 'db'}
            

            Now we compute the neighborhoods of each key:

            def compute_neighborhoods(data, alphabet):
                keyset = set(data.keys())
                for key in data:
                    possible_neighbors = set(generate_neighbors(key, alphabet))
                    neighbors = possible_neighbors & keyset
            
                    identifier = data[key][0]
            
                    for neighbor in neighbors:
                        data[neighbor][1].append(identifier)
            

            Now an example. Suppose

            data = {
             '0a': [4, []],
             '1f': [9, []],
             '27': [3, []],
             '32': [8, []],
             '3f': [6, []],
             '47': [1, []],
             '7c': [2, []],
             'a1': [0, []],
             'c8': [7, []],
             'e2': [5, []]
            }
            

            Then:

            >>> alphabet = set('abcdef01234567890')
            >>> compute_neighborhoods(data, alphabet)
            >>> data
            {'0a': [4, []],
             '1f': [9, [6]],
             '27': [3, [1]],
             '32': [8, [5, 6]],
             '3f': [6, [8, 9]],
             '47': [1, [3]],
             '7c': [2, []],
             'a1': [0, []],
             'c8': [7, []],
             'e2': [5, [8]]}
            

            There are a few more optimizations that I haven't implemented here. First, you say that the keys mostly differ on their later characters, and that they differ at 11 positions, at most. This means that we can be smarter about computing the intersection possible_neighbors & keyset and in generating the neighborhood. First, we amend generate_neighbors to modify the trailing characters of the key first. Then, instead of generating the whole set of neighbors at once, we generate them one at a time and check for inclusion in the data dictionary. We keep track of how many we find, and if we find 11 we break.

            The reason I have not implemented this in my answer is that I'm not certain it would result in a significant speedup, and it might in fact be slower, since it means replacing an optimized Python builtin (set intersection) with a pure-Python loop.
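            The lazy variant sketched in words above might look like this (the function names are my own): it tries trailing positions first and stops once the maximum possible number of matches has been found.

```python
def generate_neighbors_reversed(key, alphabet):
    # Trailing positions first, since the keys mostly differ there
    for i in reversed(range(len(key))):
        left, right = key[:i], key[i+1:]
        for char in alphabet:
            if char != key[i]:
                yield left + char + right

def neighbors_lazy(key, keyset, alphabet, max_neighbors=11):
    # Check candidates one at a time against the key set,
    # breaking as soon as max_neighbors matches have been found
    found = []
    for candidate in generate_neighbors_reversed(key, alphabet):
        if candidate in keyset:
            found.append(candidate)
            if len(found) == max_neighbors:
                break
    return found

# Same toy data as the example above
keyset = {'0a', '1f', '27', '32', '3f', '47', '7c', 'a1', 'c8', 'e2'}
alphabet = set('abcdef01234567890')
print(sorted(neighbors_lazy('3f', keyset, alphabet)))
# ['1f', '32']
```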

            qid & accept id: (34117408, 34379303) query: Django User Model one-to-one with other model and Forms soup:

            When I started to learn Django, I used function-based views (FBV), and in my current project I decided to learn class-based views (CBV). I watched one of the DjangoCon videos by Andrew Pinkham to make this easier on me. If you have tried, or are planning, to learn CBV, you may be confused by the class-based views versus the generic class-based views inside Django; there are so many of them, so please watch the video to get your head around it. OK, now I've done my homework and it's time to use CBV. Believe me, it's easy, and you will find that the number of lines of code inside your views decreases, especially if you use GCBV.

            User Model and GCBV

            \n

            What is the relation between GCBV and User Model?

            \n

            Great question. While I was working with one of my models, for example here the Teacher model, a teacher needs user credentials in order to use the app. The easiest and most straightforward way is to make a one-to-one relation with django.contrib.auth.models.User; please read this quote from the Django documentation:

            \n
            \n

            There are two ways to extend the default User model without substituting your own model. If the changes you need are purely behavioral, and don’t require any change to what is stored in the database, you can create a proxy model based on User. This allows for any of the features offered by proxy models including default ordering, custom managers, or custom model methods. If you wish to store information related to User, you can use a one-to-one relationship to a model containing the fields for additional information. This one-to-one model is often called a profile model, as it might store non-auth related information about a site user.

            \n
            \n

            So I decided to go with the one-to-one relationship approach.

            \n

            The Teacher Model

            \n
            FEMALE = 'F'\nMALE = 'M' \nclass Teacher(models.Model): \n    GENDER_CHOICES = ( \n        (MALE, _('Male')), \n        (FEMALE, _('Female')), \n        )  \n    gender = models.CharField(max_length=1, verbose_name=_('Gender'), choices=GENDER_CHOICES) \n    civil_id = models.CharField(max_length=12, verbose_name=_('Civil ID')) \n    phone_number = models.CharField(max_length=15, verbose_name=_('Phone Number')) \n    job_title = models.CharField(max_length=15, verbose_name=_('Title')) \n    user = models.OneToOneField(to=User, related_name='teacher_profile') \n\n    def enable(self): \n    """ \n    Enable teacher profile \n    :return: \n    """ \n    self.user.is_active = True \n    self.user.save() \n\n    def disable(self): \n    """ \n    Disable teacher profile \n    :return: \n    """ \n    self.user.is_active = False \n    self.user.save() \n\n    def get_absolute_url(self): \n       return reverse('teacher_details', args=(self.pk,))\n
            \n

            Issues

            \n

            The issue is that I want to display all fields from the Teacher and User forms on one template page and handle the creation, so there are two issues:

            \n
              \n
            1. Display Teacher and User models fields’ on one template page using CreateView class.
            2. \n
            3. Handle the under the hood creation process.\nI posted a question over Stack Overflow regarding these issues.
            4. \n
            \n

            Solutions

            \n

            Thinking about a solution for the first issue

            \n

            After I studied the CreateView generic view class, I found that it can take only one form, via the _form_class_ attribute. I knew Django renders context variables in the template, and CreateView passes _form_class_ to the template to render it, so I thought about adding a second form to the class and adding it to the context before passing it to the template; thus I overrode the _get_context_data()_ method and added the second form to the context.

            \n
            def get_context_data(self, **kwargs): \n    #Get the context \n    context = super(TeacherCreation, self).get_context_data(**kwargs) \n    #Adding the second form \n    context['user_form'] = self.second_form_class \n    return context \n
            \n

            Now I'm passing one form for the Teacher model and a second form for the User model, and the template displays both forms.

            \n
            {% csrf_token %} \n
            \n
            \n

            Teacher Information

            \n
            \n
            \n {{ user_form }} \n {{ form }} \n \n
            \n
            \n
            \n
            \n

            The first issue is solved.

            \n

            Thinking about a solution for the second issue

            \n

            Now I can display two forms on a template using the CreateView class, but what about posting or saving the data/forms? To do this I override the _form_valid_ method and do the work there.

            \n
            def form_valid(self, form): \n    user_form = UserCreationForm(self.request.POST) \n    if user_form.is_valid(): \n        user = user_form.save() \n        teacher = form.save(commit=False) \n        teacher.user_id = user.id \n        teacher.save() \n        return HttpResponseRedirect(self.get_success_url())\n
            \n

            The second issue is solved, but what about the update? It's easy and almost the same as CreateView, so let's see how we can do it.

            \n
            def get_context_data(self, **kwargs): \n    context = super(TeacherUpdate, self).get_context_data(**kwargs) \n    context['user_form'] = self.second_form_class(self.request.POST or None, instance=self.object.user) \n    return context \n\ndef form_valid(self, form): \n    user_form = UserChangeForm(self.request.POST, instance=self.object.user) \n    if user_form.is_valid(): \n        user_form.save() \n        return super(TeacherUpdate, self).form_valid(form)\n
            \n

            Full Example

            \n
            ########################\n# models.py\n########################\nFEMALE = 'F'\nMALE = 'M'\n\nclass Teacher(models.Model):\n    """\n    Halaqat teachers information\n    """\n    GENDER_CHOICES = (\n        (MALE, _('Male')),\n        (FEMALE, _('Female')),\n    )\n    gender = models.CharField(max_length=1, verbose_name=_('Gender'),\n                              choices=GENDER_CHOICES)\n    civil_id = models.CharField(max_length=12, verbose_name=_('Civil ID'))\n    phone_number = models.CharField(max_length=15,\n                                    verbose_name=_('Phone Number'))\n    job_title = models.CharField(max_length=15, verbose_name=_('Title'))\n    user = models.OneToOneField(to=User, related_name='teacher_profile')\n\n    def enable(self):\n        """\n        Enable teacher profile\n        :return:\n        """\n        self.user.is_active = True\n        self.user.save()\n\n    def disable(self):\n        """\n        Disable teacher profile\n        :return:\n        """\n        self.user.is_active = False\n        self.user.save()\n\n    def get_absolute_url(self):\n        return reverse('teacher_details', args=(self.pk,))\n\n########################\n# views.py\n########################\nclass TeacherCreation(SuccessMessageMixin, CreateView):\n    """\n    Creates new teacher\n    """\n    template_name = 'back_office/teacher_form.html'\n    form_class = TeacherForm\n    model = Teacher\n    second_form_class = UserCreationForm\n    success_message = 'Teacher profile saved successfully'\n\n    def get_context_data(self, **kwargs):\n        context = super(TeacherCreation, self).get_context_data(**kwargs)\n\n        context['user_form'] = self.second_form_class\n\n        return context\n\n    def form_valid(self, form):\n        user_form = UserCreationForm(self.request.POST)\n        if user_form.is_valid():\n            user = user_form.save()\n            teacher = form.save(commit=False)\n            teacher.user_id = user.id\n   
         teacher.save()\n        return HttpResponseRedirect(self.get_success_url())\n\nclass TeacherUpdate(SuccessMessageMixin, UpdateView):\n    """\n    Update teacher profile\n    """\n    model = Teacher\n    template_name = 'back_office/teacher_form.html'\n    form_class = TeacherForm\n    second_form_class = UserChangeForm\n    success_message = 'Teacher profile saved successfully'\n\n    def get_context_data(self, **kwargs):\n        context = super(TeacherUpdate, self).get_context_data(**kwargs)\n\n        context['user_form'] = self.second_form_class(self.request.POST or None, instance=self.object.user)\n\n        return context\n\n    def form_valid(self, form):\n        user_form = UserChangeForm(self.request.POST, instance=self.object.user)\n        if user_form.is_valid():\n            user_form.save()\n        return super(TeacherUpdate, self).form_valid(form)\n\n########################\n# teacher_form.html\n########################\n{% extends 'back_office/back_office_base.html' %}\n{% load crispy_forms_tags %}\n{% block title %}\n    New Teacher Form\n{% endblock %}\n{% block container %}\n    
            {% csrf_token %}\n
            \n
            \n

            Teacher Information

            \n
            \n
            \n {{ user_form|crispy }}\n {{ form|crispy }}\n \n
            \n
            \n
            \n{% endblock %}\n
            \n

            I posted the solution on my blog.

            \n soup wrap:

            When I started to learn Django, I used function-based views (FBV), and in my current project I decided to learn class-based views (CBV). I watched one of the DjangoCon videos by Andrew Pinkham to make this easier on me. If you have tried, or are planning, to learn CBV, you may be confused by the class-based views versus the generic class-based views inside Django; there are so many of them, so please watch the video to get your head around it. OK, now I've done my homework and it's time to use CBV. Believe me, it's easy, and you will find that the number of lines of code inside your views decreases, especially if you use GCBV.

            User Model and GCBV

            What is the relation between GCBV and User Model?

            Great question. While I was working with one of my models, for example here the Teacher model, a teacher needs user credentials in order to use the app. The easiest and most straightforward way is to make a one-to-one relation with django.contrib.auth.models.User; please read this quote from the Django documentation:

            There are two ways to extend the default User model without substituting your own model. If the changes you need are purely behavioral, and don’t require any change to what is stored in the database, you can create a proxy model based on User. This allows for any of the features offered by proxy models including default ordering, custom managers, or custom model methods. If you wish to store information related to User, you can use a one-to-one relationship to a model containing the fields for additional information. This one-to-one model is often called a profile model, as it might store non-auth related information about a site user.

            So I decided to go with the one-to-one relationship approach.

            The Teacher Model

            FEMALE = 'F'
            MALE = 'M'

            class Teacher(models.Model):
                GENDER_CHOICES = (
                    (MALE, _('Male')),
                    (FEMALE, _('Female')),
                )
                gender = models.CharField(max_length=1, verbose_name=_('Gender'), choices=GENDER_CHOICES)
                civil_id = models.CharField(max_length=12, verbose_name=_('Civil ID'))
                phone_number = models.CharField(max_length=15, verbose_name=_('Phone Number'))
                job_title = models.CharField(max_length=15, verbose_name=_('Title'))
                user = models.OneToOneField(to=User, related_name='teacher_profile')

                def enable(self):
                    """
                    Enable teacher profile
                    :return:
                    """
                    self.user.is_active = True
                    self.user.save()

                def disable(self):
                    """
                    Disable teacher profile
                    :return:
                    """
                    self.user.is_active = False
                    self.user.save()

                def get_absolute_url(self):
                    return reverse('teacher_details', args=(self.pk,))


            Issues

            The issue is that I want to display all fields from the Teacher and User forms on one template page and handle the creation, so there are two issues:

            1. Display the Teacher and User models' fields on one template page using the CreateView class.
            2. Handle the under-the-hood creation process. I posted a question on Stack Overflow regarding these issues.

            Solutions

            Thinking about a solution for the first issue

            After I studied the CreateView generic view class, I found it can take only one form, via the _form_class_ attribute. I knew Django renders context variables in the template, and CreateView passes the _form_class_ to the template to render it, so I thought about adding a second form to the class and adding it to the context before passing it to the template. Thus I override the _get_context_data()_ method and add the second form to the context.

            def get_context_data(self, **kwargs): 
                # Get the context 
                context = super(TeacherCreation, self).get_context_data(**kwargs) 
                # Add the second form (instantiated, so the template can render it) 
                context['user_form'] = self.second_form_class() 
                return context 
            

            Now I’m passing one form for the Teacher model, the second form is for the User model, and the template displays both forms.

            {% csrf_token %}

            Teacher Information

            {{ user_form }} {{ form }}

            The first issue is solved.

            Thinking about a solution for the second issue

            Now I can display two forms in the template using the CreateView class, but what about posting and saving the data? To do this I override the _form_valid_ method and do the work there.

            def form_valid(self, form): 
                user_form = UserCreationForm(self.request.POST) 
                if user_form.is_valid(): 
                    user = user_form.save() 
                    teacher = form.save(commit=False) 
                    teacher.user_id = user.id 
                    teacher.save() 
                    return HttpResponseRedirect(self.get_success_url())
                # Re-render with errors if the user form is invalid
                return self.form_invalid(form)
            

            The second issue is solved, but what about the update? It's easy and almost the same as CreateView, so let's see how we can do it.

            def get_context_data(self, **kwargs): 
                context = super(TeacherUpdate, self).get_context_data(**kwargs) 
                context['user_form'] = self.second_form_class(self.request.POST or None, instance=self.object.user) 
                return context 
            
            def form_valid(self, form): 
                user_form = UserChangeForm(self.request.POST, instance=self.object.user) 
                if user_form.is_valid(): 
                    user_form.save() 
                    return super(TeacherUpdate, self).form_valid(form)
                return self.form_invalid(form)
            

            Full Example

            ########################
            # models.py
            ########################
            FEMALE = 'F'
            MALE = 'M'
            
            class Teacher(models.Model):
                """
                Halaqat teachers information
                """
                GENDER_CHOICES = (
                    (MALE, _('Male')),
                    (FEMALE, _('Female')),
                )
                gender = models.CharField(max_length=1, verbose_name=_('Gender'),
                                          choices=GENDER_CHOICES)
                civil_id = models.CharField(max_length=12, verbose_name=_('Civil ID'))
                phone_number = models.CharField(max_length=15,
                                                verbose_name=_('Phone Number'))
                job_title = models.CharField(max_length=15, verbose_name=_('Title'))
                user = models.OneToOneField(to=User, related_name='teacher_profile')
            
                def enable(self):
                    """
                    Enable teacher profile
                    :return:
                    """
                    self.user.is_active = True
                    self.user.save()
            
                def disable(self):
                    """
                    Disable teacher profile
                    :return:
                    """
                    self.user.is_active = False
                    self.user.save()
            
                def get_absolute_url(self):
                    return reverse('teacher_details', args=(self.pk,))
            
            ########################
            # views.py
            ########################
            class TeacherCreation(SuccessMessageMixin, CreateView):
                """
                Creates new teacher
                """
                template_name = 'back_office/teacher_form.html'
                form_class = TeacherForm
                model = Teacher
                second_form_class = UserCreationForm
                success_message = 'Teacher profile saved successfully'
            
                def get_context_data(self, **kwargs):
                    context = super(TeacherCreation, self).get_context_data(**kwargs)
            
                    context['user_form'] = self.second_form_class()
            
                    return context
            
                def form_valid(self, form):
                    user_form = UserCreationForm(self.request.POST)
                    if user_form.is_valid():
                        user = user_form.save()
                        teacher = form.save(commit=False)
                        teacher.user_id = user.id
                        teacher.save()
                        return HttpResponseRedirect(self.get_success_url())
                    return self.form_invalid(form)
            
            class TeacherUpdate(SuccessMessageMixin, UpdateView):
                """
                Update teacher profile
                """
                model = Teacher
                template_name = 'back_office/teacher_form.html'
                form_class = TeacherForm
                second_form_class = UserChangeForm
                success_message = 'Teacher profile saved successfully'
            
                def get_context_data(self, **kwargs):
                    context = super(TeacherUpdate, self).get_context_data(**kwargs)
            
                    context['user_form'] = self.second_form_class(self.request.POST or None, instance=self.object.user)
            
                    return context
            
                def form_valid(self, form):
                    user_form = UserChangeForm(self.request.POST, instance=self.object.user)
                    if user_form.is_valid():
                        user_form.save()
                        return super(TeacherUpdate, self).form_valid(form)
                    return self.form_invalid(form)
            
            ########################
            # teacher_form.html
            ########################
            {% extends 'back_office/back_office_base.html' %}
            {% load crispy_forms_tags %}
            {% block title %}
                New Teacher Form
            {% endblock %}
            {% block container %}
                
            {% csrf_token %}

            Teacher Information

            {{ user_form|crispy }} {{ form|crispy }}
            {% endblock %}

            I posted the solution on my blog.

            qid & accept id: (34124733, 34124837) query: pandas: detect the first/last record number from a time stamp of weekday only soup:

            soup wrap:

            Using the compare-cumsum-groupby pattern:

            df['first'] = (df
                           .groupby((df.weekday != df.weekday.shift()).cumsum())
                           .records
                           .transform('first'))
            
            df['last'] = (df
                          .groupby((df.weekday != df.weekday.shift()).cumsum())
                          .records
                          .transform('last'))    
            >>> df
                records    weekday  first  last
            0         1     Monday      1     3
            1         2     Monday      1     3
            2         3     Monday      1     3
            3         4    Tuesday      4     6
            4         6    Tuesday      4     6
            5         7  Wednesday      7     7
            6         8   Thursday      8    14
            7        12   Thursday      8    14
            8        14   Thursday      8    14
            9        15     Friday     15    19
            10       16     Friday     15    19
            11       19     Friday     15    19
            12       23   Saturday     23    23
            13       26     Sunday     26    26
            14       29     Monday     29    38
            15       38     Monday     29    38
            16       43    Tuesday     43    43
            17       59  Wednesday     59    61
            18       61  Wednesday     59    61
            

            The trick is to get a unique index for each run of weekdays (not just 1-7, but incrementing by one each time the weekday changes).

            df['week_counter'] = (df.weekday != df.weekday.shift()).cumsum()
            >>> df
                records    weekday  week_counter
            0         1     Monday             1
            1         2     Monday             1
            2         3     Monday             1
            3         4    Tuesday             2
            4         6    Tuesday             2
            5         7  Wednesday             3
            6         8   Thursday             4
            7        12   Thursday             4
            8        14   Thursday             4
            ...
            16       43    Tuesday             9
            17       59  Wednesday            10
            18       61  Wednesday            10
            

            These week_counter values are then used in groupby to create groups of records, and transform is used (to maintain the same shape as the original dataframe), taking both the first and last records of each group.
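            As a self-contained illustration of the pattern (assuming pandas is installed), here is the same compare-cumsum-groupby idiom run on a small toy frame:

```python
import pandas as pd

df = pd.DataFrame({
    "records": [1, 2, 3, 4, 6, 7],
    "weekday": ["Monday", "Monday", "Monday", "Tuesday", "Tuesday", "Wednesday"],
})

# A new group id starts whenever the weekday differs from the previous row
group_id = (df.weekday != df.weekday.shift()).cumsum()

# transform keeps the original shape: each row gets its group's first/last record
df["first"] = df.groupby(group_id).records.transform("first")
df["last"] = df.groupby(group_id).records.transform("last")

print(df)
```

            Here the three Monday rows all get first=1 and last=3, the Tuesday rows get 4 and 6, and the lone Wednesday row gets 7 for both.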

            qid & accept id: (34137398, 34137946) query: Manipulating rows of csv file in python soup:

            soup wrap:

            It's not quite clear what you're going for, but maybe this will work. You're currently re-using the variable name 'row', and the indentation is all wacky. Also, you shouldn't be including the first row (the headers) in your loops.

            import csv
            import math
            
            f = open('citydata.csv')
            
            csv_f = csv.reader(f)
            content = [row for row in csv_f]
            
            for row in content[1:]:
                x1 = float(row[2])
                x2 = float(row[3])
                for rowOther in content[1:]:
                    y1 = float(rowOther[2])
                    y2 = float(rowOther[3])
            
                    answer = (x1-(math.pow(x2,2))) - (y1-(math.pow(y2,2)))
            
                    print(answer)
            

            EDIT:

            I just realized that I think you had your x's and y's swapped in a couple places. Try this instead:

            import csv
            import math
            
            f = open('citydata.csv')
            
            csv_f = csv.reader(f)
            content = [row for row in csv_f]
            
            for row in content[1:]:
                x1 = float(row[2])
                y1 = float(row[3])
                for rowOther in content[1:]:
                    x2 = float(rowOther[2])
                    y2 = float(rowOther[3])
            
                    answer = (x1-(math.pow(x2,2))) - (y1-(math.pow(y2,2)))
            
                    print(answer)
            
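            As a sketch with made-up sample data (the real citydata.csv is assumed to hold its numeric columns at indexes 2 and 3), the header can also be skipped with next() on the reader instead of slicing, and itertools.product expresses the nested loop:

```python
import csv
import io
import itertools

# Hypothetical stand-in for citydata.csv: columns 2 and 3 hold the numbers
data = io.StringIO(
    "city,state,x,y\n"
    "A,AA,1.0,2.0\n"
    "B,BB,3.0,4.0\n"
)

reader = csv.reader(data)
next(reader)                      # skip the header row instead of slicing [1:]
content = list(reader)

# Same pairwise computation as above, written with itertools.product
results = []
for row, row_other in itertools.product(content, repeat=2):
    x1, y1 = float(row[2]), float(row[3])
    x2, y2 = float(row_other[2]), float(row_other[3])
    results.append((x1 - x2 ** 2) - (y1 - y2 ** 2))

print(results)                    # [2.0, 6.0, 2.0, 6.0]
```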
            qid & accept id: (34137572, 34138994) query: python, how to run commands on remote hosts and show output in GUI in real time? soup:

            soup wrap:

            This is a broad question, but I'll give you a few clues.

            A nice example is LogIo. Once you are willing to run some commands and then push the output to a GUI, using Node.js becomes a natural approach. This app may contain a few elements:

            • part one, which runs commands, harvests output, and pushes it to
            • part two, which receives output and saves it to a DB/files; after saving, this part throws an event to
            • part three, which should be a websocket server that handles the users who are online and distributes events to
            • part four, which would be a properly scripted GUI that can connect via websocket to part three, log the user in, receive events, and broadcast them to other GUI elements.

            Since I assume you feel stronger with PHP than Python, the easiest approach for you would be to create part two as a PHP service to handle input (save harvested output to the DB) and then, let's say, send a UDP packet to part three's UDP listening socket.

            Part one would be a Python script that just gets the command output and passes it along to part two. It should be as easy to handle as the usual grep case:

            tail -f /var/log/apache2/access.log | /usr/share/bin/myharvester 
            

            At some point while developing it, you will need to pass a user or unique task id as a parameter after myharvester.
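            For illustration, part one might look like the following Python sketch; the (host, port) target and the task-id prefix on each datagram are assumptions, not part of any real protocol:

```python
import socket

def forward_lines(lines, target, task_id="task-0"):
    """Send each harvested output line as one UDP datagram to `target`.

    `target` is the (host, port) of the part-three UDP listener; prefixing
    each line with a task id is an assumed wire format.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for line in lines:
            payload = "%s %s" % (task_id, line.rstrip("\n"))
            sock.sendto(payload.encode("utf-8"), target)
    finally:
        sock.close()
```

            Piped after `tail -f`, such a script would call something like `forward_lines(sys.stdin, ("127.0.0.1", 9999), task_id=sys.argv[1])`.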

            Trickier, but easier than you think, will be to create a Node.js script as part three. As a single-instance script it should be able to receive input and pass it to users as events. I've committed something like this before:

            var config = {};
            var app = require('http').createServer().listen(config.server.port);
            
            var io = require('socket.io').listen(app);
            
            var listenerDgram = require('dgram').createSocket('udp4');
            listenerDgram.bind(config.listeners.udp.port);
            
            var sprintf = require('sprintf').sprintf;
            
            var users = [];
            
            app.on('error', function(er) {
                console.log(sprintf('[%s] [ERROR] HTTP Server at port %s has thrown %s', Date(), config.server.port, er.toString()));
                process.exit();
            });
            
            listenerDgram.on('error', function(er) {
                console.log(sprintf('[%s] [ERROR] UDP Listener at port %s has thrown %s', Date(), config.listeners.udp.port, er.toString()));
                process.exit();
            });
            
            listenerDgram.on('message', function(msg, rinfo) {
                // handling, let's say, a JSONized msg from the part-two script,
                // building a var frame and finally
                if(user) {
                    // emit to single user based on what happened
                    // inside this method
                    users[user].emit('notification', frame);
                } else {
                    // emit to all users
                    io.emit('notification', frame);
                }
            
            });
            
            io.sockets.on('connection', function(socket) {
                // handling user connection here and pushing users' sockets to
                // the users array.
            });
            

            This scrap is a basic example, not filled with the logic you need. The script should be able to open a UDP listener on a given port and listen for users connecting to it over websockets. Honestly, once you become good at Node.js, you may want to merge parts two and three into it, which will take the UDP part off your hands, as the harvester will push output directly to the script that maintains the websocket. But that has the drawback of duplicating some logic from another back-end, such as a CRM.

            The last (fourth) part would be to implement a web interface with JavaScript inside it that connects the currently logged-in user to the socket server.

            I've used a similar approach before, and it works in real time, so we can show our call-center employees information about an incoming call before the phone even starts to ring. The final solution (not counting the CRM interface) comes down to two scripts: a dedicated CRM API part (where all the logic happens) to handle events from Asterisk, and a Node.js event forwarder.

            qid & accept id: (34141855, 34142071) query: How do I take an integer from a list intending to use it? soup:

            soup wrap:

            If you have a list of numbers:

            >>> one = [1, 2, 3, 4]
            

            you can access them via index, starting with 0:

            >>> one[0]
            1
            >>> one[2]
            3
            

            the last one:

            >>> one[-1]
            4
            

            You can calculate with them:

            >>> one[1] + one[1]
            4
            

            or store under a different name:

            >>> a = one[1]
            >>> a
            2
            >>> a + a
            4
            
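            The element you pull out is an ordinary int, so it can drive any integer context, for example as an index into another list or as a loop bound:

```python
one = [1, 2, 3, 4]

letters = ["a", "b", "c", "d", "e"]
print(letters[one[2]])      # one[2] is 3, so this prints: d

for i in range(one[1]):     # one[1] is 2, so the loop body runs twice
    print("hello")
```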
            qid & accept id: (34146679, 34148596) query: Python/ Pandas CSV Parsing soup:

            soup wrap:

            This is useless text that is required to keep an answer from being downvoted by the moderators. Here is the data I used:

            "Date","Information","Type"
            "2015-12-07","First: Jim, Last: Jones, School: MCAA; First: Jane, Last: Jones,  School: MCAA;","Old"
            "2015-12-06","First: Tom, Last: Smith, School: MCAA; First: Tammy, Last: Smith, School: MCAA;","New"
            

            import pandas as pd
            import numpy as np
            import csv
            import re
            import itertools as it
            import pprint
            import datetime as dt
            
            records = [] #Construct a complete record for each person
            
            colon_pairs = r"""
                (\w+)   #Match a 'word' character, one or more times, captured in group 1, followed by..
                :       #A colon, followed by...
                \s*     #Whitespace, 0 or more times, followed by...
                (\w+)   #A 'word' character, one or more times, captured in group 2.
            """
            
            colon_pairs_per_person = 3
            
            with open("csv1.csv", encoding='utf-8') as f:
                next(f) #skip header line
                record = {}
            
                for date, info, the_type in csv.reader(f):
                    info_parser = re.finditer(colon_pairs, info, flags=re.X)
            
                    for i, match_obj in enumerate(info_parser):
                        key, val = match_obj.groups()
                        record[key] = val
            
                        if (i+1) % colon_pairs_per_person == 0: #then done with info for a person
                            record['Date'] = dt.datetime.strptime(date, '%Y-%m-%d') #So that you can sort the DataFrame rows by date.
                            record['Type'] = the_type
            
                            records.append(record)
                            record = {}
            
            pprint.pprint(records)
            df = pd.DataFrame(
                    sorted(records, key=lambda record: record['Date'])
            )
            print(df)
            df.set_index('Date', inplace=True)
            print(df)
            
            --output:--
            [{'Date': datetime.datetime(2015, 12, 7, 0, 0),
              'First': 'Jim',
              'Last': 'Jones',
              'School': 'MCAA',
              'Type': 'Old'},
             {'Date': datetime.datetime(2015, 12, 7, 0, 0),
              'First': 'Jane',
              'Last': 'Jones',
              'School': 'MCAA',
              'Type': 'Old'},
             {'Date': datetime.datetime(2015, 12, 6, 0, 0),
              'First': 'Tom',
              'Last': 'Smith',
              'School': 'MCAA',
              'Type': 'New'},
             {'Date': datetime.datetime(2015, 12, 6, 0, 0),
              'First': 'Tammy',
              'Last': 'Smith',
              'School': 'MCAA',
              'Type': 'New'}]
            
                    Date  First   Last School Type
            0 2015-12-06    Tom  Smith   MCAA  New
            1 2015-12-06  Tammy  Smith   MCAA  New
            2 2015-12-07    Jim  Jones   MCAA  Old
            3 2015-12-07   Jane  Jones   MCAA  Old
            
                        First   Last School Type
            Date                                
            2015-12-06    Tom  Smith   MCAA  New
            2015-12-06  Tammy  Smith   MCAA  New
            2015-12-07    Jim  Jones   MCAA  Old
            2015-12-07   Jane  Jones   MCAA  Old
            
            qid & accept id: (34146928, 34147339) query: how can I have commas instead of space in a given set of number soup:
            soup wrap:
            >>> s = "39401.99865    7292.4753   8541.03675  6098.54185  106352.218  7300.4485   5699.983    5538.44755  5934.8514   7477.62475  5956.7409   9170.98 9481.5082   6063.4508   9380.92255" 
            >>> [float(item) for item in s.split()]
            [39401.99865, 7292.4753, 8541.03675, 6098.54185, 106352.218, 7300.4485, 5699.983, 5538.44755, 5934.8514, 7477.62475, 5956.7409, 9170.98, 9481.5082, 6063.4508, 9380.92255]
            

            Or you can use map (note that on Python 3, map returns an iterator, so wrap it in list() to materialize the result):

            >>> map(float, s.split())
            [39401.99865, 7292.4753, 8541.03675, 6098.54185, 106352.218, 7300.4485, 5699.983, 5538.44755, 5934.8514, 7477.62475, 5956.7409, 9170.98, 9481.5082, 6063.4508, 9380.92255]
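            As an aside (my addition, not part of the original answer): if the goal is literally to replace the runs of spaces with commas rather than to parse floats, str.split plus str.join is enough:

```python
s = "39401.99865    7292.4753   8541.03675"

# split() collapses any run of whitespace; join() rebuilds with commas
csv_line = ", ".join(s.split())
print(csv_line)  # → 39401.99865, 7292.4753, 8541.03675
```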
            
            qid & accept id: (34155132, 34155715) query: Relating/adding data to a django object list soup:

            soup wrap:

            You could also, for a small number of records, append extra fields to the model query results; these will then be accessible from the template:

            class Team(models.Model): 
                team_name = models.CharField(max_length=200)
            
            
            def get_teams(request):
                teams = Team.objects.all()
                for team in teams:
                    team.team_win_percent = calculate_team_win(team)
                    team.team_lose_percent = calculate_team_loss(team)
                ....
            

            In template

            {% for team in teams %}
                team win percentage = {{ team.team_win_percent }}
                team lose percentage = {{ team.team_lose_percent }}
            
            {% endfor %}
            
            qid & accept id: (34168264, 34168345) query: Python:how to get keys with same values? soup:

            soup wrap:

            If you want this to work for any arbitrary key(s), you can use a defaultdict of OrderedDicts:

            from collections import defaultdict, OrderedDict
            result_dict = defaultdict(OrderedDict)
            data = [('Han Decane','12333'),('Can Decane','12333'),('AlRight','10110')]
            for v, k in data:
                result_dict[k][v] = True
            
            
            >>> list(result_dict['12333'].keys())
            ['Han Decane', 'Can Decane']
            

            And if you want all the results that had multiple values

            >>> [k for k in result_dict if len(result_dict[k]) > 1 ]
            ['12333']
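            A variation (mine, not the answer's): on Python 3.7+, plain dicts already preserve insertion order, so a defaultdict(list) gives the same grouping with less machinery:

```python
from collections import defaultdict

data = [('Han Decane', '12333'), ('Can Decane', '12333'), ('AlRight', '10110')]

groups = defaultdict(list)
for name, value in data:
    groups[value].append(name)  # names sharing a value collect in one list

print(groups['12333'])  # → ['Han Decane', 'Can Decane']
print([k for k, names in groups.items() if len(names) > 1])  # → ['12333']
```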
            
            qid & accept id: (34193415, 34193687) query: How to make array of array of dictionaries in python soup:

            soup wrap:

            Let us keep it simple. All you need is one dictionary and two helper methods as below

            This assumes the similarity between player1 & player2 is the same regardless of the players' order, both when storing and when retrieving.

            similarities = {}
            
            def set_sim(players, sim):
                similarities[tuple(sorted(players))] = sim
            
            def get_sim(players):
                return similarities.get(tuple(sorted(players)))
            

            Here is how to use them

            >>> set_sim(['Player3', 'Player1'], 2)
            >>> set_sim(['Player1', 'Player2'], 3)
            >>> set_sim(['Player2', 'Player3'], 3)
            >>> get_sim(['Player3','Player2'])
            3
            >>> similarities
            {('Player1', 'Player2'): 3, ('Player2', 'Player3'): 3, ('Player1', 'Player3'): 2}
            

            If you need to find the other players paired with a given player, the helper method is easy again:

            def get_other_players(player):
                for pair in similarities.keys():
                    try:
                        other_player = pair[(pair.index(player)+1)%2]
                        print other_player, "=", similarities[pair]
                    except ValueError:
                        pass
            

            logs:

            >>> set_sim(['Player9','Player4'], .02)
            >>> set_sim(['Player3','Player4'], .8)
            >>> set_sim(['Player12','Player4'], 1.5)
            
            >>> get_other_players('Player4')
            Player9 = 0.02
            Player3 = 0.8
            Player12 = 1.5
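            An alternative sketch (my variation, not from the answer): keying the dict on a frozenset of the pair makes the order-insensitivity explicit without sorting, at the cost of collapsing self-pairs like ('Player1', 'Player1'):

```python
similarities = {}

def set_sim(players, sim):
    # frozenset ignores order, so ('A', 'B') and ('B', 'A') share one key
    similarities[frozenset(players)] = sim

def get_sim(players):
    return similarities.get(frozenset(players))

set_sim(['Player3', 'Player1'], 2)
set_sim(['Player1', 'Player2'], 3)
print(get_sim(['Player2', 'Player1']))  # → 3
```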
            
            qid & accept id: (34206921, 34207010) query: Converting date using to_datetime soup:

            soup wrap:

            You need to convert to str if necessary, then zfill the month col and pass this with a valid format to to_datetime:

            In [303]:
            df['date'] = pd.to_datetime(df['year'].astype(str) + df['month'].astype(str).str.zfill(2), format='%Y%m')
            df
            
            Out[303]:
               year  month       pl       date
            0  2010      1  27.4376 2010-01-01
            1  2010      2  29.2314 2010-02-01
            2  2010      3  33.5714 2010-03-01
            3  2010      4  37.2986 2010-04-01
            4  2010      5  36.6971 2010-05-01
            5  2010      6  35.9329 2010-06-01
            

            If the conversion is unnecessary then the following should work:

            df['date'] = pd.to_datetime(df['year'] + df['month'].str.zfill(2), format='%Y%m')
            

            Your attempt failed as it treated the value as epoch time:

            In [305]:
            pd.to_datetime(20101, format='%Y-%m')
            
            Out[305]:
            Timestamp('1970-01-01 00:00:00.000020101')
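            As a side note (my addition, not part of the original answer): recent pandas can also assemble dates directly from year/month/day columns, which sidesteps the string concatenation and zfill entirely:

```python
import pandas as pd

df = pd.DataFrame({'year': [2010, 2010, 2010], 'month': [1, 2, 3]})

# to_datetime accepts a frame whose columns are named year/month/day
df['date'] = pd.to_datetime(df.assign(day=1)[['year', 'month', 'day']])
print(df['date'].dt.strftime('%Y-%m-%d').tolist())  # → ['2010-01-01', '2010-02-01', '2010-03-01']
```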
            
            qid & accept id: (34243214, 34243523) query: Pivot Pandas Dataframe with a Mix of Numeric and Text Fields soup:

            soup wrap:

            The trick is to assign a race number (e.g. either 1 or 2) to each row depending on whether it should be associated with Race#1 or Race#2:

            df['race'] = df.groupby('Athlete').cumcount()+1
            #      Athlete Distance Race  Rank    Time  race
            # 0    M.Smith     400m    A     1   48.57     1
            # 1    A.Moyet     400m    A     2   49.00     1
            # 2  C.Marconi     800m    B     5  104.12     1
            # 3    M.Smith     800m    B     3  102.66     2
            

            Then the desired DataFrame can be expressed as the result of a set_index/unstack operation:

            result = df.set_index(['Athlete', 'race']).unstack('race')
            #           Distance       Race      Rank        Time        
            # race             1     2    1    2    1   2       1       2
            # Athlete                                                    
            # A.Moyet       400m   NaN    A  NaN    2 NaN   49.00     NaN
            # C.Marconi     800m   NaN    B  NaN    5 NaN  104.12     NaN
            # M.Smith       400m  800m    A    B    1   3   48.57  102.66
            

            set_index moves the Athlete and race columns into the index. The unstack operation moves the race index level into a column level.

            That, along with a little touching up to get the columns in the desired format:

            import pandas as pd
            df = pd.DataFrame({'Athlete': ['M.Smith', 'A.Moyet', 'C.Marconi', 'M.Smith'],
                               'Distance': ['400m', '400m', '800m', '800m'],
                               'Race': ['A', 'A', 'B', 'B'],
                               'Rank': [1, 2, 5, 3],
                               'Time': [48.57, 49.0, 104.12, 102.66]})
            
            df['race'] = df.groupby('Athlete').cumcount()+1
            result = df.set_index(['Athlete', 'race']).unstack('race')
            result = result.sort_index(level='race', axis=1)  # sortlevel was removed in modern pandas
            result.columns = ['{}#{}'.format(col, n) for col, n in result.columns]
            print(result)
            

            yields

                      Distance#1 Race#1  Rank#1  Time#1 Distance#2 Race#2  Rank#2  Time#2
            Athlete                                                                      
            A.Moyet         400m      A       2   49.00        NaN    NaN     NaN     NaN
            C.Marconi       800m      B       5  104.12        NaN    NaN     NaN     NaN
            M.Smith         400m      A       1   48.57       800m      B       3  102.66
            
            qid & accept id: (34244726, 34369745) query: Computing 16-bit checksum of ICMPv6 header soup:

            soup wrap:

            I'll try to address your questions with an example.

            Let's take this sample capture from the Wireshark wiki so we work on the same packet: open it in Wireshark and select the first ICMPv6 packet (frame 3).

            Note at least one important thing for this packet: the payload length for the IPv6 layer is 32 (0x20).

            Note: to extract a packet as a hex string in Wireshark, select the packet and the desired layer (e.g. IPv6), then: right click > Copy > Bytes > Hex Stream

            Building the pseudo header

            To calculate the checksum, the 1st thing to do is to build the pseudo header according to RFC 2460 section 8.1.

            The checksum is calculated on the pseudo-header and the ICMPv6 packet.

            The IPv6 version of ICMP [ICMPv6] includes the above pseudo-header in its checksum computation

            To build the pseudo header we need:

            • Source IP
            • Dest IP
            • Upper-Layer Packet Length
            • Next Header

            Source and Dest IPs are from the IPv6 layer.

            Next Header field is fixed to 58:

            The Next Header field in the pseudo-header for ICMP contains the value 58, which identifies the IPv6 version of ICMP.

            Upper-Layer Packet Length:

            The Upper-Layer Packet Length in the pseudo-header is the length of the upper-layer header and data (e.g., TCP header plus TCP data). Some upper-layer protocols carry their own length information (e.g., the Length field in the UDP header); for such protocols, that is the length used in the pseudo- header. Other protocols (such as TCP) do not carry their own length information, in which case the length used in the pseudo-header is the Payload Length from the IPv6 header, minus the length of any extension headers present between the IPv6 header and the upper-layer header.

            In our case, the upper layer (ICMPv6) doesn't carry a length field, so in this case, we have to use the payload length field from the IPv6 layer, which is 32 (0x20) for this packet.

            Let's try some code:

            def build_pseudo_header(src_ip, dest_ip, payload_len):
                source_ip_bytes = bytearray.fromhex(src_ip)
                dest_ip_bytes = bytearray.fromhex(dest_ip)
                next_header = struct.pack(">I", 58)
                upper_layer_len = struct.pack(">I", payload_len)
                return source_ip_bytes + dest_ip_bytes + upper_layer_len + next_header
            

            The code should be called like this:

            SOURCE_IP = "fe80000000000000020086fffe0580da"
            DEST_IP = "fe80000000000000026097fffe0769ea"
            pseudo_header = build_pseudo_header(SOURCE_IP, DEST_IP, 32)
            

            Building the ICMPV6 packet

            As mentioned in RFC 4443 section 2.3, the checksum field must be set to 0 prior to any calculation.

            For computing the checksum, the checksum field is first set to zero.

            In this case I use the type and code fields from ICMPv6 as a single 16-bit value. The checksum field is removed and the remainder of the packet is simply called "remainder":

            TYPE_CODE = "8700"
            REMAINDER = "00000000fe80000000000000026097fffe0769ea01010000860580da"
            

            Building the ICMPv6 part of the packet for checksum calculation:

            def build_icmpv6_chunk(type_and_code, other):
                type_code_bytes = bytearray.fromhex(type_and_code)
                checksum = struct.pack(">H", 0)  # the ICMPv6 checksum field is 16 bits; zero it for the calculation
                other_bytes = bytearray.fromhex(other)
                return type_code_bytes + checksum + other_bytes
            

            Called as follows:

            TYPE_CODE = "8700"
            REMAINDER = "00000000fe80000000000000026097fffe0769ea01010000860580da"
            icmpv6_chunk = build_icmpv6_chunk(TYPE_CODE, REMAINDER)
            

            Calculating the checksum

            Calculating the checksum is done according to RFC 1071. The main difficulty in Python is folding the sum back into a 16-bit quantity.

            The input to the calc_checksum() function is the concatenation of the pseudo header and the ICMPv6 part of the packet (with the checksum set to 0):

            Python example:

            def calc_checksum(packet):
                total = 0
            
                # Add up 16-bit words
                num_words = len(packet) // 2
                for chunk in struct.unpack("!%sH" % num_words, packet[0:num_words*2]):
                    total += chunk
            
                # Add any left over byte
                if len(packet) % 2:
                    total += packet[-1] << 8  # indexing bytes gives an int in Python 3, no ord() needed
            
                # Fold 32-bits into 16-bits
                total = (total >> 16) + (total & 0xffff)
                total += total >> 16
                return (~total + 0x10000 & 0xffff)
            

            Code example

            The code is quite ugly but returns the correct checksum. In our example, this code returns 0x68db, which matches what Wireshark reports.

            #!/usr/local/bin/python3
            # -*- coding: utf8 -*-
            
            import struct
            
            SOURCE_IP = "fe80000000000000020086fffe0580da"
            DEST_IP = "fe80000000000000026097fffe0769ea"
            TYPE_CODE = "8700"
            REMAINDER = "00000000fe80000000000000026097fffe0769ea01010000860580da"
            
            
            def calc_checksum(packet):
                total = 0
            
                # Add up 16-bit words
                num_words = len(packet) // 2
                for chunk in struct.unpack("!%sH" % num_words, packet[0:num_words*2]):
                    total += chunk
            
                # Add any left over byte
                if len(packet) % 2:
                    total += packet[-1] << 8  # indexing bytes gives an int in Python 3, no ord() needed
            
                # Fold 32-bits into 16-bits
                total = (total >> 16) + (total & 0xffff)
                total += total >> 16
                return (~total + 0x10000 & 0xffff)
            
            
            def build_pseudo_header(src_ip, dest_ip, payload_len):
                source_ip_bytes = bytearray.fromhex(src_ip)
                dest_ip_bytes = bytearray.fromhex(dest_ip)
                next_header = struct.pack(">I", 58)
                upper_layer_len = struct.pack(">I", payload_len)
                return source_ip_bytes + dest_ip_bytes + upper_layer_len + next_header
            
            
            def build_icmpv6_chunk(type_and_code, other):
                type_code_bytes = bytearray.fromhex(type_and_code)
                checksum = struct.pack(">H", 0)  # 16-bit checksum field, zeroed for the calculation
                other_bytes = bytearray.fromhex(other)
                return type_code_bytes + checksum + other_bytes
            
            
            def main():
                icmpv6_chunk = build_icmpv6_chunk(TYPE_CODE, REMAINDER)
                pseudo_header = build_pseudo_header(SOURCE_IP, DEST_IP, 32)
                icmpv6_packet = pseudo_header + icmpv6_chunk
                checksum = calc_checksum(icmpv6_packet)
            
                print("checksum: {:#x}".format(checksum))
            
            if __name__ == '__main__':
                main()
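            A useful property for sanity-checking an implementation like the one above (my addition, not the original answer's): once the computed checksum is written back into the packet, the one's-complement sum of all 16-bit words becomes 0xffff, so re-running the calculation over the patched packet yields 0. A self-contained sketch with made-up packet bytes:

```python
import struct

def calc_checksum(packet):
    # one's-complement sum over big-endian 16-bit words (RFC 1071)
    total = 0
    num_words = len(packet) // 2
    for word in struct.unpack("!%dH" % num_words, packet[:num_words * 2]):
        total += word
    if len(packet) % 2:
        total += packet[-1] << 8  # pad a trailing odd byte with zeros
    total = (total >> 16) + (total & 0xffff)  # fold carries back in
    total += total >> 16
    return ~total & 0xffff

packet = bytearray(b"\x87\x00\x00\x00" + b"\x01\x02\x03\x04" * 7)  # checksum field at bytes 2..3
checksum = calc_checksum(bytes(packet))
packet[2:4] = struct.pack("!H", checksum)  # write the checksum back
print(hex(checksum), calc_checksum(bytes(packet)))  # → 0x5cd5 0
```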
            
            qid & accept id: (34251684, 34251838) query: Python full-screen graphics soup:

            soup wrap:

            The standard way to make a GUI and graphics in Python is with the tkinter package; this tutorial should get you started. As for full screen graphics, set the fullscreen attribute on your Tk instance:

            root = Tk()
            root.attributes("-fullscreen", True)
            

            Check out this question for alternate answers.

            If you want to stick with graphics.py, I would give the window the same width and height as your screen resolution; on Windows:

            from win32api import GetSystemMetrics
            
            width = GetSystemMetrics(0)
            height = GetSystemMetrics(1)
            
            win = GraphWin('Face', width, height)
            

            Based on this, I'm not so sure about the Linux way.

            Also check out PyGTK for another way to make a GUI.

            qid & accept id: (34260522, 34260663) query: Selection of rows by condition soup:

            soup wrap:

            Once you have a boolean array, you can select only the rows where it is True with df[boolean_array], or only the rows where it is False by adding ~: df[~boolean_array].

            As for your question, you can either use the drop method or do it yourself:

            df_total_data[df_total_data.apply(lambda x: 'secure' not in x['BBBlink'], axis=1).values]
            

            Just remember that this is not inplace so you need to either assign the returned value to a new dataframe or re-assign it to the existing one.

            By the way, you can simplify your condition a bit:

            df_total_data[df_total_data['BBBlink'].apply(lambda x: 'secure' not in x)]
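A more idiomatic pandas version of this condition uses the vectorized str.contains instead of apply; the frame below is a toy stand-in (the 'BBBlink' column name comes from the question, the URLs and values are invented):

```python
import pandas as pd

# Invented sample standing in for df_total_data.
df_total_data = pd.DataFrame({
    'BBBlink': ['http://a.com', 'https://secure.b.com', 'http://c.com'],
    'val': [1, 2, 3],
})

# Keep only the rows whose link does NOT contain 'secure'.
mask = df_total_data['BBBlink'].str.contains('secure')
filtered = df_total_data[~mask]
```

As with apply, this returns a new frame, so re-assign the result if you want to keep it.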
            
            qid & accept id: (34271014, 34271789) query: Using pandas to plot data soup:

            soup wrap:

            Another method would include pivot. Starting from your dataframe df, I would set the index to Date:

            df = df.set_index('Date')
            

            and then pivot the table according to your values:

            d = pd.pivot_table(df,index=df.index, columns='Name', values='Activity').fillna(0)
            

            This returns this structure:

            Name        A    B  C
            Date                 
            2015-01-02  1  1.5  0
            2015-01-03  2  1.0  0
            2015-01-04  2  5.0  0
            2015-01-31  0  0.0  1
            

            Then, depending on your needs, you can simply plot it with:

            d.plot()
            

            There are some duplicate values in the example, but the plot now comes out as expected. Hope that helps.

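The pivot step can be reproduced end-to-end on a small frame (the dates, names, and activity values below are invented for illustration):

```python
import pandas as pd

# Invented sample shaped like the question's data.
df = pd.DataFrame({
    'Date': ['2015-01-02', '2015-01-02', '2015-01-03'],
    'Name': ['A', 'B', 'A'],
    'Activity': [1.0, 1.5, 2.0],
})
df = df.set_index('Date')

# One column per Name, Activity as values, missing combinations filled with 0.
d = pd.pivot_table(df, index=df.index, columns='Name', values='Activity').fillna(0)
# d.plot() would then draw one line per Name.
```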

            qid & accept id: (34286165, 34286968) query: python click usage of standalone_mode soup:

            soup wrap:

            Just sending standalone_mode as a keyword argument worked for me:

            from __future__ import print_function
            import click
            
            @click.command()
            @click.option('--name', help='Enter Name')
            @click.pass_context
            def gatherarguments(ctx, name):
                return ctx
            
            def usectx(ctx):
                print("Name is %s" % ctx.params['name'])
            
            if __name__ == '__main__':
                ctx = gatherarguments(standalone_mode=False)
                print(ctx)
                usectx(ctx)
            

            Output:

            ./clickme.py --name something
            
            Name is something
            
            qid & accept id: (34300499, 34301077) query: Django permissions mixin on CBV, how to apply on 'publish blog' method soup:

            soup wrap:

            Two ways are possible:

            1. Create a CBV for publish and override the post method:

              class PublishView(UpdateView):
              
                  model = Blog
              
                  def post(self, request, *args, **kwargs):
                      pk = self.kwargs.get('pk', None)
                      Blog.objects.filter(pk=pk).update(publish_date=datetime.datetime.now())
                      return HttpResponseRedirect("/blogs/" + pk)
              
            2. Inside the Blog UpdateView, define publish as a static method:

                 class BlogUpdateView(UpdateView):

                        model = Blog

                        @staticmethod
                        def publish(request, pk):
                            if request.method == "GET":
                                Blog.objects.filter(pk=pk).update(publish_date=datetime.datetime.now())
                            return HttpResponseRedirect("/blogs/" + pk)
            

            In your urls you can then point directly at the publish method:

            url(r"^(?P<pk>[0-9]+)/publish/$", views.BlogUpdateView.publish, name="publish"),
            
            qid & accept id: (34301088, 34302107) query: Reading/Writing out a dictionary to csv file in python soup:

            soup wrap:

            I don't find enough reasons to use Pandas here for a relatively simple problem. Also note to the OP: if you want to store the value in a file and read it back, go for JSON instead of CSV. Exporting to CSV is helpful for interacting with other people (potentially Excel users).

            Here is how I would store it in CSV:

            value1 = 'one'
            value2 = 'two'
            d = { 
                    'key1': (value1, value2), 
                    'key2': (value1, value2), 
                    'key3': (value1, value2)
                }
            CSV ="\n".join([k+','+",".join(v) for k,v in d.items()]) 
            print CSV #You can store this string variable to file as you wish
            # with open("filename.csv", "w") as file:
                # file.write(CSV)
            

            This code explains what happens inside the list comprehension:

            CSV = ""
            for k,v in d.items():
                line = "{},{}\n".format(k, ",".join(v))
                CSV+=line
            print CSV 
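Here is a Python 3 sketch of the same idea, which also reads the rows back with the standard csv module to show the round trip (file I/O is replaced by io.StringIO for brevity):

```python
import csv
import io

value1, value2 = 'one', 'two'
d = {
    'key1': (value1, value2),
    'key2': (value1, value2),
}

# Same join-based construction as above, in Python 3.
csv_text = "\n".join(k + ',' + ",".join(v) for k, v in d.items())

# Reading it back: csv.reader splits each line on commas.
rows = list(csv.reader(io.StringIO(csv_text)))
```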
            
            qid & accept id: (34371807, 34371850) query: How to determine if a decimal fraction can be represented exactly as Python float? soup:

            soup wrap:

            You could use the fractions module to check if a given fraction can be represented:

            from fractions import Fraction
            
            def can_be_represented(num, den):
                f = Fraction(num, den)
                return Fraction.from_float(float(f)) == f
            

            Because floating point numbers use binary fractions, you'll soon find that this can be simplified to checking for a denominator that is a power of two:

            def can_be_represented(num, den):
                f = Fraction(num, den)
                return f.denominator & (f.denominator - 1) == 0
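A quick sanity check that the two versions agree (the function names below are mine; 3/8 has a power-of-two denominator after reduction, 1/10 does not):

```python
from fractions import Fraction

def exact_roundtrip(num, den):
    # First version: the fraction survives a trip through float unchanged.
    f = Fraction(num, den)
    return Fraction.from_float(float(f)) == f

def power_of_two_denominator(num, den):
    # Second version: the reduced denominator is a power of two.
    f = Fraction(num, den)
    return f.denominator & (f.denominator - 1) == 0
```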
            
            qid & accept id: (34374393, 34374551) query: Python find which order element is in in a list soup:

            soup wrap:

            I think you can do this quite easily with

            words = 'the horse and the rider'.split(' ')
            look_for = 'the'
            indices = [i for i, word in enumerate(words) if word == look_for]
            print(indices)
            

            This would print [0, 3]

            Edit:

            This solution works for multiple occurrences.

            After the OP has rephrased his question, I would suggest something like

            text = "My name is Alice and his name is Bob"
            words = text.split(' ')
            indices = [i+2 for i, word in enumerate(words) if word == 'name']
            names = [words[i] for i in indices if i < len(words)]
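The last snippet can be wrapped into a small helper (the function name and offset parameter are mine, not from the original answer):

```python
def words_after(text, key, offset=2):
    # Return the word `offset` positions after each occurrence of `key`,
    # skipping occurrences too close to the end of the sentence.
    words = text.split(' ')
    indices = [i + offset for i, word in enumerate(words) if word == key]
    return [words[i] for i in indices if i < len(words)]
```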
            
            qid & accept id: (34382144, 34389783) query: Changing number representation in IDLE soup:

            soup wrap:

            IDLE does not have a number representation. It sends the code you enter to a Python interpreter and displays the string sent back in response. In this sense, it is irrelevant that IDLE is written in Python. The same is true of any IDE or REPL for Python code.

            That said, the CPython sys module has a displayhook function. For 3.5:

            >>> help(sys.displayhook)
            Help on built-in function displayhook in module sys:
            
            displayhook(...)
                displayhook(object) -> None
            
                Print an object to sys.stdout and also save it in builtins._
            

            That actually should be __builtins__._, as in the example below. Note that the input is any Python object. For IDLE, the default sys.displayhook is a function defined in idlelib/rpc.py. Here is an example relevant to your question.

            >>> def new_hook(ob):
                if type(ob) is int:
                    ob = hex(ob)
                __builtins__._ = ob
                print(ob)
            
            
            >>> sys.displayhook = new_hook
            >>> 33
            0x21
            >>> 0x21
            0x21
            

            This gives you the more important half of what you asked for. Before actually using anything in IDLE, I would look at the default version to make sure I did not miss anything. One could write an extension to add menu entries that would switch displayhooks.

            Python intentionally does not have an input preprocessor function. GvR wants the contents of a .py file to always be python code as defined in some version of the reference manual.

            I have thought about the possibility of adding an inputhook to IDLE, but I would not allow one to be active when running a .py file from the editor. If there were one added for the Shell, I would change the prompt from '>>>' to something else, such as 'hex>' or 'bin>'.

            EDIT: One could also write an extension to rewrite input code when explicitly requested either with a menu selection or a hot key or key binding. Or one could edit the current idlelib/ScriptBinding.py to make rewriting automatic. The hook I have thought about would make this easier, but not expand what can be done now.

            qid & accept id: (34400455, 34400484) query: Updating a value in a dictionary inside a dictionary soup:

            soup wrap:

            You could just nest loops:

            for contact_dict in list_of_dicts:
                for phone_dict in contact_dict['phoneNumbers']:
                    phone_dict['phone'] = phone_dict['phone'].replace('-', '')
            

            This alters the values in-place.

            Or you could create a whole new copy of the structure, with the alterations made:

            [dict(contact, phoneNumbers=[
                dict(phone_dict, phone=phone_dict['phone'].replace('-', '')) 
                for phone_dict in contact['phoneNumbers']])
             for contact in list_of_dicts]
            

            This creates a semi-shallow copy; only the phoneNumbers key is explicitly copied, but any other mutable values are just referenced by the new dictionaries.

            Demo:

            >>> list_of_dicts = [{'name': 'Rob', 'phoneNumbers': [{'phone': '123-3214', 'type': 'home'}, {'phone': '456-3216', 'type': 'work'}]}]
            >>> [dict(contact, phoneNumbers=[
            ...     dict(phone_dict, phone=phone_dict['phone'].replace('-', ''))
            ...     for phone_dict in contact['phoneNumbers']])
            ...  for contact in list_of_dicts]
            [{'phoneNumbers': [{'phone': '1233214', 'type': 'home'}, {'phone': '4563216', 'type': 'work'}], 'name': 'Rob'}]
            >>> for contact_dict in list_of_dicts:
            ...     for phone_dict in contact_dict['phoneNumbers']:
            ...         phone_dict['phone'] = phone_dict['phone'].replace('-', '')
            ...
            >>> list_of_dicts
            [{'phoneNumbers': [{'phone': '1233214', 'type': 'home'}, {'phone': '4563216', 'type': 'work'}], 'name': 'Rob'}]
            
            qid & accept id: (34403494, 34403541) query: How to write variable and array on the same line for a text file? soup:

            soup wrap:

            Precede the for-loop with thefile.write("%s " % name).

            Your last lines would look like this:

            thefile.write("%s " % name)
            for item in thelist:
              thefile.write("%s,"% item)
            

            In addition, to remove the annoying comma at the end, you can consider

            outStr = ",".join(map(str, thelist))
            thefile.write("%s %s" %(name, outStr))
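Putting it together, here is a runnable sketch with io.StringIO standing in for the opened file (the name and list values are invented):

```python
import io

name = 'alpha'
thelist = [1, 2, 3]

thefile = io.StringIO()  # stands in for the opened text file

# join() avoids the trailing comma entirely.
out_str = ",".join(map(str, thelist))
thefile.write("%s %s" % (name, out_str))
```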
            
            qid & accept id: (34419926, 34420260) query: How to make QtGui window process events whenever it is brought forward on the screen? soup:

            soup wrap:

            You should look into QThread.

            Threads allow you to run long, complicated tasks in a worker thread while a background thread keeps the GUI responsive, such as updating a QProgressBar, ensuring it responds to motion events.

            The basic idea is this:

            # load modules
            import time
            
            from PySide import QtCore, QtGui
            
            
            # APPLICATION STUFF
            # -----------------
            
            APP = QtGui.QApplication([])
            
            
            # THREADS
            # -------
            
            
            class WorkerThread(QtCore.QThread):
                '''Does the work'''
            
                def __init__(self):
                    super(WorkerThread, self).__init__()
            
                    self.running = True
            
                def run(self):
                    '''This starts the thread on the start() call'''
            
                    # this goes over 1000 numbers, at 10 a second, will take
                    # 100 seconds to complete, over a minute
                    for i in range(1000):
                        print(i)
                        time.sleep(0.1)
            
                    self.running = False
            
            
            class BackgroundThread(QtCore.QThread):
                '''Keeps the main loop responsive'''
            
                def __init__(self, worker):
                    super(BackgroundThread, self).__init__()
            
                    self.worker = worker
            
                def run(self):
                    '''This starts the thread on the start() call'''
            
                    while self.worker.running:
                        APP.processEvents()
                        print("Updating the main loop")
                        time.sleep(0.1)
            
            
            # MAIN
            # ----
            
            
            def main():
                # make threads
                worker = WorkerThread()
                background = BackgroundThread(worker)
            
                # start the threads
                worker.start()
                background.start()
                # wait until done
                worker.wait()
            
            if __name__ == '__main__':
                main()
            

            The output you get is something like this, showing how it takes turns at doing the long calculation and updating the main loop:

            0
             Updating the main loop
            1
            Updating the main loop
            2
            Updating the main loop
            3
            Updating the main loop
            4
            Updating the main loop
            5
            Updating the main loop
            6
            Updating the main loop
            Updating the main loop7
            
            8
            Updating the main loop
            9
            

            This along with a QFocusEvent override should allow you to do whatever you wish. But it's better to separate updating the GUI and running your desired long thread.

            As for overriding the QFocusEvent you can do something as follows:

            def focusInEvent(self, event):
                event.accept()
            
                # insert your code here
            

            And if you choose to implement threads to avoid GUI blocking, you should read about the basics of threading (as threads have a lot of nuances unless you know about their potential pitfalls).

            qid & accept id: (34428730, 34429190) query: In python convert day of year to month and fortnight soup:

            soup wrap:

            You can use the replace method:

            In [11]: d
            Out[11]: datetime.datetime(2004, 3, 28, 0, 0)
            
            In [12]: d.replace(day=1 if d.day < 15 else 15)
            Out[12]: datetime.datetime(2004, 3, 15, 0, 0)
            
            In [13]: t = pd.Timestamp(d)
            
            In [14]: t.replace(day=1 if t.day < 15 else 15)
            Out[14]: Timestamp('2004-03-15 00:00:00')
            

            The reason this returns a new datetime rather than updating in place is that datetime objects are immutable (they can't be updated).

            Note: there's a format code for the day of the year:

            In [21]: datetime.datetime.strptime("2004+88", "%Y+%j")
            Out[21]: datetime.datetime(2004, 3, 28, 0, 0)
            
            In [22]: pd.to_datetime("2004+88", format="%Y+%j")
            Out[22]: Timestamp('2004-03-28 00:00:00')
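Combining the two steps, parsing a day-of-year string and snapping to the nearest fortnight boundary needs only the plain datetime module:

```python
import datetime

# Day 88 of the leap year 2004 is March 28.
d = datetime.datetime.strptime("2004+88", "%Y+%j")

# Snap to the 1st or the 15th, as in the replace() example above.
snapped = d.replace(day=1 if d.day < 15 else 15)
```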
            
            qid & accept id: (34439723, 34440702) query: setting unique abbreviation for every column in python soup:

            soup wrap:

            Your problem isn't completely specified, but it seems fun, so I took a stab at it. I wrote a function which takes a list of phrases and returns a dictionary with the abbreviations as keys. It starts by taking the first two letters of each word and joining them into a candidate abbreviation. If that abbreviation has already been used, it gradually brings in more and more letters from the beginning of each word until the abbreviation is unique. I then tested it on your sample data. You will almost certainly want to modify it, but it should give you some ideas:

            def makeAbbreviations(headers):
                abbreviations = {}
                for header in headers:
                    header = header.lower()
                    words = header.split()
                    n = max(len(w) for w in words)
                    i = 2
                    starts = [w[:i] for w in words]
                    abbrev = ''.join(starts)
            
                    while abbrev in abbreviations and i <= n:
                        i += 1
                        for j,w in enumerate(words):
                            starts[j] = w[:i]
                            abbrev = ''.join(starts)
                            if abbrev not in abbreviations: break
                    abbreviations[abbrev] = header
                return abbreviations
            
            myHeaders = ['Ad Group', 'Annuity Calculator', 'Tax Deferred Annuity',
                         'Annuity Tables', 'annuities calculator', 'annuity formula',
                         'Annuities Explained', 'Deferred Annuies Calculator',
                         'Current Annuity Rates', 'Forbes.com', 'Annuity Definition',
                         'fixed income', 'Immediate fixed Annuities',
                         'Deferred Variable Annuities', '401k Rollover',
                         'Deferred Annuity Rates', 'Deferred Annuities',
                         'Immediate Annuities Definition', 'Immediate Variable Annuities',
                         'Variable Annuity', 'Aig Annuities', 'Retirement Income', 'retirment system',
                         'Online Financial Planner', 'Certified Financial Planner']
            
            d = makeAbbreviations(myHeaders)
            for (k,v) in d.items(): print(k,v,sep = " = ")
            

            Output:

            imande = immediate annuities definition
            adgr = ad group
            fiin = fixed income
            40ro = 401k rollover
            resy = retirment system
            vaan = variable annuity
            devaan = deferred variable annuities
            rein = retirement income
            imvaan = immediate variable annuities
            fo = forbes.com
            imfian = immediate fixed annuities
            dean = deferred annuities
            anca = annuity calculator
            cuanra = current annuity rates
            annca = annuities calculator
            onfipl = online financial planner
            aian = aig annuities
            ande = annuity definition
            anfo = annuity formula
            cefipl = certified financial planner
            tadean = tax deferred annuity
            deanca = deferred annuies calculator
            anex = annuities explained
            anta = annuity tables
            deanra = deferred annuity rates
            
            qid & accept id: (34444319, 34444673) query: How to split a string by a string except when the string is in quotes in python? soup:

            soup wrap:

            You can use the following regex with re.findall:

            ((?:(?!\band\b)[^'])*(?:'[^'\\]*(?:\\.[^'\\]*)*'(?:(?!\band\b)[^'])*)*)(?:and|$)
            

            See the regex demo.

            The regular expression matches an unwrapped sequence of two alternatives: anything but a ' up to the first and (the tempered greedy token (?:(?!\band\b)[^'])*), and anything between and including single apostrophes, with support for escaped characters ('[^'\\]*(?:\\.[^'\\]*)*', an unwrapped version of ([^'\\]|\\.)*).

            Python code demo:

            import re
            p = re.compile(r'((?:(?!\band\b)[^\'])*(?:\'[^\'\\]*(?:\\.[^\'\\]*)*\'(?:(?!\band\b)[^\'])*)*)(?:and|$)')
            s = "section_category_name = 'computer and equipment expense' and date >= 2015-01-01 and date <= 2015-03-31"
            print([x for x in p.findall(s) if x])
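
            As an alternative sketch (my own, not the answer's pattern): if the quoted strings never contain escaped apostrophes, re.split with a lookahead that counts the remaining quotes also works — it only splits on an and that is followed by an even number of single quotes, i.e. an and that sits outside any quoted string:

```python
import re

s = "section_category_name = 'computer and equipment expense' and date >= 2015-01-01 and date <= 2015-03-31"

# Split on "and" only when an even number of single quotes follows,
# meaning we are not currently inside a quoted string.
parts = re.split(r"\s+and\s+(?=(?:[^']*'[^']*')*[^']*$)", s)
print(parts)
```

            This is simpler but strictly weaker than the findall pattern above: an escaped apostrophe inside a string would throw the quote count off.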
            
            qid & accept id: (34445707, 34446839) query: PLY YACC pythonic syntax for accumulating list of comma-separated values soup:

            soup wrap:

            There are two productions. Use two separate functions. (There is no extra cost :-) )

            def p_type_list_1(p):
                '''type_list : type'''
                p[0] = [p[1]]
            
            def p_type_list_2(p):
                '''type_list : type_list COMMA type'''
                p[0] = p[1] + [p[3]]
            

            Note: I fixed your grammar to use left-recursion. With bottom-up parsing, left-recursion is almost always what you want, because it avoids unnecessary parser stack usage, and more importantly because it often simplifies actions. In this case, I could have written the second function as:

            def p_type_list_2(p):
                '''type_list : type_list COMMA type'''
                p[0] = p[1]
                p[0] += [p[3]]
            

            which avoids a list copy.
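
            To see what these accumulating actions do, you can drive them by hand, with plain lists standing in for PLY's YaccProduction object (a rough simulation — the p[i] indexing is the same, but real PLY calls these during reductions):

```python
def p_type_list_1(p):
    '''type_list : type'''
    p[0] = [p[1]]

def p_type_list_2(p):
    '''type_list : type_list COMMA type'''
    p[0] = p[1]
    p[0] += [p[3]]

# Simulate the reductions for "int, float": first the base case...
p = [None, 'int']
p_type_list_1(p)
acc = p[0]                      # ['int']

# ...then one COMMA step, reusing the accumulated list in place.
p = [None, acc, ',', 'float']
p_type_list_2(p)
print(p[0])                     # ['int', 'float'] -- same object as acc, no copy
```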

            qid & accept id: (34452644, 34453795) query: Parse Specific Text File to CSV Format with Headers soup:

            soup wrap:

            You can try this.

            First of all, I imported the csv library so that it takes care of the commas and quoting for us.

            import csv
            

            Then I made a function that takes a single line from your log file and returns a dictionary with the fields given in the header. If the current line doesn't have a particular field from the header, that field stays an empty string.

            def convert_to_dict(line, header):
                d = {}
                for cell in header:
                    d[cell] = ''
            
                row = line.strip().split(';')    
                for cell in row:
                    if cell:
                        key, value = cell.split('=')
                        d[key] = value
            
                return d
            

            Since the header and the number of fields can vary between your files, I made a function to extract them. For this I used a set, a collection of unique elements, which is however unordered, so I converted it to a sorted list with the sorted function. Don't forget that seek(0) call to rewind the file!

            def extract_fields(logfile):
                fields = set()
                for line in logfile:
                    row = line.strip().split(';')
                    for cell in row:
                        if cell:
                            key, value = cell.split('=')
                            fields.add(key)
            
                logfile.seek(0)
                return sorted(list(fields))
            

            Lastly, I made the main piece of code, which opens both the log file for reading and the csv file for writing. It then extracts and writes the header, and writes each converted line.

            if __name__ == '__main__':
                with open('report.log', 'r') as logfile:
                    with open('report.csv', 'wb') as csvfile:
                        csvwriter = csv.writer(csvfile)
            
                        header = extract_fields(logfile)
                        csvwriter.writerow(header)
            
                        for line in logfile:
                            d = convert_to_dict(line, header)
                            csvwriter.writerow([d[cell] for cell in header])
            

            These are the files I used as an example:

            report.log

            Sequence=3433;Status=true;Report=223313;Profile=xxxx;
            Sequence=0323;Status=true;Header=The;Report=43838;Profile=xxxx;
            Sequence=5323;Status=true;Report=6541998;Profile=xxxx;
            

            report.csv

            Header,Profile,Report,Sequence,Status
            ,xxxx,223313,3433,true
            The,xxxx,43838,0323,true
            ,xxxx,6541998,5323,true
            

            I hope it helps! :D

            EDIT: I added support for different headers.
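
            Note that on Python 3, open('report.csv', 'wb') won't work with the csv module (it wants text mode with newline=''). A condensed sketch of the same idea using csv.DictWriter, which fills missing fields via restval — written against an in-memory StringIO so it is self-contained; for a real file you would use open('report.csv', 'w', newline=''):

```python
import csv
import io

def log_to_csv(log_lines):
    # Parse each "key=value;" line into a dict.
    rows = [dict(cell.split('=', 1) for cell in line.strip().split(';') if cell)
            for line in log_lines]
    # Union of all keys, sorted, as the header.
    fields = sorted({key for row in rows for key in row})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, restval='')
    writer.writeheader()
    writer.writerows(rows)   # missing fields become '' via restval
    return out.getvalue()

log = ["Sequence=3433;Status=true;Report=223313;Profile=xxxx;",
       "Sequence=0323;Status=true;Header=The;Report=43838;Profile=xxxx;"]
print(log_to_csv(log))
```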

            qid & accept id: (34456661, 34458424) query: Checkbox to determine if an action is completed or not soup:

            soup wrap:

            First, you should iterate directly over the list rather than using a counter and a while loop:

            for i, client in enumerate(dict_list):
                currentClient = Label(text='Client: ' + client['Client']).grid(row=i, column=1)
                ...
            

            Second, if you do x=Label(...).grid(...), x will always be None. Best practice is to use two different statements. In this case the point is moot since you never use currentClient, but you should get in the habit of always separating them. Group your widget creation together, and your layout together, and your GUI will be much easier to manage:

            for client in dict_list:
                clientLabel = Label(...)
                contactLabel = Label(...)
                emailLabel = Label(...)
            
                clientLabel.grid(...)
                contactLabel.grid(...)
                emailLabel.grid(...)
            

            Third -- and this is the answer to your question -- you can create an instance of IntVar for each checkbutton, and store them either in a separate data structure or right along with your data. For example, to store them by business name you might do it like this:

            cbVars = {}
            for client in dict_list:
                ...
                bizname = client["Business Name"]
                cbVars[bizname] = IntVar()
                cb = Checkbutton(..., onvalue=1, offvalue=0, variable=cbVars[bizname])
                ...
            
            qid & accept id: (34463966, 34464023) query: How do I obtain the reference of a getter/setter method created through @property in Python? soup:

            soup wrap:

            You'll have to use a lambda, because you need a bound property to get the right context:

            someWidget.valueChanged.connect(lambda v: setattr(player, 'health', v))
            

            Property objects do have .fget and .fset attributes, and the property object itself can be accessed on the class:

            Player.health.fset
            Player.health.fget
            

            but these give you access to the original unbound function objects, which require the self parameter still.

            You could use those functions too, but then you'd have to bind them to your instance first:

            someWidget.valueChanged.connect(Player.health.fset.__get__(player))
            

            The __get__ method on the function (which is a descriptor) provides you with a bound method that passes in the self argument for you (the player instance object in this case).
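
            The binding trick can be checked without Qt at all. This sketch uses a hypothetical Player class with a health property, since the original class isn't shown in the answer:

```python
class Player:
    def __init__(self):
        self._health = 100

    @property
    def health(self):
        return self._health

    @health.setter
    def health(self, value):
        self._health = value

player = Player()

# Player.health.fset is the plain setter function; __get__ binds it to player.
set_health = Player.health.fset.__get__(player)
set_health(42)
print(player.health)    # 42

# The lambda form is equivalent:
update = lambda v: setattr(player, 'health', v)
update(7)
print(player.health)    # 7
```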

            qid & accept id: (34468751, 34468767) query: Get value to 2 attribute from a xpath node for anchor tag soup:

            soup wrap:

            Just call .xpath("@href|text()") on every element this way:

            for item in list:
                href, text = item.xpath("@href|text()")
                print(href, text)
            

            Demo:

            >>> from lxml.html import fromstring
            >>> 
            >>> data = """
            ... 
            ...     <a href="/citations?user=lMkTx0EAAAAJ&hl=en&oe=ASCII">Jason Weston</a>
            ...     <a href="/citations?user=RhFhIIgAAAAJ&hl=en&oe=ASCII">Pierre Baldi</a>
            ...     <a href="/citations?user=9DXQi8gAAAAJ&hl=en&oe=ASCII">Yair Weiss</a>
            ...     <a href="/citations?user=J8YyZugAAAAJ&hl=en&oe=ASCII">Peter Belhumeur</a>
            ...     <a href="/citations?user=ORr4XJYAAAAJ&hl=en&oe=ASCII">Serge Belongie</a>
            ... 
            ... """
            >>> 
            >>> tree = fromstring(data)
            >>> 
            >>> for item in tree.xpath("//a"):
            ...     print(item.xpath("@href|text()"))
            ... 
            ['/citations?user=lMkTx0EAAAAJ&hl=en&oe=ASCII', 'Jason Weston']
            ['/citations?user=RhFhIIgAAAAJ&hl=en&oe=ASCII', 'Pierre Baldi']
            ['/citations?user=9DXQi8gAAAAJ&hl=en&oe=ASCII', 'Yair Weiss']
            ['/citations?user=J8YyZugAAAAJ&hl=en&oe=ASCII', 'Peter Belhumeur']
            ['/citations?user=ORr4XJYAAAAJ&hl=en&oe=ASCII', 'Serge Belongie']
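
            Equivalently, you can read the attribute and text separately with .get() and .text instead of the combined XPath. lxml's element API matches the standard library's ElementTree for these calls, so the sketch below uses ElementTree with a small well-formed snippet (my own example data) and runs without lxml:

```python
from xml.etree import ElementTree as ET

# Well-formed XML stand-in for the HTML above (note the escaped &amp;).
data = """<div>
    <a href="/citations?user=lMkTx0EAAAAJ&amp;hl=en">Jason Weston</a>
    <a href="/citations?user=RhFhIIgAAAAJ&amp;hl=en">Pierre Baldi</a>
</div>"""

tree = ET.fromstring(data)
pairs = [(a.get("href"), a.text) for a in tree.iter("a")]
print(pairs)
```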
            
            qid & accept id: (34478011, 34485724) query: Using descriptor class to raise RuntimeError when user tries to change object's value soup:

            soup wrap:

            Improved descriptor Computations

            This would allow only the initial setting of a Computations descriptor:

            class Computations(object):
                def __init__(self, name):
                    self.name = name   # default value for area, circumference, distance to origin
            
                def __get__(self, instance, cls):
                    if instance is None:
                        print('this is the __get__ if statement running')
                        return self
                    else:
                        print('this is the __get__ else statement running')
                        return instance.__dict__[self.name]
            
                def __set__(self, instance, value):
                    if hasattr(instance, self.name + '_is_set'):
                        raise ValueError('Cannot set {} again.'.format(self.name[1:]))
                    if isinstance(value, int):
                        raise RuntimeError('Cant set formulas')
                    else:
                        instance.__dict__[self.name] = value
                        setattr(instance, self.name + '_is_set', True)
            

            The trick is to generate a new attribute on the instance:

            setattr(instance, self.name + '_is_set', True) 
            

            For the instance circle and the attribute circumference this means:

            circle._circumference_is_set = True 
            

            This checks if this attribute exists:

            if hasattr(instance, self.name + '_is_set')
            

            Again for our case this means:

            if hasattr(circle, '_circumference_is_set')
            

            The first time __set__ is called for circumference is in the class Circle:

            self.circumference = 2 * pi * self.r 
            

            Now _circumference_is_set exists and the next try to set it will result in an exception.
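
            As a variant of the same trick (my own minimal sketch, not the answer's Computations class), the presence of the storage key in instance.__dict__ can itself serve as the "already set" flag, avoiding the extra sentinel attribute:

```python
class WriteOnce(object):
    def __init__(self, name):
        self.name = name    # storage key in the instance __dict__

    def __get__(self, instance, cls):
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        # The key's presence doubles as the "is set" sentinel.
        if self.name in instance.__dict__:
            raise ValueError('Cannot set {} again.'.format(self.name[1:]))
        instance.__dict__[self.name] = value

class Point(object):
    x = WriteOnce('_x')

p = Point()
p.x = 3          # first assignment is allowed
try:
    p.x = 4      # second assignment raises
except ValueError as e:
    print(e)     # Cannot set x again.
```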

            Putting it all together

            Your code with my new descriptor Computations:

            from math import pi, sqrt
            
            class Integer(object):
                def __init__(self, name):
                    self.name = name        # stores name of the managed object's attribute
            
                def __get__(self, instance, cls):
                    if instance is None:
                        return self
                    else:
                        return instance.__dict__[self.name]
            
                def __set__(self, instance, value):
                    if not isinstance(value, int):
                        raise TypeError('Expected an int')
                    else:
                        instance.__dict__[self.name] = value
            
            class Computations(object):
                def __init__(self, name):
                    self.name = name   # default value for area, circumference, distance to origin
            
                def __get__(self, instance, cls):
                    if instance is None:
                        print('this is the __get__ if statement running')
                        return self
                    else:
                        print('this is the __get__ else statement running')
                        return instance.__dict__[self.name]
            
                def __set__(self, instance, value):
                    if hasattr(instance, self.name + '_is_set'):
                        raise ValueError('Cannot set {} again.'.format(self.name[1:]))
                    if isinstance(value, int):
                        raise RuntimeError('Cant set formulas')
                    else:
                        instance.__dict__[self.name] = value
                        setattr(instance, self.name + '_is_set', True)
            
            
            class Circle(object):
                x = Integer('_x')   # Use _x and _y as the __dict__ key of a Point
                y = Integer('_y')   # These will be the storage names for a Point
                r = Integer('_r')
            
                area = Computations('_area')   # class variable of Computations
                circumference = Computations('_circumference')
                distance_to_origin = Computations('_distance_to_origin')
            
                def __init__(self, x, y, r):
                    self.x = x      # invokes Integer.x.__set__
                    self.y = y      # invokes Integer.y.__set__
                    self.r = r      # radius; invokes Integer.r.__set__
                    self.area = pi * self.r * self.r
                    self.circumference = 2 * pi * self.r
                    self.distance_to_origin = abs(sqrt(self.x * self.x + self.y * self.y) - self.r)
            

            Testing

            Now trying to set circle.circumference raises an exception:

            # Testing code
            if __name__ == '__main__':
            
                circle = Circle(x=3, y=4, r=5)
                print('circumference', circle.circumference)
            
                print('try setting circumference')
                circle.circumference = 12.5
            

            Output:

            this is the __get__ else statement running
            circumference 31.41592653589793
            try setting circumference
            
            ---------------------------------------------------------------------------
            ValueError                                Traceback (most recent call last)
             in ()
                 64 
                 65     print('try setting circumference')
            ---> 66     circle.circumference = 12.5
            
             in __set__(self, instance, value)
                 31     def __set__(self, instance, value):
                 32         if hasattr(instance, self.name + 'is_set'):
            ---> 33             raise ValueError('Cannot set {} again.'.format(self.name[1:]))
                 34         if isinstance(value, int):
                 35             raise RuntimeError('Cant set formulas')
            
            ValueError: Cannot set circumference again.
            

            Your tests:

            if __name__ == '__main__':
            
                circle = Circle(x=3, y=4, r=5)
                print(circle.x)
                print(circle.y)
                print(circle.r)
                print(circle.area)
               # circle.area = 12
                print(circle.area)
                print(circle.circumference)
                print(circle.distance_to_origin)
                tests = [('circle.x = 12.3', "print('Setting circle.x to non-integer fails')"),
                         ('circle.y = 23.4', "print('Setting circle.y to non-integer fails')"),
                         ('circle.area = 23.4', "print('Setting circle.area fails')"),
                         ('circle.circumference = 23.4', "print('Setting circle.circumference fails')"),
                         ('circle.distance_to_origin = 23.4', "print('Setting circle.distance_to_origin fails')"),
                         ('circle.z = 5.6', "print('Setting circle.z fails')"),
                         ('print(circle.z)', "print('Printing circle.z fails')")]
                for test in tests:
                    try:
                        exec(test[0])
                    except:
                        exec(test[1])
            

            generate this output:

            3
            4
            5
            this is the __get__ else statement running
            78.53981633974483
            this is the __get__ else statement running
            78.53981633974483
            this is the __get__ else statement running
            31.41592653589793
            this is the __get__ else statement running
            0.0
            Setting circle.x to non-integer fails
            Setting circle.y to non-integer fails
            Setting circle.area fails
            Setting circle.circumference fails
            Setting circle.distance_to_origin fails
            5.6
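
            As an aside, the same write-once behaviour can be had with a more compact guard: instead of setting a separate is_set flag attribute, check whether the storage key is already present in the instance __dict__. A minimal sketch (the WriteOnce and Point names are illustrative, not from the original code):

```python
class WriteOnce(object):
    """Data descriptor that allows an attribute to be set exactly once."""
    def __init__(self, name):
        self.name = name            # key used in the instance __dict__

    def __get__(self, instance, cls):
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        # if the storage key already exists, the attribute was set before
        if self.name in instance.__dict__:
            raise ValueError('Cannot set {} again.'.format(self.name[1:]))
        instance.__dict__[self.name] = value

class Point(object):
    x = WriteOnce('_x')

p = Point()
p.x = 3                             # first assignment succeeds
try:
    p.x = 4                         # second assignment raises
except ValueError as e:
    error = str(e)
```

            Because WriteOnce defines __set__, it is a data descriptor and takes precedence over the instance __dict__ on attribute access.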
            
            qid & accept id: (34503246, 34539454) query: List names of all available MS SQL databases on server using python soup:

            soup wrap:

            If all you really want to do is avoid importing pandas, then the following works fine for me:

            from sqlalchemy import create_engine
            engine = create_engine('mssql+pymssql://sa:saPassword@localhost:52865/myDb')
            conn = engine.connect()
            rows = conn.execute("select name FROM sys.databases;")
            for row in rows:
                print(row["name"])
            

            producing

            master
            tempdb
            model
            msdb
            myDb
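
            The same pattern — execute a catalog query and iterate the cursor rows, no pandas involved — can be tried against the stdlib's sqlite3, where PRAGMA database_list plays roughly the role of sys.databases (sqlite3 stands in for pymssql here purely so the snippet is runnable; the driver and the catalog query differ for MS SQL Server):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
rows = conn.execute("PRAGMA database_list")   # sqlite's analogue of sys.databases
names = [row[1] for row in rows]              # each row is (seq, name, file)
conn.close()
```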
            
            qid & accept id: (34505283, 34505477) query: Sorting a list with a dictionary at items soup:

            soup wrap:

            The sorted function, with the key parameter, is what you're looking for!

            I'm not sure about your case, but maybe the code you are looking for is:

            return sorted(fights, key=(lambda fight:fight["Date"]))
            

            (To replace what you currently are returning.)

            The sorted function sorts a list. The key parameter is a function to apply to each element before sorting. For example:

            sorted([1, 2, 3], key=(lambda x:(-x)))
            

            Would return [3, 2, 1], because it sorts [-1, -2, -3] but then uses the original numbers in the output, if that makes any sense. (Although in that particular case, using reverse=True would be better.)
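
            For example, with a hypothetical fights list (ISO-formatted date strings compare correctly as plain strings, so no parsing is needed here):

```python
fights = [{"Date": "2016-01-02"}, {"Date": "2015-12-31"}, {"Date": "2016-01-01"}]

# key extracts the value to sort by; the original dicts appear in the output
ordered = sorted(fights, key=lambda fight: fight["Date"])
dates = [f["Date"] for f in ordered]

# reverse=True sorts descending without negating the key
descending = sorted([1, 2, 3], reverse=True)
```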

            qid & accept id: (34512219, 34754354) query: Inserting a folder containing specific routes to a bottle application in Python soup:

            soup wrap:

            Bottle can run multiple bottle apps as a single instance. You can use something like this on main.py

            import bottle
            from web.bottleApp import app
            from configure.config import config_app
            
            main = bottle.Bottle()
            main.mount("/config/", config_app)
            main.mount("/",app)
            
            main.run(host = 'localhost', port=8080)
            

            and on configure/config.py something like this:

            import bottle
            
            config_app = bottle.Bottle()
            
            @config_app.route('/config1')
            def config1():    
                return 'some config data'
            
            qid & accept id: (34514629, 34515434) query: New Python Gmail API - Only Retrieve Messages from Yesterday soup:

            soup wrap:

            You can pass queries to the messages.list method that searches for messages within a date range. You can actually use any query supported by Gmail's advanced search.

            You could do this, which will just return all messages.

            message = service.users().messages().list(userId='me').execute()
            

            But you can do this to search for messages sent yesterday, by passing the q keyword argument and a query using the before: and after: keywords.

            from datetime import date, timedelta
            
            today = date.today()
            yesterday = today - timedelta(1)
            
            # do your setup...
            
            user_id = 'user email address'
            
            # Dates have to be formatted as YYYY/MM/DD for Gmail
            query = "before: {0} after: {1}".format(today.strftime('%Y/%m/%d'),
                                                    yesterday.strftime('%Y/%m/%d'))
            
            response = service.users().messages().list(userId=user_id,
                                                       q=query).execute()
            # Process the response for messages...
            

            You can also try this out against the Gmail messages.list reference page.
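
            The query construction itself can be checked without calling the API. A small sketch with a fixed date so the output is deterministic; note that Gmail's search operators are normally written with no space after the colon, which is an adjustment to the snippet above:

```python
from datetime import date, timedelta

today = date(2016, 1, 2)                  # fixed instead of date.today()
yesterday = today - timedelta(days=1)

# Gmail expects dates as YYYY/MM/DD
query = "after:{0} before:{1}".format(yesterday.strftime('%Y/%m/%d'),
                                      today.strftime('%Y/%m/%d'))
```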

            qid & accept id: (34521703, 34522224) query: Python Flatten Dataframe With Multiple Columns all n-length soup:

            soup wrap:

            How's about this:

            In [11]: df1 = df[["Misc", "Year"] + [c for c in df.columns if c[-1] == "1"]]
            
            In [12]: df1 = df1.rename(columns=lambda x: x[:-1] if x[-1] == "1" else x)
            
            In [13]: df1
            Out[13]:
              Misc  Year   a  b    c
            0    A  1991  10  h  4.1
            1    R  1992  20  i  4.2
            2    B  1993  30  j  4.3
            
            In [14]: df2 = df[["Misc", "Year"] + [c for c in df.columns if c[-1] == "2"]]
            
            In [15]: df2 = df2.rename(columns=lambda x: x[:-1] if x[-1] == "2" else x)
            
            In [16]: pd.concat([df1, df2])
            Out[16]:
              Misc  Year   a  b    c
            0    A  1991  10  h  4.1
            1    R  1992  20  i  4.2
            2    B  1993  30  j  4.3
            0    A  1991  40  k  4.4
            1    R  1992  50  l  4.5
            2    B  1993  60  m  4.6
            

            You could do this as a comprehension, or function, more generally:

            In [21]: pd.concat([df[["Misc", "Year"] + [c for c in df.columns if c[-1] == str(i)]]
                                 .rename(columns=lambda x: x[:-1] if x[-1] == str(i) else x)
                                for i in range(1, 3)])
            Out[21]:
              Misc  Year   a  b    c
            0    A  1991  10  h  4.1
            1    R  1992  20  i  4.2
            2    B  1993  30  j  4.3
            0    A  1991  40  k  4.4
            1    R  1992  50  l  4.5
            2    B  1993  60  m  4.6
            

            If you want to eke out some more performance, you're going to want to do this concat in numpy and then repeat the index (though I'm not convinced it's worth the small gain that will give you).
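
            pandas also ships a built-in for exactly this wide-to-long reshape, pd.wide_to_long, which treats the trailing 1/2 as a numeric suffix level. A sketch on the same sample data (the index-level name "num" is my choice):

```python
import pandas as pd

df = pd.DataFrame({"Misc": ["A", "R", "B"], "Year": [1991, 1992, 1993],
                   "a1": [10, 20, 30], "b1": ["h", "i", "j"], "c1": [4.1, 4.2, 4.3],
                   "a2": [40, 50, 60], "b2": ["k", "l", "m"], "c2": [4.4, 4.5, 4.6]})

# stubnames are the column prefixes; the numeric suffix becomes index level "num"
long_df = pd.wide_to_long(df, stubnames=["a", "b", "c"],
                          i=["Misc", "Year"], j="num").reset_index()
```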

            qid & accept id: (34546949, 34547430) query: Send HEX values to SPI on a Raspberry PI B+ soup:

            soup wrap:

            Constructing those byte strings is easy: just use \x escape codes.

            Here's a simple example, which I tested on Python 2.6, but it should work ok on Python 3, too.

            hdr = b'\x00' * 4
            blocksize = 51
            leds = (
                #LED off
                hdr + b'\x80\x00' * blocksize,
                #LED on
                hdr + b'\xff\xff' * blocksize,
            )
            
            fname = '/dev/stdout'
            with open(fname, 'wb') as f:
                f.write(leds[0])
            

            That code creates the file to turn the LED off; to turn it on simply do f.write(leds[1]).

            The b prefix on the strings indicates that the strings are byte strings. That prefix isn't required on Python 2, since Python 2 strings are byte string objects, but it should be used in Python 3, since Python 3 strings are Unicode string objects.
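
            On Python 3 you can double-check those escapes interactively; bytes([...]) and bytes.fromhex() build the same objects, and bytes.hex() (Python 3.5+) gives you a quick text dump:

```python
# three equivalent ways to spell the same two-byte LED word
assert b'\x80\x00' == bytes([0x80, 0x00]) == bytes.fromhex('8000')

hdr = b'\x00' * 4
frame = hdr + b'\x80\x00' * 2      # 4-byte header + two "off" words
hex_dump = frame.hex()
```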

            My code writes to /dev/stdout to simplify testing, since I don't have a Raspberry Pi, but you can easily change the filename to /dev/spidev-0.0.

            Here's a hexdump of its output:

            00000000  00 00 00 00 80 00 80 00  80 00 80 00 80 00 80 00  |................|
            00000010  80 00 80 00 80 00 80 00  80 00 80 00 80 00 80 00  |................|
            00000020  80 00 80 00 80 00 80 00  80 00 80 00 80 00 80 00  |................|
            00000030  80 00 80 00 80 00 80 00  80 00 80 00 80 00 80 00  |................|
            00000040  80 00 80 00 80 00 80 00  80 00 80 00 80 00 80 00  |................|
            00000050  80 00 80 00 80 00 80 00  80 00 80 00 80 00 80 00  |................|
            00000060  80 00 80 00 80 00 80 00  80 00                    |..........|
            0000006a
            
            qid & accept id: (34556645, 34558734) query: how to get the class type in lua / translation from python soup:

            soup wrap:

            There are 8 types in Lua: nil, boolean, number, string, function, thread, table and userdata. You can find out which of these basic types your object belongs to using the built-in type() function:

            type('Hello world')                    == 'string'
            type(3.14)                             == 'number'
            type(true)                             == 'boolean'
            type(nil)                              == 'nil'
            type(print)                            == 'function'
            type(coroutine.create(function() end)) == 'thread'
            type({})                               == 'table'
            type(torch.Tensor())                   == 'userdata'
            

            Note that the type of torch.Tensor is userdata. That makes sense, since the torch library is written in C.

            The type userdata is provided to allow arbitrary C data to be stored in Lua variables. A userdata value is a pointer to a block of raw memory. Userdata has no predefined operations in Lua, except assignment and identity test.

            The metatable for the userdata is put in the registry, and the __index field points to the table of methods so that the object:method() syntax will work.

            So, dealing with a userdata object, we do not know what it is but have a list of its methods and can invoke them.

            It would be great if custom objects had some mechanism (a method or something) to see their custom types. And guess what? Torch objects have one:

            t = torch.Tensor()
            type(t)       == 'userdata' -- because the class was written in C
            torch.type(t) == 'torch.DoubleTensor'
            -- or
            t:type()      == 'torch.DoubleTensor'
            

            Speaking of Torch: it has its own object system emulator, and you are free to create some torch classes yourself and check their types the same way. For Lua, however, such classes/objects are nothing more than ordinary tables.

            local A = torch.class('ClassA')
            function A:__init(val)
                self.val = val
            end
            
            local B, parent = torch.class('ClassB', 'ClassA')
            function B:__init(val)
                parent.__init(self, val)
            end
            
            b = ClassB(5)
            type(b)       == 'table' -- because the class was written in Lua
            torch.type(b) == 'ClassB'
            b:type() -- raises an error; custom Torch classes have no :type() method by default
            
            qid & accept id: (34563454, 34563512) query: Django ImageField upload_to path soup:

            soup wrap:

            Your images will be uploaded to the media folder, so it's better to make the path in your model something like images/; they will then be uploaded to media/images.

            In settings.py add this

            MEDIA_URL = '/media/'
            MEDIA_ROOT = os.path.join(BASE_DIR, 'media')

            In urls.py

            from django.conf.urls.static import static
            from django.conf import settings
            urlpatterns = [....
            ]+ static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)
            

            Then, if you want to display all these images, use something like this in views.py:
            BlogContent.objects.all()

            And render it like this:

            {% for img in your_object %}
            
            {% endfor %}
            
            qid & accept id: (34569966, 34570026) query: Remove duplicates in python list but remember the index soup:

            soup wrap:

            Use enumerate to keep track of the index and a set to keep track of elements already seen:

            l = [1, 1, 2, 3]
            inds = []
            seen = set()
            for i, ele in enumerate(l):
                if ele not in seen:
                    inds.append(i)
                seen.add(ele)
            

            If you want both:

            inds = []
            seen = set()
            for i, ele in enumerate(l):
                if ele not in seen:
                    inds.append((i,ele))
                seen.add(ele)
            

            Or if you want both in different lists:

            l = [1, 1, 2, 3]
            inds, unq = [],[]
            seen = set()
            for i, ele in enumerate(l):
                if ele not in seen:
                    inds.append(i)
                    unq.append(ele)
                seen.add(ele)
            

            Using a set is by far the best approach:

            In [13]: l = [randint(1,10000) for _ in range(10000)]     
            
            In [14]: %%timeit                                         
            inds = []
            seen = set()
            for i, ele in enumerate(l):
                if ele not in seen:
                    inds.append((i,ele))
                seen.add(ele)
               ....: 
            100 loops, best of 3: 3.08 ms per loop
            
            In [15]: timeit  OrderedDict((x, l.index(x)) for x in l)
            1 loops, best of 3: 442 ms per loop
            
            In [16]: l = [randint(1,10000) for _ in range(100000)]      
            In [17]: timeit  OrderedDict((x, l.index(x)) for x in l)
            1 loops, best of 3: 10.3 s per loop
            
            In [18]: %%timeit                                       
            inds = []
            seen = set()
            for i, ele in enumerate(l):
                if ele not in seen:
                    inds.append((i,ele))
                seen.add(ele)
               ....: 
            10 loops, best of 3: 22.6 ms per loop
            

            So for 100k elements it is 10.3 seconds vs 22.6 ms; if you try anything larger with fewer dupes, like [randint(1,100000) for _ in range(100000)], you will have time to read a book. Creating two lists is marginally slower but still orders of magnitude faster than using list.index.

            If you want to get a value at a time you can use a generator function:

            def yield_un(l):
                seen = set()
                for i, ele in enumerate(l):
                    if ele not in seen:
                        yield (i,ele)
                    seen.add(ele)
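
            For instance, consuming the generator on the sample list yields the (index, value) pair of each first occurrence (the function is repeated here so the snippet runs on its own):

```python
def yield_un(l):
    seen = set()
    for i, ele in enumerate(l):
        if ele not in seen:
            yield (i, ele)
        seen.add(ele)

# lazily produces one (index, value) pair per distinct element
pairs = list(yield_un([1, 1, 2, 3]))
```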
            
            qid & accept id: (34576433, 34576466) query: Copy 2D array to a 3D one - Python / NumPy soup:

            soup wrap:

            You could reshape with np.reshape & then re-arrange dimensions with np.transpose, like so -

            H = data.reshape(N,Nt,N).transpose(0,2,1)
            

            Instead of np.transpose, one can also use np.swapaxes as basically we are swapping axes 1,2 there, like so -

            H = data.reshape(N,Nt,N).swapaxes(1,2)
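The two forms are interchangeable here, since `transpose(0,2,1)` swaps exactly axes 1 and 2. A quick deterministic check (small `N`, `Nt` chosen arbitrarily for illustration):

```python
import numpy as np

# Small deterministic input so the equivalence is easy to verify by eye.
N, Nt = 2, 3
data = np.arange(N * Nt * N).reshape(N * Nt, N)

H1 = data.reshape(N, Nt, N).transpose(0, 2, 1)
H2 = data.reshape(N, Nt, N).swapaxes(1, 2)

print(np.array_equal(H1, H2))  # True
```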
            

            Sample run -

            In [300]: N = 2
                 ...: Nt = 3
                 ...: data = np.random.randint(0,9,(N*Nt,N))
                 ...: 
            
            In [301]: data
            Out[301]: 
            array([[3, 6],
                   [7, 4],
                   [8, 1],
                   [8, 7],
                   [4, 8],
                   [2, 3]])
            
            In [302]: H = np.zeros((N,N,Nt),dtype=data.dtype)
                 ...: for k in np.arange(N):
                 ...:     for l in np.arange(N):            
                 ...:         for m in np.arange(Nt):    
                 ...:             H[k,l,m] = data[m+Nt*k,l]
                 ...:             
            
            In [303]: H
            Out[303]: 
            array([[[3, 7, 8],
                    [6, 4, 1]],
            
                   [[8, 4, 2],
                    [7, 8, 3]]])
            
            In [304]: data.reshape(N,Nt,N).transpose(0,2,1)
            Out[304]: 
            array([[[3, 7, 8],
                    [6, 4, 1]],
            
                   [[8, 4, 2],
                    [7, 8, 3]]])
            

            Runtime test -

            In [8]: # Input
               ...: N = 10
               ...: Nt = 10*50
               ...: data = np.random.randint(0,9,(N*Nt,N))
               ...: 
               ...: def original_app(data):
               ...:     H = np.zeros((N,N,Nt),dtype=data.dtype)
               ...:     for k in np.arange(N):
               ...:         for l in np.arange(N):            
               ...:             for m in np.arange(Nt):    
               ...:                 H[k,l,m] = data[m+Nt*k,l]
               ...:     return H
               ...: 
            
            In [9]: np.allclose(original_app(data),data.reshape(N,Nt,N).transpose(0,2,1))
            Out[9]: True
            
            In [10]: %timeit original_app(data)
            10 loops, best of 3: 56.1 ms per loop
            
            In [11]: %timeit data.reshape(N,Nt,N).transpose(0,2,1)
            1000000 loops, best of 3: 1.25 µs per loop
            
            qid & accept id: (34585582, 34586887) query: how to mask the specific array data based on the shapefile soup:

soup wrap:

            Step 1. Rasterize shapefile

            Create a function that can determine whether a point at coordinates (x, y) is or is not in the area. See here for more details on how to rasterize your shapefile into an array of the same dimensions as your target mask.

            def point_is_in_mask(mask, point):
                # this is just pseudocode
                return mask.contains(point) 
            

            Step 2. Create your mask

            mask = np.zeros((height, width))
            value = np.zeros((height, width))
            for y in range(height):
                for x in range(width):
                    if not point_is_in_mask(mask, (x, y)):
                        value[y][x] = np.nan
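Here is one self-contained way the two steps could fit together, using a hypothetical boolean array as the output of step 1 (the rasterized shapefile), so that `point_is_in_mask` is a plain array lookup:

```python
import numpy as np

height, width = 6, 6

# Hypothetical rasterized shapefile: True inside the area, False outside.
mask = np.zeros((height, width), dtype=bool)
mask[1:4, 1:4] = True

def point_is_in_mask(mask, point):
    x, y = point
    return mask[y, x]

# Step 2: blank out every value that falls outside the shape.
value = np.ones((height, width))
for y in range(height):
    for x in range(width):
        if not point_is_in_mask(mask, (x, y)):
            value[y][x] = np.nan
```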
            
            qid & accept id: (34596082, 34596578) query: Writing to a specific column of a text file in python soup:

soup wrap:

            This could be done with simple string formatting:

            arr = ['ABCD', '1', 'P', '15-06-2015', '0', 'Name of the account']
            print "{:16}{:3}{:3}{:29}{:3}{:40}".format(*arr)
            

            The values there are not column positions but the field width reserved for each item.

            EDIT

            If items in your array are not only strings but also numbers it will still work, though you will probably want them left aligned (strings are by default):

            arr = ['ABCD', 1, 'P', '15-06-2015', 0, 'Name of the account']
            print "{:16}{:<3}{:3}{:29}{:<3}{:40}".format(*arr)
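If the column widths live in a list (the widths below mirror the ones above), the format string can be built rather than hard-coded, a small Python 3 sketch:

```python
arr = ['ABCD', 1, 'P', '15-06-2015', 0, 'Name of the account']
widths = [16, 3, 3, 29, 3, 40]

# '<' left-aligns every field, numbers included.
fmt = ''.join('{:<%d}' % w for w in widths)
line = fmt.format(*arr)
print(line)
```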
            

            Here's doc.

            qid & accept id: (34598020, 34598497) query: Consolidate duplicate rows of an array soup:

soup wrap:

            A pure NumPy solution could work like this (I've named your starting array a):

            >>> b = a[np.argsort(a[:, 0])]
            >>> grps, idx = np.unique(b[:, 0], return_index=True)
            >>> counts = np.add.reduceat(b[:, 1:], idx)
            >>> np.column_stack((grps, counts))
            array([[117,   1,   1,   0,   0,   1],
                   [120,   0,   1,   1,   0,   0],
                   [163,   1,   0,   0,   0,   0],
                   [189,   0,   0,   0,   1,   0]])
            

            This returns the rows in sorted order (by label).

            A solution in pandas is possible in fewer lines (and potentially uses less additional memory than the NumPy method):

            >>> df = pd.DataFrame(a)
            >>> df.groupby(0, sort=False, as_index=False).sum().values
            array([[117,   1,   1,   0,   0,   1],
                   [163,   1,   0,   0,   0,   0],
                   [120,   0,   1,   1,   0,   0],
                   [189,   0,   0,   0,   1,   0]])
            

            The sort=False parameter means that the rows are returned in the order the unique labels were first encountered.
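End to end, with a small hypothetical `a` (first column is the label, the remaining columns are counts to be summed per label):

```python
import numpy as np

a = np.array([[117, 1, 0, 0, 0, 1],
              [163, 1, 0, 0, 0, 0],
              [117, 0, 1, 0, 0, 0],
              [120, 0, 1, 1, 0, 0],
              [189, 0, 0, 0, 1, 0]])

b = a[np.argsort(a[:, 0])]                         # group identical labels together
grps, idx = np.unique(b[:, 0], return_index=True)  # start index of each group
counts = np.add.reduceat(b[:, 1:], idx)            # sum each group's count columns
result = np.column_stack((grps, counts))
```

The two rows labelled 117 collapse into one, and `result` comes out sorted by label.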

            qid & accept id: (34600056, 34600228) query: Using Pandas to fill NaN entries based on values in a different column, using a dictionary as a guide soup:

soup wrap:

            Select the relevant rows using boolean indexing (see docs), and map your dictionary to translate A to B values where necessary:

            na_map = {"Red": 123, "Green": 456, "Blue": 789}
            mask = df.B.isnull()
            

            mask looks as follows:

            0    False
            1    False
            2     True
            3    False
            4    False
            5     True
            6    False
            7     True
            

            Finally:

            df.loc[mask, 'B'] = df.loc[mask, 'A'].map(na_map)
            
                   A    B
            0    Red  628
            1    Red  149
            2    Red  123
            3  Green  575
            4  Green  687
            5  Green  456
            6   Blue  159
            7   Blue  789
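Put together with a small hypothetical frame (the concrete values are invented for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['Red', 'Red', 'Green', 'Blue'],
                   'B': [628, np.nan, np.nan, 159]})

na_map = {'Red': 123, 'Green': 456, 'Blue': 789}

# Fill only the missing B values, translating from the A column.
mask = df['B'].isnull()
df.loc[mask, 'B'] = df.loc[mask, 'A'].map(na_map)
```

An equivalent one-liner is `df['B'] = df['B'].fillna(df['A'].map(na_map))`, since `fillna` accepts an index-aligned Series.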
            
            qid & accept id: (34601770, 34601844) query: Create numpy array based on magnitude of difference between arrays soup:

soup wrap:

            Assuming you mean the magnitude of the difference relative to arr_a, use:

            import numpy as np 
            
            arr_a = np.random.rand(10) 
            arr_b = np.random.rand(10)
            
            arr_c = np.where((abs(arr_a - arr_b)/arr_a) > 0.3, 1, 0) 
            

            If you want the magnitude of the difference relative to arr_b, use:

            arr_c = np.where((abs(arr_a - arr_b)/arr_b) > 0.3, 1, 0) 
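With fixed inputs instead of random ones, the behaviour is easy to check by hand (the numbers here are invented):

```python
import numpy as np

arr_a = np.array([1.0, 2.0, 4.0])
arr_b = np.array([1.2, 2.1, 10.0])

# 1 where the difference exceeds 30% of arr_a, else 0:
# relative diffs are 0.2, 0.05 and 1.5, so only the last entry trips.
arr_c = np.where(np.abs(arr_a - arr_b) / arr_a > 0.3, 1, 0)
```

Note that if `arr_a` can contain zeros, the division needs guarding (e.g. `np.errstate` or masking) before the comparison.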
            
            qid & accept id: (34607271, 34607419) query: Is it possible to download apk from google play programmatically to PC? soup:

soup wrap:

            Use Google Play Unofficial Python API (github)

            Using this API you can download APKs using their package name:

            python download.py com.google.android.gm
            

            For finding relevant APKs, you can use the search or even parse subcategories:

            python search.py earth
            python list.py WEATHER apps_topselling_free
            
            qid & accept id: (34637002, 34637619) query: Fast and pythonic way to find out if a string is a palindrome soup:

soup wrap:

            So, I decided to just timeit, and find which one was the fastest. Note that the final function is a cleaner version of your own pythonicPalindrome. It is defined as follows:

            import re

            def palindrome(s, o):
                return re.sub("[ ,.;:?!]", "", s.lower()) == re.sub("[ ,.;:?!]", "", o.lower())[::-1]
            

            Methodology

            I ran 10 distinct tests per function. In each test run, the function was called 10000 times, with arguments self="aabccccccbaa", other="aabccccccbaa". The results can be found below.

                        palindrom       iteratorPalindrome      pythonicPalindrome      palindrome  
            1           0.131656638            0.108762937             0.071676536      0.072031984
            2           0.140950052            0.109713793             0.073781851      0.071860462
            3           0.126966087            0.109586756             0.072349792      0.073776719
            4           0.125113136            0.108729573             0.094633969      0.071474645
            5           0.130878159            0.108602964             0.075770395      0.072455015
            6           0.133569472            0.110276694             0.072811747      0.071764222
            7           0.128642812            0.111065438             0.072170571      0.072285204
            8           0.124896702            0.110218949             0.071898959      0.071841214
            9           0.123841905            0.109278358             0.077430437      0.071747112
            10          0.124083576            0.108184210             0.080211147      0.077391086
            
            AVG         0.129059854            0.109441967             0.076273540      0.072662766
            STDDEV      0.005387429            0.000901370             0.007030835      0.001781309
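The absolute timings above are machine-dependent; a harness along these lines (a sketch, not necessarily the exact one used) reproduces a single test run:

```python
import re
import timeit

def palindrome(s, o):
    # Strip common punctuation/whitespace, lowercase, compare to the reversal.
    return re.sub("[ ,.;:?!]", "", s.lower()) == re.sub("[ ,.;:?!]", "", o.lower())[::-1]

# One test run: 10000 calls with the 12-character inputs from the methodology.
elapsed = timeit.timeit(
    lambda: palindrome("aabccccccbaa", "aabccccccbaa"), number=10000)
print(elapsed)
```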
            

            It would appear that the cleaner version of your pythonicPalindrome is marginally faster, but both functions clearly outclass the alternatives.

            qid & accept id: (34638457, 34638862) query: How to determine type of nested data structures in Python? soup:

soup wrap:

            One way to do it by hand would be:

            def type_spec_iterable(obj, name):
                tps = set(type_spec(e) for e in obj)
                if len(tps) == 1:
                    return name + "<" + next(iter(tps)) + ">"
                else:
                    return name + "<?>"
            
            
            def type_spec_dict(obj):
                tps = set((type_spec(k), type_spec(v)) for (k,v) in obj.iteritems())
                keytypes = set(k for (k, v) in tps)
                valtypes =  set(v for (k, v) in tps)
                kt = next(iter(keytypes)) if len(keytypes) == 1 else "?"
                vt = next(iter(valtypes)) if len(valtypes) == 1 else "?"
                return "dict<%s, %s>" % (kt, vt)
            
            
            def type_spec_tuple(obj):
                return "tuple<" + ", ".join(type_spec(e) for e in obj) + ">"
            
            
            def type_spec(obj):
                t = type(obj)
                res = {
                    int: "int",
                    str: "str",
                    bool: "bool",
                    float: "float",
                    type(None): "(none)",
                    list: lambda o: type_spec_iterable(o, 'list'),
                    set: lambda o: type_spec_iterable(o, 'set'),
                    dict: type_spec_dict,
                    tuple: type_spec_tuple,
                }.get(t, lambda o: type(o).__name__)
                return res if type(res) is str else res(obj)
            
            
            if __name__ == "__main__":
                class Foo(object):
                    pass
                for obj in [
                    1,
                    2.3,
                    None,
                    False,
                    "hello",
                    [1, 2, 3],
                    ["a", "b"],
                    [1, "h"],
                    (False, 1, "2"),
                    set([1.2, 2.3, 3.4]),
                    [[1,2,3],[4,5,6],[7,8,9]],
                    [(1,'a'), (2, 'b')],
                    {1:'b', 2:'c'},
                    [Foo()], # todo - inheritance?
                ]:
                    print repr(obj), ":", type_spec(obj)
            

            This prints:

            1 : int
            2.3 : float
            None : (none)
            False : bool
            'hello' : str
            [1, 2, 3] : list<int>
            ['a', 'b'] : list<str>
            [1, 'h'] : list<?>
            (False, 1, '2') : tuple<bool, int, str>
            set([2.3, 1.2, 3.4]) : set<float>
            [[1, 2, 3], [4, 5, 6], [7, 8, 9]] : list<list<int>>
            [(1, 'a'), (2, 'b')] : list<tuple<int, str>>
            {1: 'b', 2: 'c'} : dict<int, str>
            [<__main__.Foo object at 0x101de6c50>] : list<Foo>
            

            There's a question of how far you want to take it, and how deeply to check, with trade-offs between speed and accuracy. For example, do you want to go through all the items in a large list? Do you want to handle custom types (and tracking down common ancestors of those types)?

            Worth a read, though I'm not sure it's applicable, this PEP on type hints.
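The code above is Python 2 (`iteritems`, `print` statements, `xrange`-era style); the core idea ports to Python 3 directly. A minimal sketch of just the iterable case:

```python
def type_spec_iterable(obj, name):
    # One element type -> name<type>; mixed element types -> name<?>.
    tps = {type(e).__name__ for e in obj}
    if len(tps) == 1:
        return "%s<%s>" % (name, next(iter(tps)))
    return name + "<?>"

print(type_spec_iterable([1, 2, 3], 'list'))   # list<int>
print(type_spec_iterable([1, 'h'], 'list'))    # list<?>
```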

            qid & accept id: (34672986, 34677248) query: detecting POS tag pattern along with specified words soup:

soup wrap:

            Assuming you want to check literally for "would" followed by "be", followed by some adjective, you can do this:

            def would_be(tagged):
                return any(['would', 'be', 'JJ'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][1]] for i in xrange(len(tagged) - 2))
            

            The input is a POS tagged sentence (list of tuples, as per NLTK).

            It checks if there are any three elements in the list such that "would" is next to "be" and "be" is next to a word tagged as an adjective ('JJ'). It will return True as soon as this "pattern" is matched.

            You can do something very similar for the second type of sentence:

            def am_able_to(tagged):
                return any(['am', 'able', 'to', 'VB'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][0], tagged[i+3][1]] for i in xrange(len(tagged) - 3))
            

            Here's a driver for the program:

            s1 = [('This', 'DT'), ('feature', 'NN'), ('would', 'MD'), ('be', 'VB'), ('nice', 'JJ'), ('to', 'TO'), ('have', 'VB')]
            s2 = [('I', 'PRP'), ('am', 'VBP'), ('able', 'JJ'), ('to', 'TO'), ('delete', 'VB'), ('the', 'DT'), ('group', 'NN'), ('functionality', 'NN')]
            
            def would_be(tagged):
               return any(['would', 'be', 'JJ'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][1]] for i in xrange(len(tagged) - 2))
            
            def am_able_to(tagged):
                return any(['am', 'able', 'to', 'VB'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][0], tagged[i+3][1]] for i in xrange(len(tagged) - 3))
            
            sent1 = ' '.join(s[0] for s in s1)
            sent2 = ' '.join(s[0] for s in s2)
            
            print("Is '{1}' of type 'would be' + adj? {0}".format(would_be(s1), sent1))
            print("Is '{1}' of type 'am able to' + verb? {0}".format(am_able_to(s1), sent1))
            
            print("Is '{1}' of type 'would be' + adj? {0}".format(would_be(s2), sent2))
            print("Is '{1}' of type 'am able to' + verb? {0}".format(am_able_to(s2), sent2))
            

            This correctly outputs:

            Is 'This feature would be nice to have' of type 'would be' + adj? True
            Is 'This feature would be nice to have' of type 'am able to' + verb? False
            Is 'I am able to delete the group functionality' of type 'would be' + adj? False
            Is 'I am able to delete the group functionality' of type 'am able to' + verb? True
            

            If you'd like to generalize this, you can change whether you're checking the literal words or their POS tag.

            qid & accept id: (34683678, 34683748) query: Python - list of dicts into function that only accepts *dicts soup:

soup wrap:

            This should work:

            multi_dicts(*list_of_dicts)
            

            Putting an asterisk in front of a list argument will unpack it.

            So:

            def foo(*bars):
                for bar in bars:
                    print(bar)
            
            list_of_things = ['one', 'two', 'three']
            
            foo(*list_of_things)
            

            Will print:

            one
            two
            three
            

            This works just as well if the list contains dictionaries.
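For instance, with a hypothetical `multi_dicts` that accepts any number of dict arguments:

```python
def multi_dicts(*dicts):
    # Receives each dict as a separate positional argument.
    return sum(len(d) for d in dicts)

list_of_dicts = [{'a': 1}, {'b': 2, 'c': 3}]
total = multi_dicts(*list_of_dicts)  # same as multi_dicts({'a': 1}, {'b': 2, 'c': 3})
```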

            qid & accept id: (34686485, 34687222) query: Execute parsed xml data as command in python soup:

soup wrap:

            Indeed, concatenating strings will just result in another string. To run a string containing Python code as a statement, you need to use the exec() function, for example:

            ....
            else:
                exec(ui_application.tag)
            

            or, if you expect the expression to return a value, you can use eval() instead of exec():

            ....
            else:
                result = eval(ui_application.tag)
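For example, if the parsed tag text held a Python expression (the string below is invented for illustration), eval() returns its value. Note that eval()/exec() on untrusted XML is a code-injection risk; only use them on input you control.

```python
tag = "2 + 3 * 4"   # hypothetical content parsed out of the XML
result = eval(tag)
print(result)  # 14
```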
            
            qid & accept id: (34687883, 34700345) query: Starting/stopping a background Python process wtihout nohup + ps aux grep + kill soup:

soup wrap:

            As far as I know, there are just two (or maybe three or maybe four?) solutions to the problem of running background scripts on remote systems.

            1) nohup

            nohup python -u myscript.py > ./mylog.log  2>&1 &
            

            1 bis) disown

            Same as above, but slightly different because disown actually removes the program from the shell's job list, preventing the SIGHUP from being sent.

            2) screen (or tmux as suggested by neared)

            Here you will find a starting point for screen.

            See this post for a great explanation of how background processes work. Another related post.

            3) Bash

            Another solution is to write two bash functions that do the job:

            mynohup () {
                [[ "$1" = "" ]] && echo "usage: mynohup python_script" && return 0
                nohup python -u "$1" > "${1%.*}.log" 2>&1 < /dev/null &
            }
            
            mykill() {
                ps -ef | grep "$1" | grep -v grep | awk '{print $2}' | xargs kill
                echo "process "$1" killed"
            }
            

            Just put the above functions in your ~/.bashrc or ~/.bash_profile and use them as normal bash commands.

            Now you can do exactly what you asked:

            mynohup myscript.py             # will automatically continue running in
                                            # background even if I log out
            
            # two days later, even if I logged out / logged in again the meantime
            mykill myscript.py
            

            4) Daemon

            This daemon module is very useful:

            python myscript.py start
            
            python myscript.py stop
            
            qid & accept id: (34701261, 34701289) query: What's the convinient way to evaluate multiple string equality in Python? soup:

            soup wrap:

            You can do

            if A in ["a", "b", "c"]:
                # do the thing
            

            Since you are just returning the truth value, you can do

            def f(A):
                return A in ["a", "b", "c"]
            

            The in operator returns a boolean.
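            If the membership check runs often or the candidate set grows, the same `in` test works against a set, whose lookups are O(1) on average (a small sketch; the names are illustrative):

```python
VALID = {"a", "b", "c"}  # illustrative set of accepted strings

def f(A):
    # `in` on a set is an average O(1) hash lookup
    return A in VALID

print(f("a"))  # True
print(f("d"))  # False
```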

            qid & accept id: (34715227, 34715268) query: how to write two elements into one row in Python soup:

            soup wrap:

            Use zip

            x= [['first sentence'],['second sentence'],['third sentence']]
            y= [1,0,1]
            
            for zx,zy in zip(x, y):
                print('{}, {}'.format(zx[0], zy))
            

            output:

            first sentence, 1
            second sentence, 0
            third sentence, 1
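            If the rows should end up in a file rather than on screen, csv.writer pairs naturally with the same zip loop (a sketch writing to an in-memory buffer; swap in a real file handle as needed):

```python
import csv
import io

x = [['first sentence'], ['second sentence'], ['third sentence']]
y = [1, 0, 1]

buf = io.StringIO()  # stands in for open('out.csv', 'w', newline='')
writer = csv.writer(buf)
for zx, zy in zip(x, y):
    writer.writerow([zx[0], zy])

print(buf.getvalue())
```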
            
            qid & accept id: (34734933, 34734989) query: Python - Access contents of list after applying Counter from collections module soup:

            soup wrap:

            The Counter object is a sub-class of a dictionary.

            A Counter is a dict subclass for counting hashable objects. It is an unordered collection where elements are stored as dictionary keys and their counts are stored as dictionary values.

            You can access the elements the same way you would another dictionary:

            >>> from collections import Counter
            >>> theList = ['blue', 'red', 'blue', 'yellow', 'blue', 'red']
            >>> newList = Counter(theList)
            >>> newList['blue']
            3
            

            If you want to print the keys and values you can do this:

            >>> for k,v in newList.items():
            ...     print(k,v)
            ...
            blue 3
            yellow 1
            red 2
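            Counter also offers most_common, which returns (element, count) pairs ordered from most to least frequent; handy when you want the top entries rather than a single lookup:

```python
from collections import Counter

theList = ['blue', 'red', 'blue', 'yellow', 'blue', 'red']
counts = Counter(theList)

print(counts.most_common(2))  # [('blue', 3), ('red', 2)]
```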
            
            qid & accept id: (34755636, 34777741) query: Date removed from x axis on overlaid plots matplotlib soup:

            soup wrap:

            One option would be to manually specify the x-axis based on the DataFrame index, and then plot directly using matplotlib.

            import numpy as np
            import pandas as pd
            import matplotlib.pyplot as plt
            
            # make up some data
            n = 100
            dates = pd.date_range(start = "2015-01-01", periods = n, name = "yearDate")
            
            dfs = []
            
            for i in range(3):
                df = pd.DataFrame(data = np.random.random(n)*(i + 1), index = dates,
                                  columns = ["FishEffort"] )
                df.df_name = str(i)
                dfs.append(df)
            
            # plot it directly using matplotlib instead of through the DataFrame
            fig = plt.figure()
            ax = fig.add_subplot()
            
            for df in dfs:
                plt.plot(df.index,df["FishEffort"], label = df.df_name)
            
            plt.legend()
            plt.show()
            


            Another option would be to concatenate your DataFrames and plot using Pandas. If you give your "FishEffort" field the correct label name when loading the data or via DataFrame.rename then the labels will be specified automatically.

            import numpy as np
            import pandas as pd
            import matplotlib.pyplot as plt
            
            n = 100
            dates = pd.date_range(start = "2015-01-01", periods = n, name = "yearDate")
            
            dfs = []
            
            for i in range(3):
                df = pd.DataFrame(data = np.random.random(n)*(i + 1), index = dates,
                                  columns = ["DataFrame #" + str(i) ] )
                df.df_name = str(i)
                dfs.append(df)
            
            df = pd.concat(dfs, axis = 1)
            df.plot()
            


            qid & accept id: (34769801, 34769912) query: how to pick random items from a list while avoiding picking the same item in a row soup:

            soup wrap:

            Both of the generators below also work for lists with non-unique elements:

            def choice_without_repetition(lst):
                prev = None
                while True:
                    i = random.randrange(len(lst))
                    if i != prev:
                        yield lst[i]
                        prev = i
            

            or

            def choice_without_repetition(lst):
                i = 0
                while True:
                    i = (i + random.randrange(1, len(lst))) % len(lst)
                    yield lst[i]
            

            Usage:

            lst = [1,2,3,4,5,6,7,8]
            for x in choice_without_repetition(lst):
                print x
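            Since both generators are infinite, itertools.islice is a convenient way to take a fixed number of picks (a Python 3 usage sketch of the first generator):

```python
import random
from itertools import islice

def choice_without_repetition(lst):
    # same generator as above, repeated here so the sketch is self-contained
    prev = None
    while True:
        i = random.randrange(len(lst))
        if i != prev:
            yield lst[i]
            prev = i

lst = [1, 2, 3, 4, 5, 6, 7, 8]
picks = list(islice(choice_without_repetition(lst), 20))

# with unique elements, no two consecutive picks are equal
print(picks)
```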
            
            qid & accept id: (34773317, 34774878) query: Python Pandas removing substring using another column soup:

            soup wrap:

            Here is one solution that is quite a bit faster than your current one, though I'm not convinced there isn't something faster still:

            In [13]: import numpy as np
                     import pandas as pd
                     n = 1000
                     testing  = pd.DataFrame({'NAME':[
                     'FIRST', np.nan, 'NAME2', 'NAME3', 
                     'NAME4', 'NAME5', 'NAME6']*n, 'FULL_NAME':['FIRST LAST', np.nan, 'FIRST  LAST', 'FIRST NAME3', 'FIRST NAME4 LAST', 'ANOTHER NAME', 'LAST NAME']*n})
            

            This is kind of a long one-liner, but it should do what you need.

            The fastest solution I can come up with uses replace, as mentioned in another answer:

            In [37]: %timeit testing ['NEW2'] = [e.replace(k, '') for e, k in zip(testing.FULL_NAME.astype('str'), testing.NAME.astype('str'))]
            100 loops, best of 3: 4.67 ms per loop
            

            Original answer:

            In [14]: %timeit testing ['NEW'] = [''.join(str(e).split(k)) for e, k in zip(testing.FULL_NAME.astype('str'), testing.NAME.astype('str'))]
            100 loops, best of 3: 7.24 ms per loop
            

            compared to your current solution:

            In [16]: %timeit testing['NEW1'] = testing.apply(address_remove, axis=1)
            10 loops, best of 3: 166 ms per loop
            

            These get you the same answer as your current solution
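            As a minimal, self-contained illustration of the replace approach (the column contents here are made up):

```python
import pandas as pd

df = pd.DataFrame({
    'NAME': ['FIRST', 'NAME3'],
    'FULL_NAME': ['FIRST LAST', 'FIRST NAME3'],
})

# drop each row's NAME substring from its FULL_NAME, then tidy whitespace
df['NEW'] = [full.replace(name, '').strip()
             for full, name in zip(df['FULL_NAME'].astype('str'),
                                   df['NAME'].astype('str'))]
print(df['NEW'].tolist())  # ['LAST', 'FIRST']
```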

            qid & accept id: (34777323, 34777621) query: Opening and closing files in a loop soup:

            soup wrap:

            It is easier to use the tools Python provides:

            from itertools import groupby
            from operator import itemgetter
            
            items = [
                ('name1', 10), ('name1', 30),
                ('name2', 5), ('name2', 7), ('name2', 3),
                ('name3', 10)
            ]
            
            for name, rows in groupby(items, itemgetter(0)):
                with open(name + ".txt", "w") as outf:
                    outf.write("\n".join(str(row[1]) for row in rows))
            

            Edit: to match the updated question, here is the updated solution ;-)

            for name, records in groupby(SeqIO.parse(some_genbank, 'gb'), lambda record:record.annotations['source']):
                with open(name + ".faa", "w+") as outf:
                    for record in records:
                        write_all_record(outf, record)
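            One caveat worth noting: itertools.groupby only merges runs of adjacent equal keys, so input that isn't already grouped should be sorted by the same key first (sketch):

```python
from itertools import groupby
from operator import itemgetter

items = [('name2', 5), ('name1', 10), ('name2', 7), ('name1', 30), ('name3', 10)]

# sort by the grouping key so equal keys become adjacent before grouping
grouped = {name: [row[1] for row in rows]
           for name, rows in groupby(sorted(items, key=itemgetter(0)), itemgetter(0))}
print(grouped)  # {'name1': [10, 30], 'name2': [5, 7], 'name3': [10]}
```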
            
            qid & accept id: (34793225, 34793421) query: Extract from a match to next match if patten found in between soup:

            soup wrap:

            It isn't recommended to use regex to parse XML - you should use a library such as lxml, which you can install using pip install lxml. Then, you could select the appropriate elements to output using lxml and XPath as follows (I have taken the liberty of closing the tags in your XML):

            content = '''
            
            
              Elememt1 Element1
                abc1 hit 1
              
            
            
              Elememt2 Element2
                abc2 hit 1
              
            
            
              Elememt3 Element3
                abc3 hit 1
              
            
            
              Elememt4 Element4
                abc4 hit 1
              
            
            
            '''
            
            from lxml import etree
            
            tree = etree.XML(content)
            target_elements = tree.xpath('//Iteration_hit[contains(., "Element2") or contains(., "Element4")]')
            
            for element in target_elements:
                print(etree.tostring(element))
            

            Output

            Elememt2 Element2
                abc2 hit 1
              
            
            Elememt4 Element4
                abc4 hit 1
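            A self-contained, runnable version of the same idea with stand-in markup (only the Iteration_hit element name comes from the XPath in the answer; the wrapper root and exact layout are illustrative, since the original tags were stripped when the question was rendered):

```python
from lxml import etree

content = '''<root>
  <Iteration_hit>Elememt1 Element1
    abc1 hit 1
  </Iteration_hit>
  <Iteration_hit>Elememt2 Element2
    abc2 hit 1
  </Iteration_hit>
  <Iteration_hit>Elememt3 Element3
    abc3 hit 1
  </Iteration_hit>
  <Iteration_hit>Elememt4 Element4
    abc4 hit 1
  </Iteration_hit>
</root>'''

tree = etree.XML(content)
target_elements = tree.xpath(
    '//Iteration_hit[contains(., "Element2") or contains(., "Element4")]')

for element in target_elements:
    print(etree.tostring(element, encoding="unicode"))
```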
              
            
            qid & accept id: (34799167, 34802633) query: Conditionally and interatively calculate column based on value of three columns soup:

            soup wrap:

            OK here it goes, this is one way of solving your problem. I used a dictionary to hold the values for each combination.

            xyzdict = {"xx":0.25,
                      "xy":0.25,
                      "xz":0.5,
                      "yx":0.33,
                      "yy":0.33,
                      "yz":0.33,
                      "zx":0.5,
                      "zy":0.5}
            

            Then, for each 'connection' combination, the first letter is always the same as the first letter of fld1, and the second letter is always a letter other than fld1. So here is an exhaustive, maybe not very Pythonic, way of calculating your values and storing each combination's connection value in a dictionary for later use.

            cnxn = {}
            xyz = ["x","y","z"]
            
            for combo in xyzdict.keys():
                #print "the combo is %s" % (combo) #xyzdict[two] #actual value
                first_letter = combo[0]
            
                not_second = [combo[0],combo[1]]
                not_second_letter = list(set(xyz) - set(not_second))
            
                if len(not_second_letter) > 1:
                    multi_cnxn = []
                    for each_not_second_letter in not_second_letter:
            
                        fwd = ''.join((first_letter,each_not_second_letter))
                        rev = ''.join((each_not_second_letter,first_letter))
                        cnxnval = xyzdict[fwd] * xyzdict[rev]
            
                        multi_cnxn.append(cnxnval)
            
                    rowvalue = xyzdict[combo] + sum(multi_cnxn)
                    cnxn[combo] = rowvalue
                else:
                    fwd = ''.join((first_letter,not_second_letter[0]))
                    rev = ''.join((not_second_letter[0],first_letter))
                    cnxnval = xyzdict[fwd] * xyzdict[rev]
            
                    rowvalue = xyzdict[combo] + cnxnval
                    cnxn[combo] = rowvalue
            

            Almost there: define a function check that pulls out your fld1 and fld2 and returns the calculated value from cnxn above.

            def check(fld1,fld2,cnxn_sub):
                rowpair = ''.join((fld1,fld2))
                return cnxn_sub[rowpair]
            

            Finally, a little pandas apply to bring it all home.

            df['connection'] = df.apply(lambda row: check(row['fld1'], row['fld2'],cnxn), axis=1)
            

            Here are my results; our "yz" connection is a little off, though I don't know whether that's on your end or mine...

            fld1    fld2    relationship    connection
            0   x   x   0.25    0.5825
            1   x   y   0.25    0.5000
            2   x   z   0.50    0.5825
            3   y   x   0.33    0.4950
            4   y   y   0.33    0.5775
            5   y   z   0.33    0.4125
            6   z   x   0.50    0.6650
            7   z   y   0.50    0.7500
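            The exhaustive branching above can be condensed: both branches compute the same forward-times-reverse product per remaining letter, so a single loop covers both cases (an equivalent sketch using the same xyzdict):

```python
xyzdict = {"xx": 0.25, "xy": 0.25, "xz": 0.5,
           "yx": 0.33, "yy": 0.33, "yz": 0.33,
           "zx": 0.5, "zy": 0.5}

cnxn = {}
for combo in xyzdict:
    first, second = combo[0], combo[1]
    # every intermediate letter other than the two in the combo
    others = set("xyz") - {first, second}
    indirect = sum(xyzdict[first + o] * xyzdict[o + first] for o in others)
    cnxn[combo] = xyzdict[combo] + indirect

print(round(cnxn["xx"], 4))  # 0.5825
```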
            

            Good Luck!

            qid & accept id: (34818228, 34818381) query: How to count number of repeated keys in several dictionaries? soup:

            soup wrap:

            If you can accept slightly different output, this might work for you:

            from collections import Counter
            
            dicts = [
                {1: 'url1', 3: 'url2', 7: 'url3', 5: 'url4'},
                {1: 'url1', 7: 'url3'},
                {5: 'url4', 10: 'url5'},
            ]
            
            result = Counter()
            for d in dicts:
                result.update(d.keys())
            
            print dict(result)
            

            Note that this has keys and counts, but no values.

            Alternatively:

            from collections import Counter
            from itertools import chain
            
            dicts = [
                {1: 'url1', 3: 'url2', 7: 'url3', 5: 'url4'},
                {1: 'url1', 7: 'url3'},
                {5: 'url4', 10: 'url5'},
            ]
            
            result = Counter(chain.from_iterable(dicts))
            
            print dict(result)
            

            Final version: this one produces exactly your requested output:

            from collections import Counter
            from itertools import chain
            
            dicts = [
                {1: 'url1', 3: 'url2', 7: 'url3', 5: 'url4'},
                {1: 'url1', 7: 'url3'},
                {5: 'url4', 10: 'url5'},
            ]
            
            result = Counter(chain.from_iterable(d.items() for d in dicts))
            result = {k:[n,v] for ((k,v),n) in result.items()}
            
            print dict(result)
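            For comparison, the same output without Counter, using a plain dictionary loop (a sketch; like the version above, it assumes a given key always maps to the same url):

```python
dicts = [
    {1: 'url1', 3: 'url2', 7: 'url3', 5: 'url4'},
    {1: 'url1', 7: 'url3'},
    {5: 'url4', 10: 'url5'},
]

result = {}
for d in dicts:
    for k, v in d.items():
        if k in result:
            result[k][0] += 1   # key seen before: bump its count
        else:
            result[k] = [1, v]  # first sighting: count 1 plus the url

print(result)
```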
            
            qid & accept id: (34824864, 34825852) query: Python list and time soup:

            soup wrap:

            I am just using it like this, and it works fine (Python 3.3+):

            from datetime import datetime
            time_list = []
            time_list.append(datetime.now())
            

            If you want to save it or send it to another application without converting types, you can use the timestamp instead:

            from datetime import datetime
            time_list = []
            time_list.append(datetime.now().timestamp())
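            Going the other way, datetime.fromtimestamp turns a stored timestamp back into a datetime:

```python
from datetime import datetime

ts = datetime.now().timestamp()   # float seconds since the epoch
dt = datetime.fromtimestamp(ts)   # back to a (local) datetime

print(ts)
print(dt)
```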
            
            qid & accept id: (34837194, 34904945) query: renaming pcraster mapstack soup:

            soup wrap:

            Here's something I put together, based on some old Python scripts I once wrote:

            #! /usr/bin/env python
            # Rename a PCRaster map stack with names following prefix.yyyymmdd to a stack with valid
            # PCRaster time step numbers
            # Johan van der Knijff
            #
            # Example input stack:
            #
            # precip.19810101
            # precip.19810102
            # precip.19810103
            # precip.19810104
            # precip.19810105
            #
            # Then run script with following arguments:
            #
            # python renpcrstack.py precip 1
            #
            # Result:
            #
            # precip00.001
            # precip00.002
            # precip00.003
            # precip00.004
            # precip00.005
            #
            
            import sys
            import os
            import argparse
            import math
            import datetime
            import glob
            
            # Create argument parser
            parser = argparse.ArgumentParser(
                description="Rename map stack")
            
            def parseCommandLine():
                # Add arguments
                parser.add_argument('prefix',
                                    action="store",
                                    type=str,
                                    help="prefix of input map stack (also used as output prefix)")
                parser.add_argument('stepStartOut',
                                    action="store",
                                    type=int,
                                    help="time step number that is assigned to first map in output stack")
            
                # Parse arguments
                args = parser.parse_args()
            
                return(args)
            
            def dateToJulianDay(date):
            
                # Calculate Julian Day from date
                # Source: https://en.wikipedia.org/wiki/Julian_day#Converting_Julian_or_Gregorian_calendar_date_to_Julian_day_number
            
                a = (14 - date.month)//12  # integer division; the formula assumes integer arithmetic
                y = date.year + 4800 - a
                m = date.month +12*a - 3
            
                JulianDay = date.day + math.floor((153*m + 2)/5) + 365*y + math.floor(y/4) \
                    - math.floor(y/100) + math.floor(y/400) - 32045
            
                return(JulianDay)
            
            def genStackNames(prefix, start, end, stepSize):
                # Generate list with names of all maps
                # map name is made up of 11 characters, and chars 8 and 9 are
                # separated by a dot. Name starts with prefix, ends with time step
                # number and all character positions in between are filled with zeroes
            
                # define list that will contain map names
                listMaps = []
            
                # Count no chars prefix
                charsPrefix = len(prefix)
            
                # Maximum no chars needed for suffix (end step)
                maxCharsSuffix = len(str(end))
            
                # No of free positions between pre- and suffix
                noFreePositions = 11 - charsPrefix - maxCharsSuffix
            
                # Trim prefix if not enough character positions are available 
                if noFreePositions < 0:
                    # No of chars to cut from prefix if 11-char limit is exceeded
                    charsToCut = charsPrefix + maxCharsSuffix - 11
                    charsToKeep = charsPrefix - charsToCut
            
                    # Updated prefix
                    prefix = prefix[0:charsToKeep]
            
                    # Updated prefix length
                    charsPrefix = len(prefix)
            
                # Generate name for each step
            
                for i in range(start,end + 1,stepSize):
            
                    # No of chars in suffix for this step
                    charsSuffix = len(str(i))
            
                    # No of zeroes to fill
                    noZeroes = 11 - charsPrefix - charsSuffix
            
                    # Total no of chars right of prefix
                    charsAfterPrefix = noZeroes + charsSuffix
            
                    # Name of map
            
                    thisName = prefix + (str(i)).zfill(charsAfterPrefix)
                    thisFile = thisName[0:8]+"." + thisName[8:11]
            
                    listMaps.append(thisFile)
            
                return listMaps    
            
            def main():
                # Parse command line arguments
                args = parseCommandLine()
                prefix = args.prefix
                stepStartOut = args.stepStartOut
            
                # Glob pattern for input maps: prefix + dot + 8 char extension
                pattern = prefix + ".????????"
            
                # Get list of all input maps based on glob pattern
                mapsIn = glob.glob(pattern)
            
                # Set time format
                tfmt = "%Y%m%d"
            
                # Set up dictionary that will act as lookup table between Julian Days (key) 
                # and Date string
                jDayDate = {}
            
                for mapFile in mapsIn:  # avoid shadowing the built-in map()
                    baseNameIn = os.path.splitext(mapFile)[0]
                    dateIn = os.path.splitext(mapFile)[1].strip(".")
            
                    # Convert to date / time format
                    dt = datetime.datetime.strptime(dateIn, tfmt)
            
                    # Convert date to Julian day number
                    jDay = int(dateToJulianDay(dt))
            
                    # Store as key-value pair in dictionary
                    jDayDate[jDay] = dateIn
            
                # Number of input maps (equals number of key-value pairs)
                noMaps = len(jDayDate)
            
                # Create list of names for output files 
                mapNamesOut = genStackNames(prefix, stepStartOut, noMaps + stepStartOut - 1, 1)
            
                # Iterate over Julian Days (ascending order)
            
                i = 0
            
                for key in sorted(jDayDate):
                    # Name of input file
                    fileIn = prefix + "."+ jDayDate[key]
            
                    # Name of output file
                    fileOut = mapNamesOut[i]
            
                    # Rename file
                    os.rename(fileIn, fileOut)
            
                    print("Renamed " + fileIn + " ---> " + fileOut)
            
                    i += 1
            
            main()
            

            (Alternatively, download the code from my GitHub Gist.)

            You can run it from the command line, using the prefix of your map stack and the number of the first output map as arguments, e.g.:

            python renpcrstack.py precip 1
            

            Please note that the script renames the files in place, so make sure to make a copy of your original map stack in case something goes wrong (I only did some very limited testing on this!).

            Also, the script assumes a non-sparse input map stack, i.e. in case of daily maps, an input map exists for each day. In case of missing days, the numbering of the output maps will not be what you'd expect.

            The internal conversion of all dates to Julian Days may be a bit overkill here, but once you start doing more advanced transformations it does make things easier, because it gives you plain numbers, which are more straightforward to manipulate than date strings.
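            As a sanity check on that idea, Python's own datetime.date.toordinal() already gives a monotone day count, and it differs from the Julian Day Number only by a constant offset, so for ordering purposes either works. A minimal sketch, independent of the script (the offset 1721425 relates the proleptic Gregorian ordinal to the integer JDN):

```python
import datetime

def date_to_jdn(date):
    # Integer Julian Day Number for a Gregorian calendar date
    # (same Wikipedia formula as above, using integer division).
    a = (14 - date.month) // 12
    y = date.year + 4800 - a
    m = date.month + 12 * a - 3
    return (date.day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)

# toordinal() is offset from the JDN by a constant, so sorting by
# toordinal() orders dates exactly like sorting by JDN.
d = datetime.date(2000, 1, 1)
print(date_to_jdn(d))           # 2451545
print(d.toordinal() + 1721425)  # 2451545
```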

            qid & accept id: (34845096, 34845233) query: How can I sort a 2D list? soup:

            soup wrap:

            Assume the values are stored row-by-row in a list, like this:

            a = [['D', 'C', 'B', 'A'],
                 ['1', '3', '2', '0'],
                 ['1', '3', '2', '0']]
            

            To sort this array you can use the following code:

            zip(*sorted(zip(*a), key=lambda column: column[0]))
            

            where column[0] is the value to sort by (you can use column[1], etc.)

            Output:

            [('A', 'B', 'C', 'D'),
             ('0', '2', '3', '1'),
             ('0', '2', '3', '1')]
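            Note that in Python 3, zip() returns an iterator rather than a list, so wrap the result in list() to materialize it; a sketch of the same one-liner:

```python
a = [['D', 'C', 'B', 'A'],
     ['1', '3', '2', '0'],
     ['1', '3', '2', '0']]

# Transpose, sort the columns by their first element, transpose back.
result = list(zip(*sorted(zip(*a), key=lambda column: column[0])))
print(result)
# [('A', 'B', 'C', 'D'), ('0', '2', '3', '1'), ('0', '2', '3', '1')]
```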
            

            Note: if you are working with pretty big arrays and execution time matters, consider using NumPy, which has an appropriate method: NumPy sort

            qid & accept id: (34859683, 34859795) query: Reorder a dictionary to fit a data frame soup:

            soup wrap:

            Starting with a dict of two names and 10 different links each:

            d = {'Name 1': ['link{}'.format(l) for l in list(range(10))], 'Name 2': ['link{}'.format(l) for l in list(range(10, 20))]}
            
            {'Name 1': ['link0', 'link1', 'link2', 'link3', 'link4', 'link5', 'link6', 'link7', 'link8', 'link9'], 'Name 2': ['link10', 'link11', 'link12', 'link13', 'link14', 'link15', 'link16', 'link17', 'link18', 'link19']}
            

            You could create a DataFrame with .from_dict(), .stack() it, and clean up the index:

            df = pd.DataFrame.from_dict(d, orient='index').stack().reset_index(1, drop=True).to_frame().reset_index()
            df.columns = ['name', 'link']
            

            to get:

                  name    link
            0   Name 1   link0
            1   Name 1   link1
            2   Name 1   link2
            3   Name 1   link3
            4   Name 1   link4
            5   Name 1   link5
            6   Name 1   link6
            7   Name 1   link7
            8   Name 1   link8
            9   Name 1   link9
            10  Name 2  link10
            11  Name 2  link11
            12  Name 2  link12
            13  Name 2  link13
            14  Name 2  link14
            15  Name 2  link15
            16  Name 2  link16
            17  Name 2  link17
            18  Name 2  link18
            19  Name 2  link19
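            The same long-format frame can also be built directly from a flat list of (name, link) pairs; a minimal alternative sketch (the sorted() call is an assumption, added to keep the names grouped in a fixed order):

```python
import pandas as pd

d = {'Name 1': ['link{}'.format(l) for l in range(10)],
     'Name 2': ['link{}'.format(l) for l in range(10, 20)]}

# Flatten the dict into (name, link) rows, then build the frame in one go.
rows = [(name, link) for name, links in sorted(d.items()) for link in links]
df = pd.DataFrame(rows, columns=['name', 'link'])
print(df.shape)  # (20, 2)
```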
            
            qid & accept id: (34871024, 34871529) query: Combine methods with identical structure but different parameters soup:

            soup wrap:

            It may be easier to just combine the storage into a dictionary within your own class ...

            self.storage = {'key_A':[], 'key_B':[]}
            

            Then, use one function ...

            def method(self, key):
                some_list = list(irrelevant_extraction_function(key, self.some_dict))
                self.storage[key] = [item['address'] for item in some_list]
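            Put together as a runnable sketch (the class name, the body of irrelevant_extraction_function, and the sample data below are hypothetical stand-ins for the asker's code):

```python
# Hypothetical stand-in for the asker's extraction helper.
def irrelevant_extraction_function(key, source):
    return source.get(key, [])

class Holder:
    def __init__(self, some_dict):
        self.some_dict = some_dict
        # One dict replaces the two near-identical attributes.
        self.storage = {'key_A': [], 'key_B': []}

    def method(self, key):
        some_list = list(irrelevant_extraction_function(key, self.some_dict))
        self.storage[key] = [item['address'] for item in some_list]

h = Holder({'key_A': [{'address': 'a1'}, {'address': 'a2'}]})
h.method('key_A')
print(h.storage['key_A'])  # ['a1', 'a2']
```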
            
            qid & accept id: (34906231, 34906294) query: How to return both string and value within HttpResponse? soup:

            soup wrap:

            Your client (the web browser) expects a single string as the response:

            return HttpResponse("a string: {}".format(val))
            

            or use JSON for the response:

            return JsonResponse({'message': 'a string', 'val': val})
            

            or, to send a variable to the next Django view:

            def my_view(request):
                if request.session.get('val', None):
                    # do something with the 'val' variable.
                else:
                    request.session['val'] = 'somevalue'
                    return HttpResponse('some message')
            

            More about Django sessions here.
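            For a sense of what JsonResponse puts on the wire: the body is just the dict serialized with the standard json module (a Django-free sketch, sent with a Content-Type of application/json):

```python
import json

val = 42
# JsonResponse({'message': 'a string', 'val': val}) sends a body like this.
body = json.dumps({'message': 'a string', 'val': val})
print(body)  # {"message": "a string", "val": 42}
```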

            qid & accept id: (34906286, 34907013) query: Groupby in a list for python soup:

            soup wrap:

            There is indeed a groupby function in itertools; just be aware that it requires the data to be sorted beforehand. See the documentation: https://docs.python.org/2/library/itertools.html#itertools.groupby

            But from the code you posted, it looks like you don't really need to group, you just want to count, right? Then you may be better off using collections.Counter. Note that it requires the items to be hashable, so you'd want to convert those lists into tuples.

            >>> lst = [tuple(i) for i in ls]
            >>> collections.Counter(lst)
            Counter({('A', 4): 2, ('F', 3): 1, ('B', 1): 1, ('B', 4): 1})
            

            Regarding efficiency... I'm not sure you'd fare very well loading the whole dataset into memory, but you could use the defaultdict approach described by Vlad with an iterator.

            About the averages, if you really want to use groupby then you could do something like this:

            >>> def average(lst):
            ...     return 1.0*sum(lst)/len(lst) if lst else 0.0
            >>> [(i[0],average([j[1] for j in i[1]])) for i in itertools.groupby(sorted(ls),key=lambda i:i[0])]
            [('A', 4.0), ('B', 2.5), ('F', 3.0)]
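            A Python 3 version of both steps; the sample ls below is reconstructed from the Counter output above, and statistics.mean replaces the Python 2 1.0*sum(lst)/len(lst) idiom:

```python
import collections
import itertools
import statistics

# Reconstructed sample data matching the Counter output above.
ls = [['A', 4], ['A', 4], ['F', 3], ['B', 1], ['B', 4]]

# Counting distinct rows: tuples are hashable, lists are not.
counts = collections.Counter(tuple(i) for i in ls)
print(counts[('A', 4)])  # 2

# Per-key averages: groupby needs the data sorted by the same key first.
averages = [(key, statistics.mean(v for _, v in group))
            for key, group in itertools.groupby(sorted(ls), key=lambda i: i[0])]
print(averages)
```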
            
            qid & accept id: (34910228, 34910562) query: Change object's variable from different file soup:

            soup wrap:

            Problem

            You need to hand over the object as an argument.

            In the function:

            def knightofni(obj):
                obj.number = 1
                obj.who = "We've encountered Knight of Ni."
            

            and when using it in the class:

            enemies.knightofni(self)
            

            Do the same for frenchman().

            Full code

            grail.py

            import enemies
            
            class Encounter:
                def __init__(self):
                    self.counter = 1
                    self.number = 0
                    self.who = "We've encountered no one."
            
                def forward(self):
                    if self.counter == 1:
                        enemies.knightofni(self)
                    elif self.counter == 2:
                        enemies.frenchman(self)
                    else:
                        self.number = 42
                        self.who = "We've found the Grail!"
                    self.counter += 1
            
            knight = Encounter()
            for i in range(4):
                print(str(knight.number) + " " + knight.who)
                knight.forward()
            

            and enemies.py:

            def knightofni(obj):
                obj.number = 1
                obj.who = "We've encountered Knight of Ni."
            
            def frenchman(obj):
                obj.number = 4
                obj.who = "We've encountered French."
            

            Output:

            0 We've encountered no one.
            1 We've encountered Knight of Ni.
            4 We've encountered French.
            42 We've found the Grail!
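            The mechanism in one self-contained file: the function receives a reference to the instance, so attribute assignments inside it are visible to the caller. A condensed sketch of the two-file code above:

```python
class Encounter:
    def __init__(self):
        self.number = 0
        self.who = "We've encountered no one."

# Same idea as enemies.knightofni: mutate the object that was passed in.
def knightofni(obj):
    obj.number = 1
    obj.who = "We've encountered Knight of Ni."

knight = Encounter()
knightofni(knight)
print(knight.number, knight.who)  # 1 We've encountered Knight of Ni.
```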
            
            qid & accept id: (34914655, 35244267) query: Python C API - How to construct object from PyObject soup:

            soup wrap:

            If a boost::python::object references a type, then invoking it will construct an object with the referenced type:

            boost::python::object type = /* Py_TYPE */;
            boost::python::object object = type(); // isinstance(object, type) == True
            

            As nearly everything in Python is an object, accepting arguments from Python as boost::python::object will allow any type of object, even those that are not a type. As long as the object is callable (__call__), the code will succeed.


            On the other hand, if you want to guarantee that a type is provided, then one solution is to create a C++ type that represents a Python type, accept it as an argument, and use a custom converter to construct the C++ type only if a Python type is provided.

            The following type_object C++ type represents a Python object that is a Py_TYPE.

            /// @brief boost::python::object that refers to a type.
            struct type_object: 
              public boost::python::object
            {
              /// @brief Refer to the provided Python type.  Throws
              ///        std::invalid_argument if the object is not a type.
              explicit
              type_object(boost::python::object object):
                boost::python::object(object)
              {
                if (!PyType_Check(object.ptr()))
                {
                  throw std::invalid_argument("type_object requires a Python type");
                }
              }
            };
            
            ...
            
            // Only accepts a Python type.
            void add_component(type_object type) { ... }
            

            The following custom converter will only construct a type_object instance if it is provided a PyObject* that is a Py_TYPE:

            /// @brief Enable automatic conversions to type_object.
            struct enable_type_object
            {
              enable_type_object()
              {
                boost::python::converter::registry::push_back(
                  &convertible,
                  &construct,
                  boost::python::type_id<type_object>());
              }
            
              static void* convertible(PyObject* object)
              {
                return PyType_Check(object) ? object : NULL;
              }
            
              static void construct(
                PyObject* object,
                boost::python::converter::rvalue_from_python_stage1_data* data)
              {
                // Obtain a handle to the memory block that the converter has allocated
                // for the C++ type.
                namespace python = boost::python;
                typedef python::converter::rvalue_from_python_storage<type_object>
                    storage_type;
                void* storage = reinterpret_cast<storage_type*>(data)->storage.bytes;
            
                // Construct the type object within the storage.  Object is a borrowed 
                // reference, so create a handle indicating it is borrowed for proper
                // reference counting.
                python::handle<> handle(python::borrowed(object));
                new (storage) type_object(python::object(handle));
            
                // Set convertible to indicate success. 
                data->convertible = storage;
              }
            };
            
            ...
            
            BOOST_PYTHON_MODULE(...)
            {
              enable_type_object(); // register type_object converter.
            }
            

            Here is a complete example demonstrating exposing a function that requires a Python type, then constructs an instance of the type:

            #include <boost/python.hpp>
            #include <stdexcept> // std::invalid_argument
            #include <iostream>
            
            /// @brief boost::python::object that refers to a type.
            struct type_object: 
              public boost::python::object
            {
              /// @brief Refer to the provided Python type.  Throws
              ///        std::invalid_argument if the object is not a type.
              explicit
              type_object(boost::python::object object):
                boost::python::object(object)
              {
                if (!PyType_Check(object.ptr()))
                {
                  throw std::invalid_argument("type_object requires a Python type");
                }
              }
            };
            
            /// @brief Enable automatic conversions to type_object.
            struct enable_type_object
            {
              enable_type_object()
              {
                boost::python::converter::registry::push_back(
                  &convertible,
                  &construct,
                  boost::python::type_id<type_object>());
              }
            
              static void* convertible(PyObject* object)
              {
                return PyType_Check(object) ? object : NULL;
              }
            
              static void construct(
                PyObject* object,
                boost::python::converter::rvalue_from_python_stage1_data* data)
              {
                // Obtain a handle to the memory block that the converter has allocated
                // for the C++ type.
                namespace python = boost::python;
                typedef python::converter::rvalue_from_python_storage<type_object>
                                                                             storage_type;
                void* storage = reinterpret_cast<storage_type*>(data)->storage.bytes;
            
                // Construct the type object within the storage.  Object is a borrowed 
                // reference, so create a handle indicating it is borrowed for proper
                // reference counting.
                python::handle<> handle(python::borrowed(object));
                new (storage) type_object(python::object(handle));
            
                // Set convertible to indicate success. 
                data->convertible = storage;
              }
            };
            
            // Mock API.
            struct GameObject {};
            struct CameraComponent
            {
              CameraComponent()
              {
                std::cout << "CameraComponent()" << std::endl;
              }
            };
            
            boost::python::object add_component(GameObject& /* self */, type_object type)
            {
              auto constructed_type = type();
              return constructed_type;
            }
            
            BOOST_PYTHON_MODULE(example)
            {
              namespace python = boost::python;
            
              // Enable receiving type_object as arguments.
              enable_type_object();
            
              python::class_<GameObject>("GameObject")
                .def("add_component", &add_component);
            
              python::class_<CameraComponent>("CameraComponent");
            }
            

            Interactive usage:

            >>> import example
            >>> game = example.GameObject()
            >>> component = game.add_component(example.CameraComponent)
            CameraComponent()
            >>> assert(isinstance(component, example.CameraComponent))
            >>> try:
            ...     game.add_component(component) # throws Boost.Python.ArgumentError
            ...     assert(False)
            ... except TypeError:
            ...     assert(True)
            ...
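
            For reference, the behavior the converter enables mirrors this plain-Python pattern (a rough sketch for comparison only, not part of the Boost.Python code above):

```python
class GameObject(object):
    def add_component(self, component_type):
        # Reject non-types, mirroring the PyType_Check in the C++ converter.
        if not isinstance(component_type, type):
            raise TypeError("add_component requires a type")
        # Instantiate the class that was passed in, just like type() in C++.
        return component_type()

class CameraComponent(object):
    pass

game = GameObject()
component = game.add_component(CameraComponent)
assert isinstance(component, CameraComponent)
```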
            
            qid & accept id: (34929717, 34929764) query: Get list of column names for columns that contain negative values soup:

            soup wrap:

            You can select them by building an appropriate Series and then using it to index into df:

            >>> df < 0
                fld1   fld2   fld3   fld4   fld5   fld6   fld7
            0  False  False   True  False  False  False  False
            1  False  False  False  False  False   True  False
            2  False  False  False  False  False  False  False
            3   True   True  False  False  False  False  False
            4  False  False  False  False  False  False  False
            5   True  False  False  False  False  False  False
            6  False  False   True  False  False  False  False
            7  False  False  False  False  False  False  False
            8  False  False  False  False  False   True  False
            9  False  False  False  False  False  False  False
            >>> (df < 0).any()
            fld1     True
            fld2     True
            fld3     True
            fld4    False
            fld5    False
            fld6     True
            fld7    False
            dtype: bool
            

            and then

            >>> df.columns[(df < 0).any()]
            Index(['fld1', 'fld2', 'fld3', 'fld6'], dtype='object')
            

            or

            >>> df.columns[(df < 0).any()].tolist()
            ['fld1', 'fld2', 'fld3', 'fld6']
            

            depending on what data structure you want. We can also use this to index into df directly:

            >>> df.loc[:,(df < 0).any()]
               fld1  fld2  fld3  fld6
            0     8     8    -1     7
            1     6     6     1    -1
            2     2     5     4     8
            3    -1    -1     7     2
            4     6     6     4     5
            5    -1     5     7     8
            6     7     1    -1     8
            7     6     2     4     6
            8     3     4     4    -1
            9     4     4     3     4
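
            As a self-contained version of the recipe above (with a small made-up frame):

```python
import pandas as pd

# Hypothetical frame: fld1 and fld3 contain negatives, fld2 does not.
df = pd.DataFrame({'fld1': [1, -2, 3],
                   'fld2': [4, 5, 6],
                   'fld3': [-7, 8, 9]})

# (df < 0).any() gives one boolean per column; use it to index df.columns.
negative_cols = df.columns[(df < 0).any()].tolist()
```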
            
            qid & accept id: (34930630, 34930706) query: grouping an unknown number of arguments with argparse soup:

            soup wrap:

            You can use nargs=4 with an 'append' action:

            import argparse
            
            parser = argparse.ArgumentParser()
            parser.add_argument('--group', nargs=4, action='append')
            
            print parser.parse_args()
            

            It'd be called as:

            $ python ~/sandbox/test.py --group 1 2 3 4 --group 1 2 3 4
            Namespace(group=[['1', '2', '3', '4'], ['1', '2', '3', '4']])
            

            From here, you can parse the key-value pairs if you'd like.
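
            For instance, if each group were given as key:value tokens (hypothetical data, in the nested-list shape the 'append' action produces), the parsing step could look like:

```python
# Each inner list is one --group occurrence, as collected by action='append'.
groups = [['a:1', 'b:2', 'c:3', 'd:4'],
          ['w:9', 'x:8', 'y:7', 'z:6']]

# Split each token on the first ':' and build one dict per group.
parsed = [dict(item.split(':', 1) for item in group) for group in groups]
```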


            Another option is to use a custom action to do the parsing -- Here's a simple one which accepts arguments of the form --group key:value key2:value2 ... --group ...

            import argparse
            
            class DictAction(argparse.Action):
                def __init__(self, *args, **kwargs):
                    super(DictAction, self).__init__(*args, **kwargs)
                    self.nargs = '*'
            
                def __call__(self, parser, namespace, values, option_string=None):
                    # The default value is often set to `None` rather than an empty list.
                    current_arg_vals = getattr(namespace, self.dest, []) or []
                    setattr(namespace, self.dest, current_arg_vals)
                    arg_vals = getattr(namespace, self.dest)
                    arg_vals.append(dict(v.split(':') for v in values))
            
            parser = argparse.ArgumentParser()
            parser.add_argument('--group', action=DictAction)
            
            print parser.parse_args()
            

            This has no checking (so the user could get funny TypeErrors if the key:value pairs are not formatted properly), and if you want to restrict it to specific keys, you'll need to build that in as well -- but those details should be easy enough to add. You could also require that they provide 4 values by setting self.nargs = 4 in DictAction.__init__.
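
            A sketch of the extra checking hinted at above (the function name and the allowed_keys parameter are made up for illustration):

```python
def parse_pairs(values, allowed_keys=None):
    """Turn ['k:v', ...] into a dict, failing loudly on malformed input."""
    result = {}
    for token in values:
        key, sep, val = token.partition(':')
        if not sep:
            # partition returns an empty separator when ':' is absent.
            raise ValueError('expected key:value, got %r' % token)
        if allowed_keys is not None and key not in allowed_keys:
            raise ValueError('unknown key %r' % key)
        result[key] = val
    return result
```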

            qid & accept id: (34931548, 34931962) query: How to get the 'cardinal' day of the year in Pandas? soup:

            soup wrap:

            To make a column with a running count that resets each year, you could use groupby/cumcount:

            df['C'] = df.groupby(df.index.year).cumcount(1)+1
            

            For example,

            df = pd.DataFrame({
                'Close': [16.66, 16.85, 16.93, 16.98, 17.08, 17.03, 17.09, 16.76, 16.67, 16.71, 20],
                'Date': ['1950-01-03', '1950-01-04', '1950-01-05', '1950-01-06', '1950-01-09', 
                         '1950-01-10', '1950-01-11', '1950-01-12', '1950-01-13', '1950-01-16',
                         '1951-01-01'], })
            df['Date'] = pd.to_datetime(df['Date'])
            df = df.set_index('Date')
            
            df['O'] = df.index.day
            df['C'] = df.groupby(df.index.year).cumcount(1)+1
            

            yields

                        Close   O   C
            Date                     
            1950-01-03  16.66   3   1
            1950-01-04  16.85   4   2
            1950-01-05  16.93   5   3
            1950-01-06  16.98   6   4
            1950-01-09  17.08   9   5
            1950-01-10  17.03  10   6
            1950-01-11  17.09  11   7
            1950-01-12  16.76  12   8
            1950-01-13  16.67  13   9
            1950-01-16  16.71  16  10
            1951-01-01  20.00   1   1
            
            qid & accept id: (34936864, 34937227) query: Remove double and single square brackets from text file generated from python soup:

            soup wrap:

            If you just want to remove all instances of square brackets from a string, you can do the following:

            s = "[[ hello] [there]]"
            s = s.replace("[", "")
            s = s.replace("]", "")
            

            UPDATE:

            If you want the code to import the file contents, and make the changes:

            with open('/path/to/my_file.txt', 'r') as my_file:
                text = my_file.read()
                text = text.replace("[", "")
                text = text.replace("]", "")
            
            # If you wish to save the updates back into a cleaned up file
            with open('/path/to/my_file_clean.txt', 'w') as my_file:
                my_file.write(text)
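
            The two replace calls can also be collapsed into a single pass with re.sub, using a character class that matches either bracket:

```python
import re

s = "[[ hello] [there]]"
# [\[\]] matches a single '[' or ']', so one substitution removes both kinds.
s = re.sub(r"[\[\]]", "", s)
```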
            
            qid & accept id: (34950064, 34950147) query: String slicing with delimiter changing in length soup:

            soup wrap:

            You can use rsplit(' ', 1) to split your string on the last occurrence of a space:

            So you could do:

            x = '20.06.2009 05:00:00        2.6'
            y = '20.06.2009 06:00:00       21.5'
            items = [x, y]
            
            value = 0
            for item in items:
                value += float(item.rsplit(' ', 1)[1])
            
            print(value)
            

            Output

            24.1
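
            A variant that sidesteps the run of spaces entirely is split() with no argument, which collapses consecutive whitespace, so the last field is always the value:

```python
items = ['20.06.2009 05:00:00        2.6',
         '20.06.2009 06:00:00       21.5']

# split() with no separator treats any whitespace run as one delimiter.
value = sum(float(item.split()[-1]) for item in items)
```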
            
            qid & accept id: (34959948, 34959993) query: Parse valid JSON object or array from a string soup:

            soup wrap:

            You can locate the start of the JSON by checking the presence of { or [ and then save everything to the end of the string into a capturing group:

            >>> import re
            >>> string1 = 'bob1: The ceo of the company {"salary": 100000}'
            >>> string2 = 'bob1: The ceo of the company ["10001", "10002"]'
            >>> 
            >>> re.search(r"\s([{\[].*?[}\]])$", string1).group(1)
            '{"salary": 100000}'
            >>> re.search(r"\s([{\[].*?[}\]])$", string2).group(1)
            '["10001", "10002"]'
            

            Here the \s([{\[].*?[}\]])$ breaks down to:

            • \s - a single space character
            • the parentheses form a capturing group
            • [{\[] would match a single { or [ (the latter needs to be escaped with a backslash)
            • .*? is a non-greedy match for any characters any number of times
            • [}\]] would match a single } or ] (the latter needs to be escaped with a backslash)
            • $ means the end of the string

            Or, you may use re.split() to split the string by a space followed by a { or [ (with a positive look ahead) and get the last item. It works for the sample input you've provided, but not sure if this is reliable in general:

            >>> re.split(r"\s(?=[{\[])", string1)[-1]
            '{"salary": 100000}'
            >>> re.split(r"\s(?=[{\[])", string2)[-1]
            '["10001", "10002"]'
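
            Since the end goal is valid JSON, it is worth passing the captured group straight to json.loads, which both parses it and doubles as a final validity check:

```python
import json
import re

string1 = 'bob1: The ceo of the company {"salary": 100000}'

# Same pattern as above; json.loads raises ValueError on invalid JSON.
payload = json.loads(re.search(r"\s([{\[].*?[}\]])$", string1).group(1))
```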
            
            qid & accept id: (34970283, 34970325) query: What is the pythonic way to sort a list with multiple attributes, such that the first is sorted reversely but the second is not? soup:

            soup wrap:

            You can sort it twice (Python uses a stable sort that performs well on already-sorted portions):

            >>> l = [ ['a','b'], ['x','y'], ['a','y'], ['x', 'b'] ]
            >>> sorted(sorted(l, key=lambda x: x[1]), key=lambda x: x[0], reverse=True)
            [['x', 'b'], ['x', 'y'], ['a', 'b'], ['a', 'y']]
            

            Or, since the first elements here are single characters, you can use ord() to get an integer and negate it:

            >>> l = [ ['a','b'], ['x','y'], ['a','y'], ['x', 'b'] ]
            >>> sorted(l, key=lambda x: (-ord(x[0]), x[1]))
            [['x', 'b'], ['x', 'y'], ['a', 'b'], ['a', 'y']]
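
            Note the ord() trick only applies when the key is a single character; the two-pass version generalizes, for example to multi-character strings (made-up data):

```python
l = [['apple', 'b'], ['xen', 'y'], ['apple', 'y'], ['xen', 'b']]

# Inner sort: ascending on the second field. Outer stable sort: descending
# on the first field, preserving the inner ordering among equal first fields.
result = sorted(sorted(l, key=lambda x: x[1]), key=lambda x: x[0], reverse=True)
```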
            
            qid & accept id: (34977252, 34987844) query: Using Numba to improve finite-differences laplacian soup:

            soup wrap:

            I am unable to replicate your results.

            Python version: 3.4.4 |Anaconda 2.4.1 (64-bit)| (default, Jan 19 2016, 12:10:59) [MSC v.1600 64 bit (AMD64)]

            numba version: 0.23.1

            import numba as nb
            import numpy as np
            
            def neumann_laplacian_1d(u,dx2):
                """Return finite difference Laplacian approximation of 2d array.
                Uses Neumann boundary conditions and a 2nd order approximation.
                """
                laplacian = np.zeros(u.shape)
                laplacian[1:-1] =  ((1.0)*u[2:] 
                                   +(1.0)*u[:-2]
                                   -(2.0)*u[1:-1])
                # Neumann boundary conditions
                # edges
                laplacian[0]  =  ((2.0)*u[1]-(2.0)*u[0])
                laplacian[-1] =  ((2.0)*u[-2]-(2.0)*u[-1])
            
                return laplacian/ dx2
            
            @nb.autojit(nopython=True)
            def neumann_laplacian_1d_numba(u,dx2):
                """Return finite difference Laplacian approximation of 2d array.
                Uses Neumann boundary conditions and a 2nd order approximation.
                """
                laplacian = np.zeros(u.shape)
                laplacian[1:-1] =  ((1.0)*u[2:] 
                                   +(1.0)*u[:-2]
                                   -(2.0)*u[1:-1])
                # Neumann boundary conditions
                # edges
                laplacian[0]  =  ((2.0)*u[1]-(2.0)*u[0])
                laplacian[-1] =  ((2.0)*u[-2]-(2.0)*u[-1])
            
                return laplacian/ dx2
            
            a = np.random.random(252)
            # run once to make the JIT do its work before timing
            neumann_laplacian_1d_numba(a, 1.0)
            
            
            %timeit neumann_laplacian_1d(a, 1.0)
            %timeit neumann_laplacian_1d_numba(a, 1.0)
            
            >>10000 loops, best of 3: 21.5 µs per loop
            >>The slowest run took 4.49 times longer than the fastest. This could mean that an intermediate result is being cached 
            >>100000 loops, best of 3: 3.53 µs per loop
            

            I see similar results for python 2.7.11 and numba 0.23

            >>100000 loops, best of 3: 19.1 µs per loop
            >>The slowest run took 8.55 times longer than the fastest. This could mean that an intermediate result is being cached 
            >>100000 loops, best of 3: 2.4 µs per loop
            
            qid & accept id: (34980059, 34980200) query: Difference of elements to find same adjacent soup:

            soup wrap:

            Here is working code:

            numbers = [1,3,7,11,25,36,57,678,999]
            count = sum([numbers[i] != numbers[i+1] for i in range(len(numbers)-1)])
            >>> count
            8
            

            For your example:

            data = [1,2,2,2,2,2,3,3,4,4,4,4,5,5,6,7,7,7,1,1]
            result = sum([data[i] != data[i+1] for i in range(len(data)-1)])
            >>> result
            7
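
            The pairwise comparison can also be written with zip; counting the positions where adjacent values differ reproduces the 7 above:

```python
data = [1, 2, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5, 6, 7, 7, 7, 1, 1]

# zip(data, data[1:]) yields each adjacent pair exactly once.
result = sum(a != b for a, b in zip(data, data[1:]))
```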
            
            qid & accept id: (34982928, 34983082) query: Return rows only if all items of category are True soup:

            soup wrap:

            Assuming that your Boolean column is really of dtype bool (and not strings), you could use groupby with transform:

            >>> df.loc[df.groupby("Category")["Boolean"].transform(all)]
              Category Boolean
            5        C    True
            6        C    True
            8        D    True
            9        C    True
            

            which works because we have

            >>> df.groupby("Category")["Boolean"].all()
            Category
            A    False
            B    False
            C     True
            D     True
            Name: Boolean, dtype: bool
            

            and so

            >>> df.groupby("Category")["Boolean"].transform(all)
            0    False
            1    False
            2    False
            3    False
            4    False
            5     True
            6     True
            7    False
            8     True
            9     True
            Name: Boolean, dtype: bool
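
            Put together on a small made-up frame:

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'A', 'B', 'C', 'C'],
                   'Boolean': [True, False, True, True, True]})

# transform(all) broadcasts each group's all() back onto the rows,
# so .loc keeps only rows whose entire category is True.
kept = df.loc[df.groupby('Category')['Boolean'].transform(all)]
```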
            
            qid & accept id: (35011168, 35011308) query: How to call __setattr__() correctly in Python3 as part of a class? soup:

            soup wrap:

            Call the base version via super():

            class MyClass(object):
                myattrib = None
            
                def __setattr__(self, prop, val):
                    super().__setattr__('myattrib', val)
                    print("setting myattrib")
            

            You probably do not want to ignore the prop argument here; it is not necessarily 'myattrib' that is being set.
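
            A sketch of that more general form, forwarding whatever attribute name was passed and only logging for the attribute of interest:

```python
class MyClass(object):
    myattrib = None

    def __setattr__(self, prop, val):
        # Forward the actual attribute name rather than hard-coding it.
        super().__setattr__(prop, val)
        if prop == 'myattrib':
            print("setting myattrib")
```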

            However, consider using a property instead of intercepting all attribute setting:

            class MyClass(object):
                _myattrib = None
            
                @property
                def myattrib(self):
                    return self._myattrib
            
                @myattrib.setter
                def myattrib(self, val):
                    self._myattrib = val
                    print("setting myattrib")
            

            I added object as a base-class; this is the default in Python 3, but a requirement for super() and property objects to work in Python 2.

            qid & accept id: (35020513, 35020768) query: Python & Beautifulsoup web scraping - select a paragraph with a specific child tag soup:

            soup wrap:

            You could use extract to remove the em tag as follows:

            from bs4 import BeautifulSoup
            
            # Markup reconstructed to match the visible text; the tags were
            # stripped from the original snippet.
            html = """
            <p><span class="class_name_1"><em>val1</em> val2</span></p>
            <p><span class="class_name_2"><em>text of no interest</em>text of interest</span></p>
            """
            
            soup = BeautifulSoup(html)
            p = soup.find('span', attrs={'class': 'class_name_2'}).parent
            p.span.em.extract()
            print p.text

            This would display:

            text of interest
            
            qid & accept id: (35025400, 35025538) query: Sort list with multiple criteria in python soup:

            soup wrap:

            You can use the key argument of the sorted function:

            filenames = [
                '1.0.0.0.py',
                '0.0.0.0.py',
                '1.1.0.0.py'
            ]
            
            print sorted(filenames, key=lambda f: map(int, f.split('.')[:-1]))
            

            Result:

            ['0.0.0.0.py', '1.0.0.0.py', '1.1.0.0.py']
            

            The lambda splits the filename into parts, removes the last part and converts the remaining ones into integers. Then sorted uses this value as the sorting criterion.
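
            One caveat: on Python 3, map returns an iterator, and map objects are not orderable, so that key would raise a TypeError there. A list comprehension works on both versions:

```python
filenames = ['1.0.0.0.py', '0.0.0.0.py', '1.1.0.0.py']

# Build a concrete list of ints as the key instead of a lazy map object.
result = sorted(filenames, key=lambda f: [int(p) for p in f.split('.')[:-1]])
```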

            qid & accept id: (35061363, 35062061) query: convert xlsx files to xls inside folders and subfolders in Excel VBA or Python soup:

            soup wrap:

            You could do this in Python as follows. This will take all the xlsx files from a single folder and write them using the same name in xls format:

            import win32com.client as win32
            import glob
            import os
            
            excel = win32.gencache.EnsureDispatch('Excel.Application')
            
            for excel_filename in glob.glob(r'c:\excel_files_folder\*.xlsx'):
                print excel_filename
                wb = excel.Workbooks.Open(excel_filename)
                wb.SaveAs(os.path.splitext(excel_filename)[0] + '.xls', FileFormat=56, ConflictResolution=2) 
            
            excel.Application.Quit()
            

            Here 56 is the FileFormat number for the legacy xls format; the full list of format numbers is documented on the Microsoft website.


            To do this on a whole directory structure, you could use os.walk as follows:

            import win32com.client as win32
            import os
            
            excel = win32.gencache.EnsureDispatch('Excel.Application')
            
            for dirpath, dirnames, filenames in os.walk(r'c:\excel_files_folder'):
                for filename in filenames:
                    name, ext = os.path.splitext(filename)
                    if ext == '.xlsx':
                        wb = excel.Workbooks.Open(os.path.join(dirpath, filename))
                        wb.DoNotPromptForConvert = True
                        wb.CheckCompatibility = False
                        excel.DisplayAlerts = False
                        wb.SaveAs(os.path.join(dirpath, name + '.xls'), FileFormat=56, ConflictResolution=2) 
            
            excel.Application.Quit()
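            The directory traversal itself can be exercised without Excel installed. A small sketch (the file names below are made up for illustration) that builds a throwaway tree and collects the .xlsx paths the conversion loop above would open:

```python
import os
import tempfile

# Create a throwaway directory tree with a mix of file types.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'sub'))
open(os.path.join(root, 'a.xlsx'), 'w').close()
open(os.path.join(root, 'b.xls'), 'w').close()
open(os.path.join(root, 'sub', 'c.xlsx'), 'w').close()

# Same filtering logic as the conversion loop.
found = []
for dirpath, dirnames, filenames in os.walk(root):
    for filename in filenames:
        name, ext = os.path.splitext(filename)
        if ext == '.xlsx':
            found.append(os.path.join(dirpath, filename))

print(sorted(os.path.basename(f) for f in found))
```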
            
            qid & accept id: (35083540, 35083914) query: Using a loop to make a dictionary soup:

            soup wrap:

            It sounds like you need to spend a little more time learning Python essentials.

            Anyway, here's a way to make a dictionary that handles both converting a bitstring to a character and vice versa. I just loop over range(65, 70) to keep the output small.

            from pprint import pprint
            
            binary2ascii = {}
            for i in range(65, 70):
                bits = format(i, "08b")
                char = chr(i)
                binary2ascii[bits] = char
                binary2ascii[char] = bits
            
            pprint(binary2ascii)    
            

            output

            {'01000001': 'A',
             '01000010': 'B',
             '01000011': 'C',
             '01000100': 'D',
             '01000101': 'E',
             'A': '01000001',
             'B': '01000010',
             'C': '01000011',
             'D': '01000100',
             'E': '01000101'}
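            The same two-way mapping can also be built more compactly with a dict comprehension plus a second pass for the reverse direction (Python 3 shown here):

```python
from pprint import pprint

# Forward direction: bitstring -> character.
binary2ascii = {format(i, "08b"): chr(i) for i in range(65, 70)}
# Reverse direction: character -> bitstring, added in a second pass.
binary2ascii.update({char: bits for bits, char in list(binary2ascii.items())})

pprint(binary2ascii)
```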
            
            qid & accept id: (35104102, 35104163) query: eliminate malformed records from a large .csv file soup:

            soup wrap:

            Here's a basic example:

            num_headers = 5
            with open("input.csv", 'r') as file_in, open("output.csv", 'w') as file_out:
                for i, line in enumerate(file_in):
                    if len(line.split(",")) == num_headers:
                        file_out.write(line)
                    else:
                        print "line %d is malformed" % i
            

            Or using the csv module (which is more flexible for different types of CSV formatting):

            import csv
            num_headers = 5
            with open("input.csv", 'r') as file_in, open("output.csv", 'w') as file_out:
                csv_in = csv.reader(file_in)
                csv_out = csv.writer(file_out)
                for i, row in enumerate(csv_in):
                    if len(row) == num_headers:
                        csv_out.writerow(row)
                    else:
                        print "line %d is malformed" % i
            
            qid & accept id: (35108136, 35108379) query: Get object attribute in class based view soup:

            soup wrap:

            You could use a mixin to achieve something similar.

            class ContextMixin:
                extra_context = {}
            
                def get_context_data(self, **kwargs):
                    context = super(ContextMixin, self).get_context_data(**kwargs)
                    context.update(self.extra_context)
                    return context 
            
            class FooUpdate(ContextMixin, UpdateView):
                model = Foo
                extra_context={'page_title': 'foo-objects name should go here'}
            

            Edit: a different mixin, which feels a bit hacky, but is closer to what you want. I haven't tested it, but I think it should work.

            class AutoContextMixin:
            
                def get_context_data(self, **kwargs):
                    context = super(AutoContextMixin, self).get_context_data(**kwargs)
                    for key in dir(self):
                        value = getattr(self, key)
                        if isinstance(value, str) and not key.startswith('_'):
                            context[key] = value
                    return context 
            
            class FooUpdate(AutoContextMixin, UpdateView):
                model = Foo
                page_title = 'foo-objects name should go here'
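            The cooperative super() call is the essential part of the first mixin, and it can be exercised without Django by standing in a minimal base class (Base below is a hypothetical stand-in for UpdateView's context machinery, not Django code):

```python
class Base(object):
    # Hypothetical stand-in for the framework's get_context_data.
    def get_context_data(self, **kwargs):
        return dict(kwargs)

class ContextMixin(object):
    extra_context = {}

    def get_context_data(self, **kwargs):
        context = super(ContextMixin, self).get_context_data(**kwargs)
        context.update(self.extra_context)
        return context

class FooView(ContextMixin, Base):
    extra_context = {'page_title': 'hello'}

context = FooView().get_context_data(object_id=1)
print(context)
```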
            
            qid & accept id: (35115138, 35115365) query: How do I check if a network is contained in another network in Python? soup:
            soup wrap:
            import ipaddress
            
            def is_subnet_of(a, b):
               """
               Returns boolean: is `a` a subnet of `b`?
               """
               a = ipaddress.ip_network(a)
               b = ipaddress.ip_network(b)
               a_len = a.prefixlen
               b_len = b.prefixlen
               return a_len >= b_len and a.supernet(a_len - b_len) == b
            

            then

            is_subnet_of("10.11.12.0/24", "10.11.0.0/16")   # => True
            
            qid & accept id: (35119291, 35119751) query: find an empty value gap in a list and allocate a group of strings soup:

            soup wrap:

            From the description of the question, this seems to be what you want, but I'm unsure why there is an empty string at the end of your sample output.

            Also, this directly modifies list0, so feel free to change the references to list1 if you wish.

            list0 = ["text","text","","","text","","text","text","","text"]
            
            # Find the "gap" - the first consectutive empty strings
            # gap_pos remains 0 if no gap is found
            gap_pos = 0
            gap_size = 2
            for i in range(len(list0)-gap_size):
                if all(x == '' for x in  list0[i:i+gap_size]):
                    gap_pos = i+1
                    break # remove this if you want the last gap
            
            # Find the non-empty strings that are detected after the gap
            after_gap = filter(lambda x : x != '', list0[gap_pos+1:])
            
            # allocate this group starting at a specific index (e.g. index 5)
            specific_index = 5
            for i in range(len(after_gap)):
                allocate_at = i + specific_index
                # Make sure not to go out-of-bounds
                if allocate_at < len(list0):
                    list0[allocate_at] = after_gap[i]
            

            Outputs

            ['text', 'text', '', '', 'text', 'text', 'text', 'text', 'text', 'text']
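            In Python 3, filter returns a lazy iterator, so after_gap must be materialized before it is indexed and measured; a Python 3 version of the same algorithm using a list comprehension:

```python
list0 = ["text", "text", "", "", "text", "", "text", "text", "", "text"]

# Find the first run of gap_size consecutive empty strings.
gap_pos = 0
gap_size = 2
for i in range(len(list0) - gap_size):
    if all(x == '' for x in list0[i:i + gap_size]):
        gap_pos = i + 1
        break  # remove this to keep the last gap instead

# Materialize the non-empty strings after the gap (list, not filter object).
after_gap = [x for x in list0[gap_pos + 1:] if x != '']

# Allocate this group starting at a specific index (e.g. index 5).
specific_index = 5
for i in range(len(after_gap)):
    allocate_at = i + specific_index
    if allocate_at < len(list0):   # stay in bounds
        list0[allocate_at] = after_gap[i]

print(list0)
```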
            
            qid & accept id: (35153942, 35173006) query: How to run Django management commands against Google Cloud SQL soup:

            soup wrap:

            If I understand correctly, your app runs on App Engine (a sandboxed environment) and uses Cloud SQL.

            1) Configure your database in settings.py as you can see below.

            if os.getenv('SERVER_SOFTWARE', '').startswith('Google App Engine'):
                # Running on production App Engine, so use a Google Cloud SQL database.
                DATABASES = {
                    'default': {
                        'ENGINE': 'django.db.backends.mysql',
                        'HOST': '/cloudsql/project-id:instance-name',
                        'NAME': 'database-name',
                        'USER': 'root',
                    }
                }
            elif os.getenv('SETTINGS_MODE') == 'prod':
                # Running in development, but want to access the Google Cloud SQL instance in production.
                DATABASES = {
                    'default': {
                        'ENGINE': 'django.db.backends.mysql',
                        'INSTANCE': 'cloud-sql-instance-ip-address',
                        'NAME': 'database-name',
                        'USER': 'root',
                        'PASSWORD': 'password',
                    }
                }
            else:
                # Running in development, so use a local MySQL database.
                DATABASES = {
                    'default': {
                        'ENGINE': 'django.db.backends.mysql',
                        'NAME': 'database-name',
                        'USER': 'username',
                        'PASSWORD': 'password',
                    }
                }
            

            2) Set the environment variable SETTINGS_MODE to prod (or leave it unset to access your local MySQL server).

            3) Run the command below from your machine.

            $ SETTINGS_MODE=prod python manage.py migrate
            

            You can find more details in the App Engine documentation: Management commands, and Alternate development database and settings.

            qid & accept id: (35159791, 35162006) query: Two windows: First Login after that main program soup:

            soup wrap:

            You can use a Toplevel for the login window, and the withdraw and deiconify methods to hide and show the root window respectively.

            Check the following code:

            from Tkinter import *
            
            
            class loginWindow(Toplevel):
                def __init__(self, title, parent):
                    Toplevel.__init__(self, parent)
                    # Save parent reference to use it 
                    self.parent = parent
                    self.parent.title(u"Geometry Calc - Login")
                    Button(self, text="Login", command=self.login).pack()
            
                def login(self):
            
                    access =  True # Used to test if a user can login.
            
                    if access:
                        # Close Toplevel window and show root window
                        self.destroy()
                        self.parent.deiconify()
                    else:
                        self.parent.quit()
            
            
            
            class main(Tk):
                def __init__(self, *args, **kwargs):
                    Tk.__init__(self, *args, **kwargs)
                    self.title(u"Geometry Calc")  # Nadpis
                    self.geometry("695x935")  # Rozmery v px
                    self.config(background="white")
                    self.resizable(width=FALSE, height=FALSE)  # Zakážeme změnu rozměrů uživatelem - zatím..
            
                    menubar = Menu(self)
            
                    helpmenu = Menu(menubar, tearoff=0)
                    helpmenu.add_command(label="Konec", command=self.quit)
                    menubar.add_cascade(label="Soubor", menu=helpmenu)
                    helpmenu = Menu(menubar, tearoff=0)
                    helpmenu.add_command(label="O programu", command=self.createAbout)
                    menubar.add_cascade(label="Pomoc", menu=helpmenu)
                    self.config(menu=menubar)
            
                    canvas = Canvas(self, width=691, height=900)
                    canvas.pack(expand=1, fill=BOTH)
            
                    # Hide root window
                    self.withdraw()
            
                    # Launch login window
                    loginWindow('Frame', self)
            
            
                def createAbout(self):
                    pass
            
                def quit(self):
                    self.destroy()
            
            
            
            app = main()
            
            app.mainloop()
            

            If you want to use two Toplevel windows for the login and main app, the root window should be hidden:

            class loginWindow(Toplevel):
                def __init__(self, title, parent):
                    Toplevel.__init__(self, parent)
                    ...
            
                def login(self):
                   if access:
                        # Close login window and launch the main window
                        self.destroy()
                        main()
            
            
            
            class main(Toplevel):
                def __init__(self, *args, **kwargs):
                    Toplevel.__init__(self, *args, **kwargs)
                    ...
            
            
            
            root = Tk()
            root.withdraw()
            
            loginWindow('title', root)
            
            root.mainloop()  
            
            qid & accept id: (35164333, 35164894) query: Efficient way of creating a permutated 2D array with a range of integers soup:

            soup wrap:

            A numpy solution could be:

            X,Y = np.meshgrid(np.arange(0,100), np.arange(0,100))
            result = np.vstack((Y.ravel(), X.ravel())).T
            result
            # array([[ 0,  0],
            #        [ 0,  1],
            #        [ 0,  2],
            #           ..., 
            

            This seems significantly faster than the pure-Python way:

            In [3]: %%timeit
               ...: X,Y = np.meshgrid(np.arange(0,100), np.arange(0,100))
               ...: result = np.vstack((Y.ravel(), X.ravel())).T
               ...: 
            10000 loops, best of 3: 109 µs per loop
            
            In [4]: %%timeit
               ...: N = 100
               ...: result = np.array([[x, y] for x in range(N) for y in range(N)])
               ...: 
            100 loops, best of 3: 6.54 ms per loop
            
            In [7]: %timeit result = list(itertools.product(range(100),repeat=2))
            1000 loops, best of 3: 521 µs per loop
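            For reference, the itertools.product variant timed above is the pure-standard-library option; a small runnable sketch (using N=3 to keep the output readable):

```python
import itertools

N = 3
# All (x, y) pairs in row-major order, like the flattened meshgrid result.
result = list(itertools.product(range(N), repeat=2))
print(result)
```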
            
            qid & accept id: (35197854, 35197989) query: Python keyword arguments unpack and return dictionary soup:

            soup wrap:

            If that way is suitable for you, use kwargs (see Understanding kwargs in Python) as in the code snippet below:

            def generate_student_dict(self, **kwargs):            
                 return kwargs
            

            Otherwise, you can create a copy of params with built-in locals() at function start and return that copy:

            def generate_student_dict(first_name=None, last_name=None , birthday=None, gender =None):
                 # It's important to copy locals in first line of code (see @MuhammadTahir comment).
                 args_passed = locals().copy()
                 # some code
                 return args_passed
            
            generate_student_dict()
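            Both approaches can be checked side by side; a minimal sketch (Python 3, with self dropped since there is no class here, and the function names shortened for illustration):

```python
def from_kwargs(**kwargs):
    # kwargs is already the dict of passed keyword arguments.
    return kwargs

def from_locals(first_name=None, last_name=None, birthday=None, gender=None):
    # Copy locals() before any other names are created in the function.
    args_passed = locals().copy()
    return args_passed

print(from_kwargs(first_name='Ada'))
print(from_locals(first_name='Ada'))
```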
            
            qid & accept id: (35199556, 35199683) query: call function through variable or without parentheses in python soup:

            soup wrap:

            Python prefers you to be explicit; if you want to re-calculate an expression, you have to call it. But if you really want this to work in the Python interactive interpreter, you'd have to hack it.

            You are only echoing a variable, not executing an expression. The variable is not going to change just because you asked the interactive interpreter to echo it.

            That is, unless you hook into the echoing mechanism. You can do so with overriding the __repr__ method:

            class EvaluatingName(object):
                def __init__(self, callable):
                    self._callable = callable
                def __call__(self):
                    return self._callable()
                def __repr__(self):
                    return repr(self())
            
            ls = EvaluatingName(os.getcwd)
            

            Demo:

            >>> import os
            >>> class EvaluatingName(object):
            ...     def __init__(self, callable):
            ...         self._callable = callable
            ...     def __call__(self):
            ...         return self._callable()
            ...     def __repr__(self):
            ...         return repr(self())
            ...
            >>> ls = EvaluatingName(os.getcwd)
            >>> os.chdir('/')
            >>> ls
            '/'
            >>> os.chdir('/tmp')
            >>> ls
            '/private/tmp'
            

            This now works because each time an expression produces a value other than None, that value is echoed, and echoing calls repr() on the object.

            Note that this will not work outside the interactive interpreter or printing. In other contexts (say, a script) you probably have to convert the object to a string each time. You can't use it as an argument to a function that expects a string, for example.

            This will work:

            os.path.join(ls(), 'foo.txt')  # produce the value first
            

            but this will not:

            os.path.join(ls, 'foo.txt')    # throws an AttributeError.
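            The repr behaviour can also be verified in a plain script, since both the interactive echo and an explicit repr() call go through __repr__:

```python
import os

class EvaluatingName(object):
    def __init__(self, callable_):
        self._callable = callable_
    def __call__(self):
        return self._callable()
    def __repr__(self):
        # Re-evaluates the wrapped callable every time it is echoed.
        return repr(self())

ls = EvaluatingName(os.getcwd)
print(repr(ls))
```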
            
            qid & accept id: (35205400, 35210735) query: How to randomly pick numbers from ranked groups in python, to create a list of specific length soup:

            soup wrap:

            You have four forced choices, then two free choices. A set is a good help here.

            from random import choice
            a = [1,2,3]
            b = [9]
            c = [5,6]
            d = [11,12,4]
            
            l=a+b+c+d #ordered candidates
            
            def select():
                e=set(l)
                for s in (a,b,c,d,e,e):              # 4 forced choices and 2 frees.
                    e.remove(choice(tuple(s)))       # sets have no index.
                return [x for x in l if x not in e]
            

            10 samples:

            >>> for _ in range(10) : print (select())
            [1, 9, 5, 11, 12, 4]
            [1, 3, 9, 6, 11, 4]
            [1, 3, 9, 5, 6, 12]
            [1, 2, 9, 6, 11, 4]
            [1, 2, 9, 5, 6, 4]
            [2, 9, 5, 6, 11, 4]
            [1, 2, 9, 5, 11, 12]
            [1, 3, 9, 6, 11, 12]
            [3, 9, 6, 11, 12, 4]
            [1, 2, 9, 5, 12, 4]
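A quick property check (my addition, not from the original answer): every draw should contain six distinct values with at least one element from each ranked group, since the four forced removals each come from a different group before the two free ones:

```python
from random import choice

a = [1, 2, 3]
b = [9]
c = [5, 6]
d = [11, 12, 4]
l = a + b + c + d  # ordered candidates

def select():
    e = set(l)
    for s in (a, b, c, d, e, e):        # 4 forced choices and 2 free ones
        e.remove(choice(tuple(s)))      # sets have no index
    return [x for x in l if x not in e]

for _ in range(200):
    picked = select()
    assert len(picked) == 6 and len(set(picked)) == 6
    # each ranked group contributes at least one element
    assert all(any(x in picked for x in group) for group in (a, b, c, d))
```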
            
            qid & accept id: (35208832, 35209389) query: TensorFlow Resize image tensor to dynamic shape soup:

            soup wrap:

            The way to do this is to use the (currently experimental, but available in the next release) tf.cond()* operator. This operator is able to test a value computed at runtime, and execute one of two branches based on that value.

            shape = tf.shape(image)
            height = shape[0]
            width = shape[1]
            height_smaller_than_width = tf.less_equal(height, width)
            
            new_shorter_edge = tf.constant(400)
            new_height, new_width = tf.cond(
                height_smaller_than_width,
                lambda: (new_shorter_edge, (width / height) * new_shorter_edge),
                lambda: ((height / width) * new_shorter_edge, new_shorter_edge))
            

            Now you have Tensor values for new_height and new_width that will take the appropriate value at runtime.


            *  To access the operator in the current released version, you'll need to import the following:

            from tensorflow.python.ops import control_flow_ops
            

            ...and then use control_flow_ops.cond() instead of tf.cond().
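The underlying arithmetic (ignoring the graph machinery) is just aspect-ratio scaling; this plain-Python sketch of the two branches is mine, for intuition only:

```python
def new_size(height, width, new_shorter_edge=400):
    # scale so the shorter edge becomes new_shorter_edge, keeping the aspect ratio
    if height <= width:
        return new_shorter_edge, (width / height) * new_shorter_edge
    return (height / width) * new_shorter_edge, new_shorter_edge

print(new_size(200, 300))  # (400, 600.0)
print(new_size(300, 200))  # (600.0, 400)
```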

            qid & accept id: (35208997, 35211471) query: Running blocks of code inside vim soup:

            soup wrap:

            First you should make a function to get your visually selected text. I brought it from https://stackoverflow.com/a/6271254/3108885:

            function! s:GetVisualSelection()
              let [lnum1, col1] = getpos("'<")[1:2]
              let [lnum2, col2] = getpos("'>")[1:2]
              let lines = getline(lnum1, lnum2)
              let lines[-1] = lines[-1][:col2 - (&selection == 'inclusive' ? 1 : 2)]
              let lines[0] = lines[0][col1 - 1:]
              return join(lines, "\n")
            endfunction
            

            Then, add an autocmd for Visual mode.

            autocmd FileType python vnoremap   :<C-U>exec '!clear; python -c' shellescape(<SID>GetVisualSelection(), 1)<CR>
            

            Note that <C-U> clears the '<,'> range Vim inserts when : is pressed in Visual mode. Also, we use python -c so the selected code is passed to Python as a program string.
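As an aside (my illustration, not from the answer), shlex.quote plays the same role in Python that shellescape() plays in Vim, which is why a whole multi-line selection can travel as a single argument to python -c:

```python
import shlex
import subprocess
import sys

code = 'for i in range(3):\n    print(i)'
# quote the program text exactly once, like Vim's shellescape(..., 1)
cmd = "%s -c %s" % (shlex.quote(sys.executable), shlex.quote(code))
out = subprocess.check_output(cmd, shell=True, text=True)
print(out, end="")  # prints 0, 1, 2 on separate lines
```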

            qid & accept id: (35209114, 35209494) query: Fastest way to remove subsets of lists from a list in Python soup:

            soup wrap:

            I don't know if it is faster but this is easier to read (to me anyway):

            sets={frozenset(e) for e in fruits}  
            us=set()
            while sets:
                e=sets.pop()
                if any(e.issubset(s) for s in sets) or any(e.issubset(s) for s in us):
                    continue
                else:
                    us.add(e)   
            

            Update

            It is fast. Faster still is to use a for loop. Check timings:

            fruits = [['apple', 'pear'],
                    ['apple', 'pear', 'banana'],
                    ['banana', 'pear'],
                    ['pear', 'pineapple'],
                    ['apple', 'pear', 'banana', 'watermelon']]
            
            from itertools import imap, ifilter, compress    
            
            def f1():              
                sets={frozenset(e) for e in fruits}  
                us=[]
                while sets:
                    e=sets.pop()
                    if any(e.issubset(s) for s in sets) or any(e.issubset(s) for s in us):
                        continue
                    else:
                        us.append(list(e))   
                return us           
            
            def f2():
                supersets = imap(lambda a: list(ifilter(lambda x: len(a) < len(x) and set(a).issubset(x), fruits)), fruits)
                new_list = list(compress(fruits, imap(lambda x: 0 if x else 1, supersets)))
                return new_list
            
            def f3():
                return filter(lambda f: not any(set(f) < set(g) for g in fruits), fruits)
            
            def f4():              
                sets={frozenset(e) for e in fruits}  
                us=[]
                for e in sets:
                    if any(e < s for s in sets):
                        continue
                    else:
                        us.append(list(e))   
                return us              
            
            if __name__=='__main__':
                import timeit     
                for f in (f1, f2, f3, f4):
                    print f.__name__, timeit.timeit("f()", setup="from __main__ import f, fruits"), f()  
            

            On my machine on Python 2.7:

            f1 8.09958791733 [['watermelon', 'pear', 'apple', 'banana'], ['pear', 'pineapple']]
            f2 15.5085151196 [['pear', 'pineapple'], ['apple', 'pear', 'banana', 'watermelon']]
            f3 11.9473619461 [['pear', 'pineapple'], ['apple', 'pear', 'banana', 'watermelon']]
            f4 5.87942910194 [['watermelon', 'pear', 'apple', 'banana'], ['pear', 'pineapple']]
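Note that imap/ifilter and the bare print statement are Python 2 only; under Python 3 the fastest variant, f4, might be ported like this (my adaptation, untimed):

```python
fruits = [['apple', 'pear'],
          ['apple', 'pear', 'banana'],
          ['banana', 'pear'],
          ['pear', 'pineapple'],
          ['apple', 'pear', 'banana', 'watermelon']]

def f4():
    sets = {frozenset(e) for e in fruits}
    # keep only the sets that are not proper subsets of another set
    return [list(e) for e in sets if not any(e < s for s in sets)]

print(sorted(map(sorted, f4())))
# [['apple', 'banana', 'pear', 'watermelon'], ['pear', 'pineapple']]
```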
            
            qid & accept id: (35220031, 35223681) query: retrive minimum maximum values of a ctype soup:

            soup wrap:

            How many such strings are there? It might be quickest to define them by hand yourself:

            class MaxVal:
                SI8  = 2 ** 7  - 1
                UI8  = 2 ** 8  - 1
                SI16 = 2 ** 15 - 1
                UI16 = 2 ** 16 - 1
                SI32 = 2 ** 31 - 1
                UI32 = 2 ** 32 - 1
                SI64 = 2 ** 63 - 1
                UI64 = 2 ** 64 - 1
            

            Or you could put them in a dict. I put them in static attributes of a class because it's about the neatest way of letting you refer to them in a way that "feels like" enum constants:

            print( MaxVal.UI32 )
            

            More programmatically, if your type string is a variable, you could use it like this:

            dt = 'UI32'
            print( getattr(MaxVal, dt) )
            

            The corresponding MinVal definitions are left as an exercise for the reader.
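For reference, one possible set of those MinVal definitions (my sketch, assuming two's-complement signed types) mirrors MaxVal:

```python
class MinVal:
    SI8  = -2 ** 7       # signed minimum is -(2 ** (bits - 1))
    UI8  = 0             # unsigned types bottom out at zero
    SI16 = -2 ** 15
    UI16 = 0
    SI32 = -2 ** 31
    UI32 = 0
    SI64 = -2 ** 63
    UI64 = 0

print(MinVal.SI8, MinVal.UI8)  # -128 0
```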

            qid & accept id: (35232897, 35232999) query: Grouping in Python soup:

            soup wrap:

            As Padric said in his comment, itertools.groupby() needs ordered data to do what you want. The simplest solution (as in least code edits) would be:

            import itertools
            
            key_func = lambda item: item["teamID"]
            
            for key, group in itertools.groupby(sorted(batting, key=key_func), key_func):
                print key, sum([item["R"] for item in group])
            

            If your data is relatively big, you may want to consider something more efficient that doesn't require a duplicate sorted copy in memory. defaultdict mentioned in the comment may be a good choice.

            from collections import defaultdict
            
            d = defaultdict(int)
            
            for item in batting:
              d[item['teamID']] += item.get('R', 0) or 0
            
            for team, r_sum in sorted(d.items(), key=lambda x: x[0]):
              print team, r_sum
            

            The code may need slight adjustments for Python 3.
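For instance (a sketch with made-up sample rows, since the original batting data isn't shown), the defaultdict version under Python 3 only needs print() parentheses:

```python
from collections import defaultdict

# hypothetical stand-in for the real 'batting' rows
batting = [
    {'teamID': 'BOS', 'R': 4},
    {'teamID': 'NYA', 'R': 2},
    {'teamID': 'BOS', 'R': 3},
]

d = defaultdict(int)
for item in batting:
    d[item['teamID']] += item.get('R', 0) or 0

for team, r_sum in sorted(d.items(), key=lambda x: x[0]):
    print(team, r_sum)
# BOS 7
# NYA 2
```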

            qid & accept id: (35234823, 35235113) query: Python Find n words before and after a certain words soup:

            soup wrap:

            Not sure if this is the best solution, but maybe it's enough to help you.

            import re
            
            # open the file? 
            test_string = " a lot of text read from file ... Department of Something is called (DoS) and then more texts and more text..."
            regex_acronym = r'\b[A-Z][a-zA-Z\.]*[A-Z]\b\.?'
            
            ra = re.compile(regex_acronym)
            for m in ra.finditer(test_string):
                print m.start(), m.group(), m.span()
                n = len(m.group()) * 2
                regex_pre_post = r"((?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,%d})(" % n
                regex_pre_post += regex_acronym 
                regex_pre_post += ")((?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,%d})" % n
                found = re.findall(regex_pre_post, test_string)
                print found
            
                found = found[0] # For a single match, just do this.
                pre = found[0]
                acro = found[1]
                post = found[2]
                print pre, acro, post
            

            Will give you:

            69 DoS (69, 72)
            [('file ... Department of Something is called (', 'DoS', ') and then more texts and more')]
            file ... Department of Something is called ( DoS ) and then more texts and more
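Under Python 3 (the prints above are Python 2 syntax), the same loop can be written as below; this port is mine, with the unused numpy import dropped:

```python
import re

test_string = (" a lot of text read from file ... Department of Something is "
               "called (DoS) and then more texts and more text...")
regex_acronym = r'\b[A-Z][a-zA-Z\.]*[A-Z]\b\.?'

ra = re.compile(regex_acronym)
for m in ra.finditer(test_string):
    n = len(m.group()) * 2
    # capture up to n words before and after the acronym
    regex_pre_post = (r"((?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,%d})(" % n
                      + regex_acronym
                      + r")((?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,%d})" % n)
    pre, acro, post = re.findall(regex_pre_post, test_string)[0]
    print(pre, acro, post)
```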
            
            qid & accept id: (35242055, 35242147) query: Getting crawled information in dictionary format soup:

            soup wrap:

            I'd suggest:

            dict(i.strip().split('\n') for i in text.split('\n\n') if len(i.strip().split('\n')) == 2)
            

            Output:

            {'Job ID': 'EE-1213256', 
             'Manages Others': 'Not Specified', 
             'Job Type': 'Information Technology,  Engineering,  Professional Services', 
             'Relocation': 'No', 
             'Education': '4 Year Degree', 
             'Base Pay': '$140,000.00 - $160,000.00 /Year', 
             'Experience': 'At least 5 year(s)', 
             'Industry': 'Computer Software, Banking - Financial Services, Biotechnology', 
             'Employment Type': 'Full-Time', 
             'Required Travel': 'Not Specified'}
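A self-contained illustration (the sample text here is my reconstruction of a small fragment of the crawled page, not the original variable):

```python
# blank-line-separated "label / value" pairs, as scraped from the page
text = """Base Pay
$140,000.00 - $160,000.00 /Year

Job ID
EE-1213256

Relocation
No"""

result = dict(i.strip().split('\n') for i in text.split('\n\n')
              if len(i.strip().split('\n')) == 2)
print(result)
# {'Base Pay': '$140,000.00 - $160,000.00 /Year', 'Job ID': 'EE-1213256', 'Relocation': 'No'}
```

The len(...) == 2 guard skips any chunk that isn't exactly a label line followed by a value line.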
            
            qid & accept id: (35252265, 35252866) query: matching between two columns and taking value from another in pandas soup:

            soup wrap:

            Solution

            This works:

            ids = df.groupby(['FName', 'LName']).id.apply(lambda x: list(x)[-1])
            df.Usedid = df.apply(lambda x: int(ids[x.UsedFName, x.UsedLName]), axis=1)
            

            Explanation

            First we find ids for the FName and LName:

            ids = df.groupby(['FName', 'LName']).id.apply(lambda x: list(x)[-1])
            

            They look like this:

            FName        LName   
            Andreas      Kai         2006
            Constantine  Pape         NaN
            Koethe       Talukdar    2005
            Manual       Hausman     2005
            Max          Weber       2007
            Nadia        Alam        2002
            Pia          Naime       2003
            Plank        Ingo        2009
            Tanvir       Hossain     2001
            Weber        Mac         2008
            Name: id, dtype: float64
            

            Here groupby() groups by two columns, the first and the last names. To "see" anything, you need to "do" something with it. Let's convert all ids per group into a list:

            >>> df.groupby(['FName', 'LName']).id.apply(list)
            
            FName        LName   
            Andreas      Kai                         [2006.0]
            Constantine  Pape                      [nan, nan]
            Koethe       Talukdar    [2004.0, 2004.0, 2005.0]
            Manual       Hausman             [2005.0, 2005.0]
            Max          Weber               [2007.0, 2007.0]
            Nadia        Alam                [2002.0, 2002.0]
            Pia          Naime       [2003.0, 2003.0, 2003.0]
            Plank        Ingo                        [2009.0]
            Tanvir       Hossain             [2001.0, 2001.0]
            Weber        Mac         [2008.0, 2008.0, 2008.0]
            Name: id, dtype: object
            

            Since we have NaN, the datatype is float.

            We want only the last id per group. So, instead of list() we use a lambda function:

            lambda x: list(x)[-1]
            

            In the second step we use our ids:

            df.apply(lambda x: int(ids[x.UsedFName, x.UsedLName]), axis=1)
            

            We apply a function to the dataframe row by row (axis=1). Here x is a row. We use the values in the columns UsedFName and UsedLName to look up the appropriate id and assign it to the result column with df.Usedid =.

            Output

            df looks like this:

                      FName     LName    id UsedFName UsedLName  Usedid
            0        Tanvir   Hossain  2001    Tanvir   Hossain    2001
            1         Nadia      Alam  2002    Tanvir   Hossain    2001
            2           Pia     Naime  2003    Tanvir   Hossain    2001
            3        Koethe  Talukdar  2004    Koethe  Talukdar    2005
            4        Manual   Hausman  2005    Koethe  Talukdar    2005
            5   Constantine      Pape   NaN       Max     Weber    2007
            6       Andreas       Kai  2006       Max     Weber    2007
            7           Max     Weber  2007    Manual   Hausman    2005
            8         Weber       Mac  2008    Manual   Hausman    2005
            9         Plank      Ingo  2009    Manual   Hausman    2005
            10       Tanvir   Hossain  2001       Pia     Naime    2003
            11        Weber       Mac  2008       Pia     Naime    2003
            12       Manual   Hausman  2005    Tanvir   Hossain    2001
            13          Max     Weber  2007    Tanvir   Hossain    2001
            14        Nadia      Alam  2002    Manual   Hausman    2005
            15        Weber       Mac  2008    Manual   Hausman    2005
            16          Pia     Naime  2003    Koethe  Talukdar    2005
            17          Pia     Naime  2003    Koethe  Talukdar    2005
            18  Constantine      Pape   NaN    Koethe  Talukdar    2005
            19       Koethe  Talukdar  2004    Koethe  Talukdar    2005
            20       Koethe  Talukdar  2005    Manual   Hausman    2005
            21          NaN       NaN   NaN    Manual   Hausman    2005
            22          NaN       NaN   NaN    Manual   Hausman    2005
            23          NaN       NaN   NaN    Manual   Hausman    2005
            24          NaN       NaN   NaN    Manual   Hausman    2005
            25          NaN       NaN   NaN    Manual   Hausman    2005
            26          NaN       NaN   NaN    Manual   Hausman    2005
            27          NaN       NaN   NaN    Manual   Hausman    2005
            
            qid & accept id: (35254886, 35254960) query: Obtaining dictionary value in Python soup:

            soup wrap:

            If you just want to print the keys and values of your dict, use something like:

            sozluk_ata = {20225: 17, 20232: 9, 20233: 22, 20234: 3, 20235: 28, 20236: 69}
            
            for key in sozluk_ata:
                print(key, sozluk_ata[key])
            

            which prints:

            20225 17
            20232 9
            20233 22
            20234 3
            20235 28
            20236 69
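Equivalent, and arguably more idiomatic (my suggestion, not part of the original answer), is iterating over items(), which yields each key/value pair directly:

```python
sozluk_ata = {20225: 17, 20232: 9, 20233: 22, 20234: 3, 20235: 28, 20236: 69}

for key, value in sozluk_ata.items():
    print(key, value)  # same output as indexing the dict by key
```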
            
            qid & accept id: (35261899, 35262031) query: Selenium scraping with multiple urls soup:

            What you need to do is:

            • reuse the same webdriver instance - do not initialize it in the loop
            • introduce Explicit Waits - this would definitely make the code more reliable and fast

            Implementation:

            from selenium import webdriver
            from selenium.webdriver.common.by import By
            from selenium.webdriver.support.ui import WebDriverWait
            from selenium.webdriver.support import expected_conditions as EC
            
            import pandas as pd
            
            
            urls = [
                'http://www.oddsportal.com/hockey/austria/ebel-2014-2015/results/#/page/',
                'http://www.oddsportal.com/hockey/austria/ebel-2013-2014/results/#/page/'
            ]
            
            data = []
            
            driver = webdriver.PhantomJS()
            driver.implicitly_wait(10)
            wait = WebDriverWait(driver, 10)
            
            for url in urls:
                for page in range(1, 8):
                    driver.get(url + str(page))
                    # wait for the page to load
                    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div#tournamentTable tr.deactivate")))
            
                    for match in driver.find_elements_by_css_selector("div#tournamentTable tr.deactivate"):
                        home, away = match.find_element_by_class_name("table-participant").text.split(" - ")
                        date = match.find_element_by_xpath(".//preceding::th[contains(@class, 'first2')][1]").text
            
                        if " - " in date:
                            date, event = date.split(" - ")
                        else:
                            event = "Not specified"
            
                        data.append({
                            "home": home.strip(),
                            "away": away.strip(),
                            "date": date.strip(),
                            "event": event.strip()
                        })
            
            driver.close()
            
            df = pd.DataFrame(data)
            print(df)

            Prints:

                               away         date          event                home
            0              Salzburg  14 Apr 2015      Play Offs     Vienna Capitals
            1       Vienna Capitals  12 Apr 2015      Play Offs            Salzburg
            2              Salzburg  10 Apr 2015      Play Offs     Vienna Capitals
            3       Vienna Capitals  07 Apr 2015      Play Offs            Salzburg
            4       Vienna Capitals  31 Mar 2015      Play Offs         Liwest Linz
            5              Salzburg  29 Mar 2015      Play Offs          Klagenfurt
            6           Liwest Linz  29 Mar 2015      Play Offs     Vienna Capitals
            7            Klagenfurt  26 Mar 2015      Play Offs            Salzburg
            8       Vienna Capitals  26 Mar 2015      Play Offs         Liwest Linz
            9           Liwest Linz  24 Mar 2015      Play Offs     Vienna Capitals
            10             Salzburg  24 Mar 2015      Play Offs          Klagenfurt
            11           Klagenfurt  22 Mar 2015      Play Offs            Salzburg
            12      Vienna Capitals  22 Mar 2015      Play Offs         Liwest Linz
            13              Bolzano  20 Mar 2015      Play Offs         Liwest Linz
            14        Fehervar AV19  18 Mar 2015      Play Offs     Vienna Capitals
            15          Liwest Linz  17 Mar 2015      Play Offs             Bolzano
            16      Vienna Capitals  16 Mar 2015      Play Offs       Fehervar AV19
            17              Villach  15 Mar 2015      Play Offs            Salzburg
            18           Klagenfurt  15 Mar 2015      Play Offs              Znojmo
            19              Bolzano  15 Mar 2015      Play Offs         Liwest Linz
            20          Liwest Linz  13 Mar 2015      Play Offs             Bolzano
            21        Fehervar AV19  13 Mar 2015      Play Offs     Vienna Capitals
            22               Znojmo  13 Mar 2015      Play Offs          Klagenfurt
            23             Salzburg  13 Mar 2015      Play Offs             Villach
            24           Klagenfurt  10 Mar 2015      Play Offs              Znojmo
            25      Vienna Capitals  10 Mar 2015      Play Offs       Fehervar AV19
            26              Bolzano  10 Mar 2015      Play Offs         Liwest Linz
            27              Villach  10 Mar 2015      Play Offs            Salzburg
            28          Liwest Linz  08 Mar 2015      Play Offs             Bolzano
            29               Znojmo  08 Mar 2015      Play Offs          Klagenfurt
            ..                  ...          ...            ...                 ...
            670       TWK Innsbruck  28 Sep 2013  Not specified              Znojmo
            671         Liwest Linz  27 Sep 2013  Not specified            Dornbirn
            672             Bolzano  27 Sep 2013  Not specified          Graz 99ers
            673          Klagenfurt  27 Sep 2013  Not specified  Olimpija Ljubljana
            674       Fehervar AV19  27 Sep 2013  Not specified            Salzburg
            675       TWK Innsbruck  27 Sep 2013  Not specified     Vienna Capitals
            676             Villach  27 Sep 2013  Not specified              Znojmo
            677            Salzburg  24 Sep 2013  Not specified  Olimpija Ljubljana
            678            Dornbirn  22 Sep 2013  Not specified       TWK Innsbruck
            679          Graz 99ers  22 Sep 2013  Not specified          Klagenfurt
            680     Vienna Capitals  22 Sep 2013  Not specified             Villach
            681       Fehervar AV19  21 Sep 2013  Not specified             Bolzano
            682            Dornbirn  20 Sep 2013  Not specified             Bolzano
            683             Villach  20 Sep 2013  Not specified          Graz 99ers
            684              Znojmo  20 Sep 2013  Not specified          Klagenfurt
            685  Olimpija Ljubljana  20 Sep 2013  Not specified         Liwest Linz
            686       Fehervar AV19  20 Sep 2013  Not specified       TWK Innsbruck
            687            Salzburg  20 Sep 2013  Not specified     Vienna Capitals
            688             Villach  15 Sep 2013  Not specified          Klagenfurt
            689         Liwest Linz  15 Sep 2013  Not specified            Dornbirn
            690     Vienna Capitals  15 Sep 2013  Not specified       Fehervar AV19
            691       TWK Innsbruck  15 Sep 2013  Not specified            Salzburg
            692          Graz 99ers  15 Sep 2013  Not specified              Znojmo
            693  Olimpija Ljubljana  14 Sep 2013  Not specified            Dornbirn
            694             Bolzano  14 Sep 2013  Not specified       Fehervar AV19
            695          Klagenfurt  13 Sep 2013  Not specified          Graz 99ers
            696              Znojmo  13 Sep 2013  Not specified            Salzburg
            697  Olimpija Ljubljana  13 Sep 2013  Not specified       TWK Innsbruck
            698             Bolzano  13 Sep 2013  Not specified     Vienna Capitals
            699         Liwest Linz  13 Sep 2013  Not specified             Villach
            
            [700 rows x 4 columns]
            \n soup wrap:

            What you need to do is:

            • reuse the same webdriver instance - do not initialize it in the loop
            • introduce Explicit Waits - this would definitely make the code more reliable and fast

            Implementation:

            from selenium import webdriver
            from selenium.webdriver.common.by import By
            from selenium.webdriver.support.ui import WebDriverWait
            from selenium.webdriver.support import expected_conditions as EC
            
            import pandas as pd
            
            
            urls = [
                'http://www.oddsportal.com/hockey/austria/ebel-2014-2015/results/#/page/',
                'http://www.oddsportal.com/hockey/austria/ebel-2013-2014/results/#/page/'
            ]
            
            data = []
            
            driver = webdriver.PhantomJS()
            driver.implicitly_wait(10)
            wait = WebDriverWait(driver, 10)
            
            for url in urls:
                for page in range(1, 8):
                    driver.get(url + str(page))
                    # wait for the page to load
                    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div#tournamentTable tr.deactivate")))
            
                    for match in driver.find_elements_by_css_selector("div#tournamentTable tr.deactivate"):
                        home, away = match.find_element_by_class_name("table-participant").text.split(" - ")
                        date = match.find_element_by_xpath(".//preceding::th[contains(@class, 'first2')][1]").text
            
                        if " - " in date:
                            date, event = date.split(" - ")
                        else:
                            event = "Not specified"
            
                        data.append({
                            "home": home.strip(),
                            "away": away.strip(),
                            "date": date.strip(),
                            "event": event.strip()
                        })
            
            driver.close()
            
            df = pd.DataFrame(data)
            print(df)
            

            Prints:

                               away         date          event                home
            0              Salzburg  14 Apr 2015      Play Offs     Vienna Capitals
            1       Vienna Capitals  12 Apr 2015      Play Offs            Salzburg
            2              Salzburg  10 Apr 2015      Play Offs     Vienna Capitals
            3       Vienna Capitals  07 Apr 2015      Play Offs            Salzburg
            4       Vienna Capitals  31 Mar 2015      Play Offs         Liwest Linz
            5              Salzburg  29 Mar 2015      Play Offs          Klagenfurt
            6           Liwest Linz  29 Mar 2015      Play Offs     Vienna Capitals
            7            Klagenfurt  26 Mar 2015      Play Offs            Salzburg
            8       Vienna Capitals  26 Mar 2015      Play Offs         Liwest Linz
            9           Liwest Linz  24 Mar 2015      Play Offs     Vienna Capitals
            10             Salzburg  24 Mar 2015      Play Offs          Klagenfurt
            11           Klagenfurt  22 Mar 2015      Play Offs            Salzburg
            12      Vienna Capitals  22 Mar 2015      Play Offs         Liwest Linz
            13              Bolzano  20 Mar 2015      Play Offs         Liwest Linz
            14        Fehervar AV19  18 Mar 2015      Play Offs     Vienna Capitals
            15          Liwest Linz  17 Mar 2015      Play Offs             Bolzano
            16      Vienna Capitals  16 Mar 2015      Play Offs       Fehervar AV19
            17              Villach  15 Mar 2015      Play Offs            Salzburg
            18           Klagenfurt  15 Mar 2015      Play Offs              Znojmo
            19              Bolzano  15 Mar 2015      Play Offs         Liwest Linz
            20          Liwest Linz  13 Mar 2015      Play Offs             Bolzano
            21        Fehervar AV19  13 Mar 2015      Play Offs     Vienna Capitals
            22               Znojmo  13 Mar 2015      Play Offs          Klagenfurt
            23             Salzburg  13 Mar 2015      Play Offs             Villach
            24           Klagenfurt  10 Mar 2015      Play Offs              Znojmo
            25      Vienna Capitals  10 Mar 2015      Play Offs       Fehervar AV19
            26              Bolzano  10 Mar 2015      Play Offs         Liwest Linz
            27              Villach  10 Mar 2015      Play Offs            Salzburg
            28          Liwest Linz  08 Mar 2015      Play Offs             Bolzano
            29               Znojmo  08 Mar 2015      Play Offs          Klagenfurt
            ..                  ...          ...            ...                 ...
            670       TWK Innsbruck  28 Sep 2013  Not specified              Znojmo
            671         Liwest Linz  27 Sep 2013  Not specified            Dornbirn
            672             Bolzano  27 Sep 2013  Not specified          Graz 99ers
            673          Klagenfurt  27 Sep 2013  Not specified  Olimpija Ljubljana
            674       Fehervar AV19  27 Sep 2013  Not specified            Salzburg
            675       TWK Innsbruck  27 Sep 2013  Not specified     Vienna Capitals
            676             Villach  27 Sep 2013  Not specified              Znojmo
            677            Salzburg  24 Sep 2013  Not specified  Olimpija Ljubljana
            678            Dornbirn  22 Sep 2013  Not specified       TWK Innsbruck
            679          Graz 99ers  22 Sep 2013  Not specified          Klagenfurt
            680     Vienna Capitals  22 Sep 2013  Not specified             Villach
            681       Fehervar AV19  21 Sep 2013  Not specified             Bolzano
            682            Dornbirn  20 Sep 2013  Not specified             Bolzano
            683             Villach  20 Sep 2013  Not specified          Graz 99ers
            684              Znojmo  20 Sep 2013  Not specified          Klagenfurt
            685  Olimpija Ljubljana  20 Sep 2013  Not specified         Liwest Linz
            686       Fehervar AV19  20 Sep 2013  Not specified       TWK Innsbruck
            687            Salzburg  20 Sep 2013  Not specified     Vienna Capitals
            688             Villach  15 Sep 2013  Not specified          Klagenfurt
            689         Liwest Linz  15 Sep 2013  Not specified            Dornbirn
            690     Vienna Capitals  15 Sep 2013  Not specified       Fehervar AV19
            691       TWK Innsbruck  15 Sep 2013  Not specified            Salzburg
            692          Graz 99ers  15 Sep 2013  Not specified              Znojmo
            693  Olimpija Ljubljana  14 Sep 2013  Not specified            Dornbirn
            694             Bolzano  14 Sep 2013  Not specified       Fehervar AV19
            695          Klagenfurt  13 Sep 2013  Not specified          Graz 99ers
            696              Znojmo  13 Sep 2013  Not specified            Salzburg
            697  Olimpija Ljubljana  13 Sep 2013  Not specified       TWK Innsbruck
            698             Bolzano  13 Sep 2013  Not specified     Vienna Capitals
            699         Liwest Linz  13 Sep 2013  Not specified             Villach
            
            [700 rows x 4 columns]
            
            qid & accept id: (35281863, 35283466) query: OR style permissions for DjangoRestFramework soup:

            soup wrap:

            Can you just create your own permission class and use that? For example:

            from rest_framework import permissions
            
            class HasNiceHatOrHasNicePants(permissions.BasePermission):
                """
                Permission to check if user has a Nice Hat or has Nice Pants.
                If both are False do not return anything.
                """
                def has_permission(self, request, view):
            
                    if request.user.has_nicehat() or request.user.has_nicepants():
                        return True
            

            Then, import this new class into your view, and use it like this:

            permission_classes = (HasNiceHatOrHasNicePants,)
            

            It looks like rest_condition has the functionality that you need
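            The OR logic that rest_condition packages up (as `Or`/`And` composers) can be sketched framework-free. The classes below are hypothetical stand-ins with no `rest_framework` dependency, purely to illustrate the combinator idea; in a real project each permission would subclass `rest_framework.permissions.BasePermission`:

```python
# Hypothetical sketch: combining permission classes with OR, without DRF.
class HasNiceHat:
    def has_permission(self, request, view):
        return getattr(request.user, "nice_hat", False)

class HasNicePants:
    def has_permission(self, request, view):
        return getattr(request.user, "nice_pants", False)

class Or:
    """Grant access if any wrapped permission class grants it."""
    def __init__(self, *perm_classes):
        self.perms = [cls() for cls in perm_classes]

    def has_permission(self, request, view):
        return any(p.has_permission(request, view) for p in self.perms)

# Minimal fake user/request objects for demonstration only:
class User:
    def __init__(self, nice_hat=False, nice_pants=False):
        self.nice_hat, self.nice_pants = nice_hat, nice_pants

class Request:
    def __init__(self, user):
        self.user = user

perm = Or(HasNiceHat, HasNicePants)
print(perm.has_permission(Request(User(nice_pants=True)), view=None))  # True
print(perm.has_permission(Request(User()), view=None))                 # False
```

            In DRF itself you would pass such a composed class via `permission_classes`, exactly as shown above for the single custom class.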

            qid & accept id: (35288428, 35288540) query: How to create sub list with fixed length from given number of inputs or list in Python? soup:

            soup wrap:

            This will split your list into 2 lists of equal length (6):

            >>> my_list = [1, 'ab', '', 'No', '', 'NULL', 2, 'bc', '','Yes' ,'' ,'Null']
            >>> x = my_list[:len(my_list)//2]
            >>> y = my_list[len(my_list)//2:]
            >>> x
            [1, 'ab', '', 'No', '', 'NULL']
            >>> y
            [2, 'bc', '', 'Yes', '', 'Null']
            

            If you want to split a list into many smaller lists, use:

            chunks = [my_list[x:x+size] for x in range(0, len(my_list), size)]
            

            Where size is the size of the smaller lists you want, example:

            >>> size = 2
            >>> chunks = [my_list[x:x+size] for x in range(0, len(my_list), size)]
            >>> chunks
            [[1, 'ab'], ['', 'No'], ['', 'NULL'], [2, 'bc'], ['', 'Yes'], ['', 'Null']]
            >>> for item in chunks:
                    print (item)
            [1, 'ab']
            ['', 'No']
            ['', 'NULL']
            [2, 'bc']
            ['', 'Yes']
            ['', 'Null']
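            The same slicing idea can be wrapped in a reusable generator (a sketch, not part of the original answer), which also makes explicit that a trailing chunk may be shorter than size:

```python
def chunked(seq, size):
    """Yield successive slices of seq of the given size;
    the last chunk may be shorter."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

print(list(chunked([1, 'ab', '', 'No', '', 'NULL'], 2)))
# [[1, 'ab'], ['', 'No'], ['', 'NULL']]
```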
            
            qid & accept id: (35293171, 35293392) query: How to make a field computed only if some condition is fulfilled in Odoo 8? soup:

            soup wrap:

            in the model

            computed_field = fields.Char(compute='comp', inverse='inv', store=True)
            boolean_field = fields.Boolean()
            
            @api.one
            def comp(self):
                ...
            
            @api.one
            def inv(self):
                ...
            

            in the view

            
            
            

            edit:

            Now that your example is clearer, I'd say you should change the following:

            set value's store parameter to True instead of False and just remove the inverse which, in your case, you don't need.

            then you need another 2 fields

            value_manual = fields.Float()
            manual = fields.Boolean(compute='_is_manual', default=True)
            
            @api.one
            @api.depends('child_ids')
            def _is_manual(self):
                self.manual = len(self.child_ids) == 0
            

            plus

            @api.one
            @api.depends('child_ids', 'child_ids.value')
            def _compute_value(self):
                if self.child_ids:
                    self.value = sum(
                        [child.value for child in self.child_ids])
                else:
                    self.value = self.value_manual
            

            in the view:

            
            
            
            

            There could be another solution that avoids the double field, maybe using the inverse, but I am not sure.

            qid & accept id: (35300343, 35300720) query: Transposing dataframe and sorting soup:

            soup wrap:

            The pandas melt function is awesome.

            In:

            df = df.reset_index() #Make your index into a column
            df = pd.melt(df, id_vars = ['index']) #Reshape data
            df = df[df['index'] <= df['variable']].sort_values(by = 'value') #Remove duplicates, sort
            df ['col'] = df['index'] +','+ df['variable'] #Concatenate strings
            df = df[['col','value']] #Remove unnecessary columns
            df = df.set_index('col') #Set strings to index
            df
            

            Out:

                            value
            col 
            Arnston,Arnston 0
            Berg,Berg       0
            Carlson,Carlson 0
            Arnston,Berg    1
            Arnston,Carlson 2
            Berg,Carlson    3
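            For reference, a self-contained version of the steps above on a tiny made-up symmetric distance matrix (the names mirror the output but the numbers are illustrative, not the asker's data):

```python
import pandas as pd

names = ["Arnston", "Berg", "Carlson"]
df = pd.DataFrame(
    [[0, 1, 2],
     [1, 0, 3],
     [2, 3, 0]],
    index=names, columns=names,
)

df = df.reset_index()                    # the index becomes a column named 'index'
df = pd.melt(df, id_vars=["index"])      # long format: index, variable, value
df = df[df["index"] <= df["variable"]]   # keep one of each symmetric pair
df = df.sort_values(by="value")
df["col"] = df["index"] + "," + df["variable"]
result = df.set_index("col")["value"]
print(result.to_dict())
```

            The `index <= variable` filter is what drops the mirrored duplicates: for each unordered pair only the alphabetically ordered row survives.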
            
            qid & accept id: (35306419, 35306477) query: Where is the configuration information of installed packages? soup:

            soup wrap:

            Try using:

            pip show <package-name>
            

            To list installed packages:

            pip list
            

            And of course most importantly:

            pip help
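            As a side note (not part of the original answer), the same metadata that `pip show` prints can be read programmatically on Python 3.8+ via the standard library; pip itself serves as the example package here:

```python
from importlib import metadata

# Read the installed metadata for a package; "pip" is just an example name.
dist = metadata.distribution("pip")
print(dist.metadata["Name"], dist.version)
```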
            
            qid & accept id: (35318700, 35318961) query: Convert dataFrame to list soup:

            soup wrap:

            Maybe you can use iloc or loc for selecting a column and then tolist:

            print df
               a
            0  2
            1  0
            2  1
            3  0
            4  1
            5  0
            
            print df.values
            [[2]
             [0]
             [1]
             [0]
             [1]
             [0]]
            
            print df.iloc[:, 0].tolist()
            [2, 0, 1, 0, 1, 0]
            

            Or maybe:

            print df.values.tolist()
            [[2L], [0L], [1L], [0L], [1L], [0L]]
            
            print df.iloc[:, 0].values.tolist()
            [2L, 0L, 1L, 0L, 1L, 0L]
            
            print df.loc[:, 'a'].tolist()
            [2, 0, 1, 0, 1, 0]
            
            print df['a'].tolist()
            [2, 0, 1, 0, 1, 0]
            

            But maybe you need flatten:

            print df.values.flatten()
            [2 0 1 0 1 0]
            
            print df.iloc[:, 0].values.flatten()
            [2 0 1 0 1 0]
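            The snippets above use Python 2 print statements (hence the `2L` long literals in the output); an equivalent Python 3 sketch of the key calls:

```python
import pandas as pd

df = pd.DataFrame({'a': [2, 0, 1, 0, 1, 0]})

print(df['a'].tolist())              # select the column by name
print(df.iloc[:, 0].tolist())        # same column selected by position
print(df.values.flatten().tolist())  # flatten the whole frame to one list
```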
            
            qid & accept id: (35322452, 35579433) query: Is there a way to sandbox test execution with pytest, especially filesystem access? soup:

            soup wrap:

            After quite a bit of research I didn't find any ready-made way for pytest to run a project's tests with OS-level isolation in a disposable environment. Many approaches are possible, each with advantages and disadvantages, but most of them have more moving parts than I would feel comfortable with.

            The absolute minimal (but opinionated) approach I devised is the following:

            • build a python docker image with:
              • a dedicated non-root user: pytest
              • all project dependencies from requirements.txt
              • the project installed in develop mode
            • run py.test in a container that mounts the project folder on the host as the home of pytest user

            To implement the approach add the following Dockerfile to the top folder of the project you want to test next to the requirements.txt and setup.py files:

            FROM python:3
            
            # setup pytest user
            RUN adduser --disabled-password --gecos "" --uid 7357 pytest
            COPY ./ /home/pytest
            WORKDIR /home/pytest
            
            # setup the python and pytest environments
            RUN pip install --upgrade pip setuptools pytest
            RUN pip install --upgrade -r requirements.txt
            RUN python setup.py develop
            
            # setup entry point
            USER pytest
            ENTRYPOINT ["py.test"]
            

            Build the image once with:

            docker build -t pytest .
            

            Run py.test inside the container mounting the project folder as volume on /home/pytest with:

            docker run --rm -it -v `pwd`:/home/pytest pytest [USUAL_PYTEST_OPTIONS]
            

            Note that -v mounts the volume as uid 1000 so host files are not writable by the pytest user with uid forced to 7357.

            Now you should be able to develop and test your project with OS-level isolation.

            Update: If you also run the test on the host you may need to remove the python and pytest caches that are not writable inside the container. On the host run:

            rm -rf .cache/ && find . -name __pycache__  | xargs rm -rf
            
            qid & accept id: (35344237, 35344394) query: Selenium Python select the link from 3rd column from a table soup:

            soup wrap:

            Assuming you want to locate the

            view
            

            element, based on the text in

            Selenium_CRM_For_Edit_Test
            

            you can use the following axis:

            //span[contains(., "Selenium_CRM_Edit_Test")]/following::span[@class="linkhover"]
            
            qid & accept id: (35346425, 35346746) query: Printing inherited class in Python soup:

            soup wrap:

            First, as chepner mentions, the last line should be print(str(newbox)).

            starwarsbox has __str__ implemented, but box and character don't.

            box should look like:

                def __str__(self):
                    result = ""
                    for i in range(self.x):
                        for j in range(self.y):
                            result += '*' if i in [0, self.x - 1] or j in [0, self.y - 1] else ' '
                        result += '\n'
                    return result
            

            and character should look like:

                def __str__(self):
                    return 'Name :' + self.name + ', Occupation:' + self.occupation + ', Affiliation:' + self.affiliation + ', Species:' + self.species
            

            Compare these to your code, and see how you could implement displayCharacter and createBox using the implementations of __str__. :)
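            Putting the suggested box __str__ into a minimal runnable class (the constructor is assumed here, since the asker's original class definition isn't shown):

```python
class Box:
    def __init__(self, x, y):
        self.x = x  # number of rows
        self.y = y  # number of columns

    def __str__(self):
        result = ""
        for i in range(self.x):
            for j in range(self.y):
                # stars on the border rows/columns, spaces inside
                result += '*' if i in [0, self.x - 1] or j in [0, self.y - 1] else ' '
            result += '\n'
        return result

print(Box(3, 4))
# ****
# *  *
# ****
```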

            qid & accept id: (35354005, 35354201) query: Filtering histogram edges and counts soup:
            soup wrap:
            m = 0; M = 200
            mask = (m < edges) & (edges < M)
            >>> edges[mask]
            array([  37.4789683 ,   87.07491593,  136.67086357,  186.2668112 ])
            

            Let's work on a smaller dataset so that it is easier to understand:

            np.random.seed(0)
            values = np.random.uniform(0, 100, 10)
            values.sort()
            >>> values
            array([ 38.34415188,  42.36547993,  43.75872113,  54.4883183 ,
                    54.88135039,  60.27633761,  64.58941131,  71.51893664,
                    89.17730008,  96.36627605])
            
            # Histogram using e.g. 10 buckets
            perc, edges = np.histogram(values, bins=10,
                                       weights=np.zeros_like(values) + 100./values.size)
            
            >>> perc
            array([ 30.,   0.,  20.,  10.,  10.,  10.,   0.,   0.,  10.,  10.])
            
            >>> edges
            array([ 38.34415188,  44.1463643 ,  49.94857672,  55.75078913,
                    61.55300155,  67.35521397,  73.15742638,  78.9596388 ,
                    84.76185122,  90.56406363,  96.36627605])
            
            m = 0; M = 50
            mask = (m <= edges) & (edges < M)
            >>> mask
            array([ True,  True,  True, False, False, False, False, False, False,
                   False, False], dtype=bool)
            
            >>> edges[mask]
            array([ 38.34415188,  44.1463643 ,  49.94857672])
            
            >>> perc[mask[:-1]][:-1]
            array([ 30.,   0.])
            
            m = 40; M = 60
            mask = (m < edges) & (edges < M)
            >>> edges[mask]
            array([ 44.1463643 ,  49.94857672,  55.75078913])
            >>> perc[mask[:-1]][:-1]
            array([  0.,  20.])
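
The masking steps above can be wrapped into a small helper (a sketch; `percent_between` is a made-up name). It sums the percentages of the bins that fall entirely inside `[m, M)`:

```python
import numpy as np

np.random.seed(0)
values = np.sort(np.random.uniform(0, 100, 10))

# Same weighted histogram as above: each value contributes 100/N percent
perc, edges = np.histogram(values, bins=10,
                           weights=np.zeros_like(values) + 100.0 / values.size)

def percent_between(perc, edges, m, M):
    # Select edges inside [m, M); mask[:-1] aligns edges with bins,
    # and the final [:-1] drops the bin whose right edge falls outside
    mask = (m <= edges) & (edges < M)
    return perc[mask[:-1]][:-1].sum()

print(percent_between(perc, edges, 0, 50))   # 30.0, matching the example
```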
            
            qid & accept id: (35373082, 35410930) query: What's the most efficient way to accumulate dataframes in pyspark? soup:
soup wrap:

Assuming there is at most one row per key in each DataFrame and all keys are of primitive types, you can try a union with an aggregation. Let's start with some imports and example data:

            from itertools import chain
            from functools import reduce
            from pyspark.sql.types import StructType
            from pyspark.sql.functions import col, lit, max
            from pyspark.sql import DataFrame
            
            df1 = sc.parallelize([
                ("U1", 0, 1), ("U2", 1, 1)
            ]).toDF(["Key", "FeatureA", "FeatureB"])
            
            df2 = sc.parallelize([
              ("U1", 0, 0, 1)
            ]).toDF(["Key", "FeatureC", "FeatureD", "FeatureE"])
            
            df3 = sc.parallelize([("U2", 1)]).toDF(["Key", "FeatureF"])
            
            dfs = [df1, df2, df3]
            

Next we can extract a common schema:

            output_schema = StructType(
              [df1.schema.fields[0]] + list(chain(*[df.schema.fields[1:] for df in dfs]))
            )
            

            and transform all DataFrames:

            transformed_dfs = [df.select(*[
              lit(None).cast(c.dataType).alias(c.name) if c.name not in df.columns 
              else col(c.name)
              for c in output_schema.fields
            ]) for df in dfs]
            

Finally, a union and a dummy aggregation:

            combined = reduce(DataFrame.unionAll, transformed_dfs)
            exprs = [max(c).alias(c) for c in combined.columns[1:]]
            result = combined.repartition(col("Key")).groupBy(col("Key")).agg(*exprs)
            

            If there is more than one row per key but individual columns are still atomic you can try to replace max with collect_list / collect_set followed by explode.
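
The pad-to-a-common-schema-then-aggregate idea can be sketched without Spark at all, using plain Python dicts (the data mirrors df1/df2/df3 above; the max aggregation is emulated by keeping the larger non-None value):

```python
records = [
    [{"Key": "U1", "FeatureA": 0, "FeatureB": 1},
     {"Key": "U2", "FeatureA": 1, "FeatureB": 1}],
    [{"Key": "U1", "FeatureC": 0, "FeatureD": 0, "FeatureE": 1}],
    [{"Key": "U2", "FeatureF": 1}],
]

# Shared schema: Key first, then every feature column seen anywhere
columns = ["Key"] + sorted({c for rs in records for r in rs for c in r} - {"Key"})

combined = {}
for rs in records:
    for r in rs:
        # Pad each record to the shared schema with None, then "max"-merge
        row = combined.setdefault(r["Key"], dict.fromkeys(columns))
        for c in columns:
            v = r.get(c)
            if v is not None and (row[c] is None or v > row[c]):
                row[c] = v

print(combined["U1"])
```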

            qid & accept id: (35389648, 35389969) query: Convert empty dictionary to empty string soup:
soup wrap:

The shortest way is shown below: since an empty dictionary is falsy, you can do it with boolean operators.

            >>> d = {}
            >>> str(d or '')
            ''
            

Or without str:

            >>> d = {}
            >>> d or ''
            ''
            

If d is not an empty dictionary, convert it to a string with str():

            >>> d['f'] = 12
            >>> str(d or '')
            "{'f': 12}"
            
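
Spelled out with a conditional expression instead of or, the same behaviour looks like this (`dict_to_str` is a made-up helper name):

```python
# An empty dict is falsy, so the conditional picks the empty string
def dict_to_str(d):
    return str(d) if d else ''

print(dict_to_str({}))         # empty string
print(dict_to_str({'f': 12}))  # {'f': 12}
```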
            qid & accept id: (35428388, 35428831) query: Extracting a feature by feature name in scikit dict vectorizer soup:
soup wrap:

Features passed to DictVectorizer are mapped to numpy arrays, which represent the features as an NxM numerical matrix (the dictionary structure is lost). However, the DictVectorizer class preserves the mapping internally, and you can recover it using .inverse_transform. From the documentation of DictVectorizer:

            from sklearn.feature_extraction import DictVectorizer
            >>> v = DictVectorizer(sparse=False)
            >>> D = [{'foo': 1, 'bar': 2}, {'foo': 3, 'baz': 1}]
            >>> X = v.fit_transform(D)
            >>> X
            array([[ 2.,  0.,  1.],
                   [ 0.,  1.,  3.]])
            >>> v.inverse_transform(X) == [{'bar': 2.0, 'foo': 1.0}, {'baz': 1.0, 'foo': 3.0}]
            True
            

            Thus, for a single instance x_i (row) belonging to X, you can recover the mapping as:

            >>> v.inverse_transform(X[i][None, :])
            

The last bit, [None, :], converts the length-M row X[i] into a 1xM row vector. It is not strictly needed, but scikit-learn throws a warning otherwise. The following should also work:

            >>> v.inverse_transform(X[i])
            

            Now, answering the question, to remove a given feature from your data X, DictVectorizer also stores the name corresponding to every feature in feature_names_.

            >>> v.feature_names_
            ['bar', 'baz', 'foo']
            

            Thus, you could do something like:

            >>> column = v.feature_names_.index('foo') # Column mapping index of key 'foo'
            >>> values = X[:, column] # get values
            >>> X[:, column] = 0 # remove them from X
            

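
The zeroing step can be tried in isolation with plain numpy, using the feature_names_ order and the dense X from the example above:

```python
import numpy as np

feature_names = ['bar', 'baz', 'foo']   # mirrors v.feature_names_
X = np.array([[2., 0., 1.],
              [0., 1., 3.]])            # mirrors v.fit_transform(D)

column = feature_names.index('foo')     # column index of the 'foo' feature
values = X[:, column].copy()            # keep the extracted values
X[:, column] = 0                        # zero the feature in place

print(values)  # [1. 3.]
```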
            Last, extending the answer to sparse matrices with DictVectorizer(sparse=True) where X now is a NxM sparse matrix instead of a numpy array. The above solution works with minor modifications (note .todense() in the value extraction):

            >>> column = v.feature_names_.index('foo')
            >>> values = X[:, column].todense() # get values
            >>> X[:, column] = 0 # remove them from X
            

Replace 'foo' with 'label' in the above code to make it work for you.

            qid & accept id: (35431172, 35431313) query: Creating a table out of data in python soup:
soup wrap:

If you're looking to do this text-based, you could use the .format method; read the docs to learn about formatting so you can specify the spacing of each column,

like this:

            your_list = ['bread', 'milk', 'sugar', 'tea']
            
            print("{0:20}    {1:20}    {2:20}    {3:20}\n".format('Column 1', 'Column 2', 'Column 3', 'Column 4'))
            print("{0:20}    {1:20}    {2:20}    {3:20}\n".format(your_list[0], your_list[1], your_list[2], your_list[3]))
            

            Returns:

            Column 1                Column 2                Column 3                Column 4            
            
            bread                   milk                    sugar                   tea   
            

            With a for loop example getting closer to what you might be wanting to do:

            your_list = ['bread', 'milk', 'sugar', 'tea', 'eggs', 'shampoo', 'clothes', 'tiger', 'beads', 'washing machine', 'juice', 'mixed herbs']
            
            print("{0:20}    {1:20}    {2:20}    {3:20}\n".format('Column 1', 'Column 2', 'Column 3', 'Column 4'))
            i = 0
            for x in range(0, 3):
                print("{0:20}    {1:20}    {2:20}    {3:20}\n".format(your_list[i], your_list[i + 1], your_list[i + 2], your_list[i + 3]))
                i += 4
            

            Output:

            Column 1                Column 2                Column 3                Column 4            
            
            bread                   milk                    sugar                   tea                 
            
            eggs                    shampoo                 clothes                 tiger               
            
            beads                   washing machine         juice                   mixed herbs    
            

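
As a side note, the manual index bookkeeping in the loop above can be avoided by slicing the list into rows of four (a sketch):

```python
your_list = ['bread', 'milk', 'sugar', 'tea', 'eggs', 'shampoo',
             'clothes', 'tiger', 'beads', 'washing machine', 'juice', 'mixed herbs']

row_fmt = "{0:20}    {1:20}    {2:20}    {3:20}"
lines = [row_fmt.format('Column 1', 'Column 2', 'Column 3', 'Column 4')]
for start in range(0, len(your_list), 4):   # one table row per 4 items
    lines.append(row_fmt.format(*your_list[start:start + 4]))
print("\n".join(lines))
```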
            EDIT:

            your_list = ['10', 10, '20', 20]
            print("{0:20}    {1:20}    {2:20}    {3:20}\n".format('Column 1', 'Column 2', 'Column 3', 'Column 4'))
            print("{0:20}    {1:20}    {2:20}    {3:20}\n".format(your_list[0], your_list[1], your_list[2], str(your_list[3])))
            

            Output:

            Column 1                Column 2                Column 3                Column 4            
            
            10                                        10    20                      20      
            

If you convert the integers in your list to strings, as demonstrated by the last element in my format statement using the str() method, then this won't happen :)

            str(your_list[3])
            

            https://docs.python.org/2/library/string.html

            qid & accept id: (35432378, 35432621) query: Python reshape list to ndim array soup:
soup wrap:

You can think of reshaping as filling the new shape row by row (last dimension varies fastest) from the flattened original list/array.

            An easy solution is to shape the list into a (100, 28) array and then transpose it:

            x = np.reshape(list_data, (100, 28)).T
            

            Update regarding the updated example:

            np.reshape([0, 0, 1, 1, 2, 2, 3, 3], (4, 2)).T
            # array([[0, 1, 2, 3],
            #        [0, 1, 2, 3]])
            
            np.reshape([0, 0, 1, 1, 2, 2, 3, 3], (2, 4))
            # array([[0, 0, 1, 1],
            #        [2, 2, 3, 3]])
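
The transpose trick is equivalent to filling the array column-major directly with order='F', which you can verify:

```python
import numpy as np

data = [0, 0, 1, 1, 2, 2, 3, 3]
a = np.reshape(data, (4, 2)).T             # fill row by row, then transpose
b = np.reshape(data, (2, 4), order='F')    # fill column by column directly

print(a)
```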
            
            qid & accept id: (35469417, 35469553) query: Selecting Tags With Multiple Part Class in BeautifulSoup soup:
soup wrap:

            The problem is that class is a multi-valued attribute. Here is a quite detailed story in a context of a similar problem: BeautifulSoup returns empty list when searching by compound class names.


            As a workaround, you can make a filtering function:

            def filter_function(elm):
                return elm and "class" in elm.attrs and "A" in elm["class"] and "Y" not in elm["class"]
            

            Complete example:

            from bs4 import BeautifulSoup
            
html = """
<div class="A B">test1</div>
<div class="A C">test2</div>
<div class="A Y">test3</div>
"""

soup = BeautifulSoup(html, "lxml")

def filter_function(elm):
    return elm and "class" in elm.attrs and "A" in elm["class"] and "Y" not in elm["class"]

for div in soup.find_all(filter_function):
    print(div.text)

            Prints:

            test1
            test2
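
Since class is multi-valued, bs4 exposes it as a list of tokens; the filter boils down to this membership test (illustrative data, no bs4 needed):

```python
# (text, class-token list) pairs, as bs4 would expose them via elm["class"]
elements = [("test1", ["A", "B"]), ("test2", ["A", "C"]), ("test3", ["A", "Y"])]

matched = [text for text, classes in elements
           if "A" in classes and "Y" not in classes]
print(matched)  # ['test1', 'test2']
```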
            
            qid & accept id: (35481842, 35482432) query: python: how can I get a new value in round robin style every time i invoke the script soup:
soup wrap:

As other answers have pointed out, you will need some external way to provide persistence between executions. You could use a separate file, a database record, or some other long-running process.

However, you could also use the subnets file itself to provide the persistence. Just read the file, print the top value, and then rotate its contents, ready for the next run.

            This example uses a deque to implement the rotation:

            import os
            import collections
            
            subnets_file = "subnets.txt"
            
            # Load the subnets file into a deque
            with open(subnets_file, 'r') as f:
                subnets = collections.deque(f.read().splitlines())
            
            # Print the top subnet
            print subnets[0]
            
            # Rotate the subnets
            subnets.rotate(-1)
            
            # Save the rotated subnets
            with open(subnets_file, 'w') as f:
                for s in subnets:
                    f.write("%s\n" % s)
            

            when run:

            $ python next_available_subnet.py 
            subnet1
            $ python next_available_subnet.py 
            subnet2
            $ python next_available_subnet.py 
            subnet3
            $ python next_available_subnet.py 
            subnet1
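
The rotation behaviour on its own can be simulated in a single process (four loop iterations standing in for four script runs):

```python
import collections

subnets = collections.deque(["subnet1", "subnet2", "subnet3"])

seen = []
for _ in range(4):           # simulate four consecutive script runs
    seen.append(subnets[0])  # "print" the top subnet
    subnets.rotate(-1)       # move the head to the back
print(seen)  # ['subnet1', 'subnet2', 'subnet3', 'subnet1']
```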
            
            qid & accept id: (35485675, 35486822) query: How to create a vector of Matrices in python soup:
soup wrap:

            Before entering your time loop, create an empty list:

            listOfFlowMaps = []
            

            Then, after creating your flow map:

            flowmap = np.array([...]) # your flow map
            listOfFlowMaps.append(flowmap) # add the flow map to the list
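
A minimal sketch of the whole pattern, with np.full arrays standing in for your real flow maps; if you later need a single ndarray, the accumulated list can be stacked:

```python
import numpy as np

listOfFlowMaps = []
for t in range(3):                 # your time loop (illustrative)
    flowmap = np.full((2, 2), t)   # stand-in for your real flow map
    listOfFlowMaps.append(flowmap)

stacked = np.stack(listOfFlowMaps)  # shape (n_steps, 2, 2)
print(stacked.shape)
```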
            
            qid & accept id: (35488781, 35489500) query: Selecting the value in a row closest to zero in a pandas DataFrame soup:
soup wrap:

Just use where to choose between the values in 'a' and 'b':

            df['a'].where(df['a'].abs() < df['b'].abs(), df['b'])
            
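
The same element-wise pick can be sketched with plain numpy (illustrative values):

```python
import numpy as np

a = np.array([1.0, -5.0, 0.5])
b = np.array([-2.0, 3.0, -4.0])

# Where |a| < |b| keep a, otherwise keep b
closest = np.where(np.abs(a) < np.abs(b), a, b)
print(closest)  # [1.  3.  0.5]
```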

            My original answer

            You can add a column with the column name of the value closest to zero, then use it to select the closest value.

            (df.assign(closest=df.apply(lambda x: x.abs().argmin(), axis='columns'))
             .apply(lambda x: x[x['closest']], axis='columns'))
            

Still, it looks more cumbersome than it should be.

            qid & accept id: (35489107, 35489172) query: Django URL matching any 140 characters soup:
soup wrap:

            Where text is any 140 characters (including spaces)

            Match words and spaces

You can add {140} to your URL regex to specify the length, and \s to match spaces:

url(r'^home/(?P<text>[\w\s]{140})$',....),
            

Here's how you can test it out; I've reduced the length to 10 to make it easy:

            >>> import re
>>> regex = r'^home/(?P<text>[\w\s]{10})$'
            >>> re.search(regex, "home/1234567890").group(1)
            '1234567890'
            >>> re.search(regex, "home/12345 7890").group(1)
            '12345 7890'
            >>> re.search(regex, "home/ abcd fghi").group(1)
' abcd fghi'
            

            and if you exceed the length, e.g.

            >>> re.search(regex, "home/ abcd fghizzz")
            # Doesn't match, returns None
            

            and if you go below the length, e.g.

            >>> re.search(regex, "home/ abc")
            # Doesn't match, returns None
            

            Match "anything" (not practical for a URL, because of e.g. ? and #)

            If you want to match pretty much anything:

>>> regex = r'^home/(?P<text>(.){10})$'
            >>> re.search(regex, "home/1@#$%^& &*").group(1)
            '1@#$%^& &*'
            >>> re.search(regex, "home/1@bcd^& &*").group(1)
            '1@bcd^& &*'
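
With the named group restored, the captured text is also available by name (group name "text" assumed, following the question):

```python
import re

regex = r'^home/(?P<text>[\w\s]{10})$'
m = re.search(regex, "home/12345 7890")
print(m.group("text"))  # '12345 7890'
```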
            
            qid & accept id: (35493086, 35494379) query: Compare values in 2 columns and output the result in a third column in pandas soup:
soup wrap:

The key is pandas.Series.isin(), which checks each element of the calling pandas.Series for membership in the object passed to it. You want to check each element of b_received for membership in c_consumed, but only within each group as defined by a_id. When using groupby with apply, pandas indexes the result by the grouping variable as well as the original index. In your case, you don't need the grouping variable in the index, so you can reset the index back to what it was originally with reset_index using drop=True.

            df['output'] = (df.groupby('a_id')
                           .apply(lambda x : x['b_received'].isin(x['c_consumed']).astype('i4'))
                           .reset_index(level='a_id', drop=True))
            

            Your DataFrame is now ...

                a_id b_received c_consumed  output
            0    sam       soap        oil       1
            1    sam        oil        NaN       1
            2    sam      brush       soap       0
            3  harry        oil      shoes       1
            4  harry      shoes        oil       1
            5  alice       beer       eggs       0
            6  alice      brush      brush       1
            7  alice       eggs        NaN       1
            

Have a look at the documentation for split-apply-combine with pandas for a more thorough explanation.

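
The grouped membership test can also be sketched in plain Python, which makes the logic explicit (same data as above):

```python
# (a_id, b_received, c_consumed) rows, mirroring the DataFrame
rows = [("sam", "soap", "oil"), ("sam", "oil", None), ("sam", "brush", "soap"),
        ("harry", "oil", "shoes"), ("harry", "shoes", "oil"),
        ("alice", "beer", "eggs"), ("alice", "brush", "brush"), ("alice", "eggs", None)]

# Collect everything consumed within each a_id group
consumed = {}
for a_id, _, c in rows:
    consumed.setdefault(a_id, set()).add(c)

# 1 if the received item appears anywhere in the group's consumed set
output = [int(b in consumed[a_id]) for a_id, b, _ in rows]
print(output)  # [1, 1, 0, 1, 1, 0, 1, 1]
```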
            qid & accept id: (35510590, 35514860) query: How to get the location of a Zope installation from inside an instance? soup:
soup wrap:

            In the past I had a similar use case. I solved it by declaring the path inside the zope.conf:

zope-conf-additional +=
  <product-config pd.prenotazioni>
    logfile ${buildout:directory}/var/log/prenotazioni.log
  </product-config>

            See the README of this product:

            This zope configuration can then be interpreted with this code:

            from App.config import getConfiguration
            
            product_config = getattr(getConfiguration(), 'product_config', {})
            config = product_config.get('pd.prenotazioni', {})
            logfile = config.get('logfile')
            

            See the full example

            Worth noting is the fact that the initial return avoids multiple logging if the init function is mistakenly called more than once.

            Anyway, if you do not want to play with buildout and custom zope configuration, you may want to get the default event log location.

            It is specified in the zope.conf. You should have something like this:

<eventlog>
  level INFO
  <logfile>
    path /path/to/plone/var/log/instance.log
    level INFO
  </logfile>
</eventlog>

            I was able to obtain the path with this code:

            from App.config import getConfiguration 
            import os
            
            eventlog = getConfiguration().eventlog
            logpath = eventlog.handler_factories[0].instance.baseFilename
            logfolder = os.path.split(logpath)[0] 
            

            Probably, by looking at the App module code, you will find a more straightforward way of getting this value.

            Another possible (IMHO weaker) solution would be to store (through buildout or your preferred method) the logfile path in an environment variable.
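            To sketch that environment-variable alternative: the buildout (or startup script) would export the path, and the code reads it back at runtime. The variable name PLONE_LOGFILE below is purely illustrative, not something buildout sets for you.

            ```python
            import os

            # Hypothetical variable name; your buildout/startup script would export it.
            os.environ.setdefault("PLONE_LOGFILE", "/path/to/plone/var/log/instance.log")

            logpath = os.environ.get("PLONE_LOGFILE")
            logfolder = os.path.split(logpath)[0]
            print(logfolder)  # /path/to/plone/var/log
            ```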

            qid & accept id: (35526501, 35527977) query: how to groupby pandas dataframe on some condition soup:

            soup wrap:

            Similar to Anton's answer, but using apply

            users = df.groupby('buyer_id').apply(lambda r: r['item_id'].unique().shape[0] > 1 and 
                                                           r['date'].unique().shape[0] > 1 )*1
            df.set_index('buyer_id', inplace=True)
            df['good_user'] = users
            

            result:

                      item_id  order_id        date  good_user
            buyer_id
            139            57       387  2015-12-28          1
            140             9       388  2015-12-28          1
            140            57       389  2015-12-28          1
            36              9       390  2015-12-28          0
            64             49       404  2015-12-29          0
            146            49       405  2015-12-29          0
            81             49       406  2015-12-29          0
            140            80       407  2015-12-30          1
            139            81       408  2015-12-30          1
            

            EDIT, because I thought of another case: suppose the data shows a buyer buying the same two (or more) goods on two different days. Should this user be flagged as 1 or 0? Effectively, he/she does not actually choose anything different on the second date. So take buyer 81 in the following table: you can see they only buy items 49 and 50 on both dates.

                buyer_id   item_id order_id    date
                     139        57      387    2015-12-28
                     140         9      388    2015-12-28
                     140        57      389    2015-12-28
                      36         9      390    2015-12-28
                      64        49      404    2015-12-29
                     146        49      405    2015-12-29
                      81        49      406    2015-12-29
                     140        80      407    2015-12-30
                     139        81      408    2015-12-30
                      81        50      406    2015-12-29
                      81        49      999    2015-12-30
                      81        50      999    2015-12-30
            

            To accommodate this, here's what I came up with (kind of ugly, but it should work):

            # this function is applied to all buyers
            def find_good_buyers(buyer):
                # which dates the buyer has made a purchase
                buyer_dates = buyer.groupby('date')
                # a string representing the unique items purchased at each date
                items_on_date = buyer_dates.agg({'item_id': lambda x: '-'.join(x.unique())})
                # if there is more than 1 combination of item_id, then it means that
                # the buyer has purchased different things in different dates
                # so this buyer must be flagged to 1
                good_buyer = (len(items_on_date.groupby('item_id').groups) > 1) * 1
                return good_buyer
            
            
            df['item_id'] = df['item_id'].astype('S')
            buyers = df.groupby('buyer_id') 
            
            good_buyer = buyers.apply(find_good_buyers)
            df.set_index('buyer_id', inplace=True)
            df['good_buyer'] = good_buyer
            df.reset_index(inplace=True)
            

            This works for buyer 81, setting it to 0: once you group by date, both dates on which a purchase was made have the same "49-50" combination of items purchased, hence the number of combinations is 1 and the buyer is flagged 0.
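            The date-grouping idea above can also be sketched more compactly: collapse each (buyer, date) pair into a frozenset of items, then count distinct baskets per buyer. The small frame below is a made-up subset of the example data, including buyer 81's repeated basket.

            ```python
            import pandas as pd

            # Made-up subset of the example data above.
            df = pd.DataFrame({
                'buyer_id': [81, 81, 81, 81, 139],
                'item_id':  ['49', '50', '49', '50', '57'],
                'date':     ['2015-12-29', '2015-12-29', '2015-12-30', '2015-12-30', '2015-12-28'],
            })

            # One frozenset of items per (buyer, date), then count distinct baskets per buyer.
            baskets = df.groupby(['buyer_id', 'date'])['item_id'].apply(frozenset)
            good_buyer = (baskets.groupby('buyer_id').nunique() > 1).astype(int)
            print(good_buyer)  # buyer 81 -> 0: same {'49', '50'} basket on both dates
            ```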

            qid & accept id: (35559958, 35560079) query: Check list of tuples where first element of tuple is specified by defined string soup:

            soup wrap:

            You could use Python's filter function for this as follows:

            l = [('A', 2), ('A', 1), ('B', 0.2)]
            print filter(lambda x: x[0] == 'A', l)
            

            Giving:

            [('A', 2), ('A', 1)]
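            A side note for Python 3, where filter returns a lazy iterator rather than a list: wrap it in list(), or use the equivalent list comprehension.

            ```python
            l = [('A', 2), ('A', 1), ('B', 0.2)]

            # filter() is lazy in Python 3, so materialize it with list()
            print(list(filter(lambda x: x[0] == 'A', l)))  # [('A', 2), ('A', 1)]

            # ...or use the equivalent list comprehension
            print([t for t in l if t[0] == 'A'])  # [('A', 2), ('A', 1)]
            ```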
            
            qid & accept id: (35560606, 35561097) query: Modifying HTML using python html package soup:

            soup wrap:

            BeautifulSoup would get you quite close to your desired behavior:

            from bs4 import BeautifulSoup

            html_table_string = """
            <table>
                <tr>
                    <td>Something else</td>
                </tr>
            </table>
            """
            table = BeautifulSoup(html_table_string, "html.parser")

            # Select the first td element and set its content to 'Something'
            table.select_one('td').string = 'Something'  # or table.find('td').string = 'Something'

            print(table.prettify())

            Prints:

            <table>
             <tr>
              <td>
               Something
              </td>
             </tr>
            </table>
            qid & accept id: (35561635, 36796661) query: Packaging a python application ( with enthought, matplotlib, wxpython) into executable soup:

            soup wrap:

            Here is the solution which worked for me.

            I tried bbfreeze, PyInstaller, py2exe and cx_Freeze. In the end I decided to go with cx_Freeze, as it is apparently popular with people who package applications that use Enthought classes.

            With cx_Freeze I got a similar error message to the one above. The problem is that it saves the necessary modules in a "library.zip" file, which Enthought classes, including mayavi, have trouble with. Fortunately cx_Freeze allows specifying the "create_shared_zip": False option, which makes it copy the source files directly into the build directory as they are, instead of into a zip file.

            Additionally, I found that some Enthought files and folders have to be manually included in include_files (the same goes for scipy; source: here). After this it worked. I add my setup file code below; hope it helps.

            import sys
            import os
            from cx_Freeze import setup, Executable
            import scipy
            
            scipy_path = os.path.dirname(scipy.__file__) #use this if you are also using scipy in your application
            
            build_exe_options = {"packages": ["pyface.ui.wx", "tvtk.vtk_module", "tvtk.pyface.ui.wx", "matplotlib.backends.backend_tkagg"],
                                 "excludes": ['numarray', 'IPython'],
                                 "include_files": [("C:\\Python27\\Lib\\site-packages\\tvtk\\pyface\\images\\", "tvtk\\pyface\\images"),
                                                   ("C:\\Python27\\Lib\\site-packages\\pyface\\images\\", "pyface\\images"),
                                                   ("C:\\Python27\\Lib\\site-packages\\tvtk\\plugins\\scene\\preferences.ini", "tvtk\\plugins\\scene\\preferences.ini"),
                                                   ("C:\\Python27\\Lib\\site-packages\\tvtk\\tvtk_classes.zip", "tvtk\\tvtk_classes.zip"),
                                                   ("C:\\Python27\\Lib\\site-packages\\mayavi\\core\\lut\\pylab_luts.pkl","mayavi\\core\\lut\\pylab_luts.pkl"),
                                                   ("C:\\Python27\\Lib\\site-packages\\mayavi\\preferences\\preferences.ini","mayavi\\preferences\\preferences.ini"),
                                                   ("C:\\Python27\\Lib\\site-packages\\numpy\\core\\libifcoremd.dll","numpy\\core\\libifcoremd.dll"),
                                                   ("C:\\Python27\\Lib\\site-packages\\numpy\\core\\libmmd.dll","numpy\\core\\libmmd.dll"),
                                                   (str(scipy_path), "scipy") #for scipy
                                                   ]                       
                                 ,"create_shared_zip": False #to avoid creating library.zip
                                 }
            
            executables = [
                Executable('myfile.py', targetName="myfile.exe", base=None)
            ]
            
            setup(name='myfile',
                  version='1.0',
                  description='myfile',
                  options = {"build_exe": build_exe_options},
                  executables=executables
                  ) 
            

            Configuration:

            python 2.7
            altgraph==0.9
            apptools==4.3.0
            bbfreeze==1.1.3
            bbfreeze-loader==1.1.0
            configobj==5.0.6
            cx-Freeze==4.3.3
            Cython==0.23.4
            matplotlib==1.4.3
            mayavi==4.4.3
            MySQL-python==1.2.5
            natgrid==0.2.1
            numpy==1.10.0b1
            opencv-python==2.4.12
            pandas==0.16.2
            pefile==1.2.10.post114
            Pillow==3.1.1
            plyfile==0.4
            psutil==4.1.0
            pyface==5.0.0
            Pygments==2.0.2
            pygobject==2.28.6
            pygtk==2.22.0
            PyInstaller==3.1
            pyparsing==2.0.3
            pypiwin32==219
            PySide==1.2.2
            python-dateutil==2.4.2
            pytz==2015.4
            scipy==0.16.0
            six==1.9.0
            subprocess32==3.2.7
            traits==4.5.0
            traitsui==5.0.0
            transforms3d==0.2.1
            VTK==6.2.0
            
            qid & accept id: (35580801, 35581686) query: Chunking bytes (not strings) in Python 2 and 3 soup:

            soup wrap:

            Funcy (a library offering various useful utilities, supporting both Python 2 and 3) offers a chunks function that does exactly this:

            >>> import funcy
            >>> data = b'abcdefghijklmnopqrstuvwxyz'
            >>> list(funcy.chunks(6, data))
            [b'abcdef', b'ghijkl', b'mnopqr', b'stuvwx', b'yz']   # Python 3
            ['abcdef', 'ghijkl', 'mnopqr', 'stuvwx', 'yz']        # Python 2.7
            

            Alternatively, you could include a simple implementation of this in your program (compatible with both Python 2.7 and 3):

            def chunked(size, source):
                for i in range(0, len(source), size):
                    yield source[i:i+size]
            

            It behaves the same (at least for your data; Funcy's chunks also works with iterators, this doesn't):

            >>> list(chunked(6, data))
            [b'abcdef', b'ghijkl', b'mnopqr', b'stuvwx', b'yz']   # Python 3
            ['abcdef', 'ghijkl', 'mnopqr', 'stuvwx', 'yz']        # Python 2.7
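            If you do need iterator support without the funcy dependency, here is a Python-3-only sketch using itertools.islice (iterating over a bytes object yields integers, and bytes() collects each slice of them back into a bytes object):

            ```python
            from itertools import islice

            def chunked_iter(size, source):
                # Works on any iterable of byte values, not just sliceable sequences.
                it = iter(source)
                while True:
                    piece = bytes(islice(it, size))
                    if not piece:
                        return
                    yield piece

            data = b'abcdefghijklmnopqrstuvwxyz'
            print(list(chunked_iter(6, iter(data))))
            # [b'abcdef', b'ghijkl', b'mnopqr', b'stuvwx', b'yz']
            ```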
            
            qid & accept id: (35595836, 35595986) query: pysvn: How to find out if local dir is under version control? soup:

            soup wrap:

            pysvn.Client.info will raise pysvn.ClientError if you pass a directory that is not a working copy:

            >>> import pysvn
            >>> client = pysvn.Client()
            >>> client.info('/tmp')
            Traceback (most recent call last):
              File "<stdin>", line 1, in <module>
            pysvn._pysvn_2_7.ClientError: '/tmp' is not a working copy
            

            You can use that behavior by catching the exception:

            >>> try:
            ...     client.info('/tmp')
            ... except pysvn.ClientError:
            ...     print('not working copy')
            ... else:
            ...     print('working copy')
            ...
            not working copy
            
            qid & accept id: (35596269, 35596474) query: Replacing strings in specific positions into a text and then rewriting all the text soup:

            soup wrap:

            Try to do it like this:

            file = open('1qib.pdb', 'r')
            file2 = open('new.pdb', 'w')
            
            for line in file:
                spl = line.split() #1
                spl[4] = spl[4].replace("A", "B") #2
                newline = " ".join(spl) #3
                file2.write(newline + "\n") #4 split() drops the newline, so add it back
            
            file.close()
            file2.close()
            

            Step-by-step explanation:

            1. Note that what you do here is to split the line into a list of strings.

              spl = line.split()
              
            2. Then you replace the element at index 4 of the list with the new value.

              spl[4] = spl[4].replace("A", "B")
              
            3. And finally you rejoin the list

              newline = " ".join(spl)
              
            4. before you write it to the file again (adding back the newline that split() removed):

              file2.write(newline + "\n")
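            The same four steps, wrapped in a small testable function. This is a sketch: the sample record below is a hypothetical PDB-style line, and note that re-joining with single spaces discards the original fixed-width column layout of a real PDB file.

            ```python
            def replace_chain(lines, old="A", new="B", col=4):
                out = []
                for line in lines:
                    spl = line.split()                     # step 1: split into fields
                    spl[col] = spl[col].replace(old, new)  # step 2: edit field `col`
                    out.append(" ".join(spl))              # step 3: rejoin
                return out                                 # step 4: ready to be written out

            print(replace_chain(["ATOM 1 N MET A 1"]))  # ['ATOM 1 N MET B 1']
            ```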
              
            qid & accept id: (35608326, 35609157) query: pyspark - multiple input files into one RDD and one output file soup:

            soup wrap:

            This should load all the files matching the pattern.

            rdd = sc.textFile("file:///path/*.txt")
            

            Now, you do not need to do any union. You have only one RDD.

            Coming to your question: why are you getting many output files? The number of output files depends on the number of partitions in the RDD. When you run the word-count logic, your resultant RDD can have more than one partition. If you want to save the RDD as a single file, use coalesce or repartition so it has only one partition.

            The code below works, taken from Examples.

            rdd = sc.textFile("file:///path/*.txt")
            counts = rdd.flatMap(lambda line: line.split(" ")) \
                        .map(lambda word: (word, 1)) \
                        .reduceByKey(lambda a, b: a + b)
            
            counts.coalesce(1).saveAsTextFile("res.csv")
            
            qid & accept id: (35609991, 35610062) query: How do I print a sorted Dictionary in Python 3.4.3 soup:
            soup wrap:
            >>> class1 = { 'Ethan':'9','Ian':'3','Helen':'8','Holly':'6' }
            >>> print(sorted(class1.items()))
            [('Ethan', '9'), ('Helen', '8'), ('Holly', '6'), ('Ian', '3')]
            

             

            >>> for k,v in sorted(class1.items()):
            ...     print(k, v)
            ...
            Ethan 9
            Helen 8
            Holly 6
            Ian 3
            

             

            >>> for k,v in sorted(class1.items(), key=lambda p:p[1]):
            ...     print(k,v)
            ...
            Ian 3
            Holly 6
            Helen 8
            Ethan 9
            
            >>> for k,v in sorted(class1.items(), key=lambda p:p[1], reverse=True):
            ...     print(k,v)
            ...
            Ethan 9
            Helen 8
            Holly 6
            Ian 3
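            operator.itemgetter is a common stand-in for the lambda key used above:

            ```python
            from operator import itemgetter

            class1 = {'Ethan': '9', 'Ian': '3', 'Helen': '8', 'Holly': '6'}

            # Sort by value (index 1 of each (key, value) pair)
            by_value = sorted(class1.items(), key=itemgetter(1))
            print(by_value)  # [('Ian', '3'), ('Holly', '6'), ('Helen', '8'), ('Ethan', '9')]
            ```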
            
            qid & accept id: (35611992, 35612149) query: recursive way to go through a nested list and remove all of a select value soup:

            soup wrap:

            Every time you return, you are quitting the function. Here is the updated code:

            def listcleaner(lst):
                if not lst:   # If list is empty
                    return [] # Go no further
                if isinstance(lst[0], list):
                    if lst[0]: # If the list has something in it, we want to run listcleaner() on it.
                        return [listcleaner(lst[0])] + listcleaner(lst[1:])
                    else: # Otherwise, just skip that list
                        return listcleaner(lst[1:])
                else:
                    return [lst[0]] + listcleaner(lst[1:]) # If it is not a list, return it unchanged plus listcleaner() on the rest.
            
            a = listcleaner([1, [], [2, []], 5]) 
            print(a)
            

            Output:

            [1, [2], 5]
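            For comparison, a sketch of the same cleanup written as a list comprehension, recursing into sublists and filtering out empty ones in a single pass (the name listcleaner2 is my own):

            ```python
            def listcleaner2(lst):
                # Keep non-list items as-is, recurse into non-empty sublists,
                # and drop sublists that are empty.
                return [listcleaner2(x) if isinstance(x, list) else x
                        for x in lst
                        if not (isinstance(x, list) and not x)]

            print(listcleaner2([1, [], [2, []], 5]))  # [1, [2], 5]
            ```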
            
            qid & accept id: (35619038, 35619129) query: Generate random numbers without using the last n values in Python soup:

            soup wrap:

            Think of it the other way round.

            Instead of generating a random number and then checking whether it has already been generated, you can build the set of candidate numbers first and pick from it, removing the possibility of generating a duplicate at all.

            You also need to track the last 5 items generated so they can be excluded from the pool.

            Something like this will do:

            from random import randint
            
            s = set(range(0, 100))
            last5 = []
            def get_next_number():
                reduced_list = list(s - set(last5))
                i = randint(0, len(reduced_list) - 1)
                last5.append(reduced_list[i])
                if len(last5) > 5:
                    last5.pop(0)
                return reduced_list[i]
            

            To test:

            result = []
            for i in range(0, 5000):
                result.append(get_next_number())
            print(result)
            

            Step-by-step explanations:

            1. Generate the set of numbers to be picked up (say, 0 to 99) and generate an empty list to store the last 5 picked numbers:

              s = set(range(0, 100))
              last5 = []
              
            2. In the method, exclude the last 5 picked numbers from being picked:

              reduced_list = list(s - set(last5))
              
            3. Pick a random number from the reduced_list; every number left in reduced_list is valid for picking. Append the picked number to the last5 list:

              i = randint(0, len(reduced_list) - 1) #get any valid index; -1 is needed because randint's upper bound is inclusive
              last5.append(reduced_list[i]) #the picked number is reduced_list[i]; append it to the last5 list
              
            4. Check if the last5 list already has more than 5 members. If it does, remove its first (oldest) member:

              if len(last5) > 5:
                  last5.pop(0)
              
            5. Return the selected member:

              return reduced_list[i]
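            Putting the steps above together, here's a self-contained sketch (using random.choice in place of randint indexing, which is equivalent), with a check that no number repeats within any 6 consecutive draws:

```python
import random

s = set(range(0, 100))
last5 = []

def get_next_number():
    # exclude the last 5 picks, then draw uniformly from what's left
    reduced_list = list(s - set(last5))
    n = random.choice(reduced_list)
    last5.append(n)
    if len(last5) > 5:
        last5.pop(0)          # forget the oldest pick
    return n

result = [get_next_number() for _ in range(5000)]
# every window of 6 consecutive draws is duplicate-free
assert all(len(set(result[i:i + 6])) == 6 for i in range(len(result) - 5))
```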
              
            qid & accept id: (35631192, 35631777) query: Element-wise constraints in scipy.optimize.minimize soup:

            soup wrap:

            The first constraint x > 0 can be expressed very simply:

            {'type':'ineq', 'fun': lambda x: x}
            

            The second constraint is an equality constraint, which COBYLA doesn't natively support. However, you could express it as two separate inequality constraints instead:

            {'type':'ineq', 'fun': lambda x: np.sum(x, 0) - 1}  # row sum >= 1
            {'type':'ineq', 'fun': lambda x: 1 - np.sum(x, 0)}  # row sum <= 1
            

            Otherwise you could try SLSQP instead, which does support equality constraints.
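            As a hedged, runnable sketch of both formulations (the quadratic objective and the target point are made up for illustration; since the target already satisfies both constraints, the minimizer should land on it):

```python
import numpy as np
from scipy.optimize import minimize

# toy problem: find the point closest to `target` on the set
# {x : x >= 0, sum(x) == 1}
target = np.array([0.1, 0.3, 0.6])
objective = lambda x: np.sum((x - target) ** 2)

constraints = [
    {'type': 'ineq', 'fun': lambda x: x},              # x >= 0, element-wise
    {'type': 'ineq', 'fun': lambda x: np.sum(x) - 1},  # sum(x) >= 1
    {'type': 'ineq', 'fun': lambda x: 1 - np.sum(x)},  # sum(x) <= 1
]

res = minimize(objective, x0=np.full(3, 1 / 3), method='COBYLA',
               constraints=constraints)
```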

            qid & accept id: (35633421, 35664089) query: How to remove/omit smaller contour lines using matplotlib soup:

            soup wrap:

            General idea

            Your question seems to have 2 very different halves: one about omitting small contours, and another one about smoothing the contour lines. The latter is simpler, since I can't really think of anything else other than decreasing the resolution of your contour() call, just like you said.

            As for removing a few contour lines, here's a solution which is based on directly removing contour lines individually. You have to loop over the collections of the object returned by contour(), and for each element check each Path, and delete the ones you don't need. Redrawing the figure's canvas will get rid of the unnecessary lines:

            # dummy example based on matplotlib.pyplot.clabel example:
            import matplotlib
            import numpy as np
            import matplotlib.cm as cm
            import matplotlib.mlab as mlab
            import matplotlib.pyplot as plt
            
            delta = 0.025
            x = np.arange(-3.0, 3.0, delta)
            y = np.arange(-2.0, 2.0, delta)
            X, Y = np.meshgrid(x, y)
            Z1 = mlab.bivariate_normal(X, Y, 1.0, 1.0, 0.0, 0.0)
            Z2 = mlab.bivariate_normal(X, Y, 1.5, 0.5, 1, 1)
            # difference of Gaussians
            Z = 10.0 * (Z2 - Z1)
            
            
            plt.figure()
            CS = plt.contour(X, Y, Z)
            
            for level in CS.collections:
                for kp,path in reversed(list(enumerate(level.get_paths()))):
                    # go in reversed order due to deletions!
            
                    # include test for "smallness" of your choice here:
                    # I'm using a simple estimation for the diameter based on the
                    #    x and y diameter...
                    verts = path.vertices # (N,2)-shape array of contour line coordinates
                    diameter = np.max(verts.max(axis=0) - verts.min(axis=0))
            
                    if diameter<1: # threshold to be refined for your actual dimensions!
                        del(level.get_paths()[kp])  # no remove() for Path objects:(
            
            # this might be necessary on interactive sessions: redraw figure
            plt.gcf().canvas.draw()
            

            Here's the original(left) and the removed version(right) for a diameter threshold of 1 (note the little piece of the 0 level at the top):

            [Images: original for reference | remove smaller than d=1]

            Note that the little line at the top is removed while the huge cyan one in the middle isn't, even though both correspond to the same collections element, i.e. the same contour level. If we didn't want to allow this, we could've called CS.collections[k].remove(), which would probably be a much safer way of doing the same thing (but it wouldn't allow us to differentiate between multiple lines corresponding to the same contour level).

            To show that fiddling around with the cut-off diameter works as expected, here's the result for a threshold of 2:

            [Image: result with threshold of 2]

            All in all it seems quite reasonable.
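            The "smallness" test itself is independent of matplotlib: it's just the larger side of a path's bounding box. A standalone sketch of that estimate (the synthetic vertex arrays are made up for illustration):

```python
import numpy as np

def bbox_diameter(verts):
    """Estimate a contour line's size as the larger side of its bounding box."""
    verts = np.asarray(verts)          # (N, 2) array of x, y coordinates
    return np.max(verts.max(axis=0) - verts.min(axis=0))

# a small closed loop of radius 0.4 -> bounding box ~0.8, below a threshold of 1
t = np.linspace(0, 2 * np.pi, 100)
small = np.column_stack([0.4 * np.cos(t), 0.4 * np.sin(t)])

# a long line spanning 3 units in x -> diameter 3, kept
big = np.column_stack([np.linspace(0, 3, 100), np.zeros(100)])

kept = [p for p in [small, big] if bbox_diameter(p) >= 1]
```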


            Your actual case

            Since you've added your actual data, here's the application to your case. Note that you can directly generate the levels in a single line using np, which will almost give you the same result. The exact same can be achieved in 2 lines (generating an arange, then selecting those that fall between p1 and p2). Also, since you're setting levels in the call to contour, I believe the V=2 part of the function call has no effect.

            import numpy as np
            import matplotlib.pyplot as plt
            
            # insert actual data here...
            Z = np.loadtxt('mslp.txt',delimiter=',')
            X,Y = np.meshgrid(np.linspace(0,300000,Z.shape[1]),np.linspace(0,200000,Z.shape[0]))
            p1,p2 = 1006,1018
            
            # this is almost the same as the original, although it will produce
            # [p1, p1+2, ...] instead of `[Z.min()+n, Z.min()+n+2, ...]`
            levels = np.arange(np.maximum(Z.min(),p1),np.minimum(Z.max(),p2),2)
            
            
            #control
            plt.figure()
            CS = plt.contour(X, Y, Z, colors='b', linewidths=2, levels=levels)
            
            
            #modified
            plt.figure()
            CS = plt.contour(X, Y, Z, colors='b', linewidths=2, levels=levels)
            
            for level in CS.collections:
                for kp,path in reversed(list(enumerate(level.get_paths()))):
                    # go in reversed order due to deletions!
            
                    # include test for "smallness" of your choice here:
                    # I'm using a simple estimation for the diameter based on the
                    #    x and y diameter...
                    verts = path.vertices # (N,2)-shape array of contour line coordinates
                    diameter = np.max(verts.max(axis=0) - verts.min(axis=0))
            
                    if diameter<15000: # threshold to be refined for your actual dimensions!
                        del(level.get_paths()[kp])  # no remove() for Path objects:(
            
            # this might be necessary on interactive sessions: redraw figure
            plt.gcf().canvas.draw()
            plt.show()
            

            Results, original(left) vs new(right):

            [Images: before | after]


            Smoothing by resampling

            I've decided to tackle the smoothing problem as well. All I could come up with is downsampling your original data, then upsampling again using griddata (interpolation). The downsampling part could also be done with interpolation, although the small-scale variation in your input data might make this problem ill-posed. So here's the crude version:

            import scipy.interpolate as interp   #the new one
            
            # assume you have X,Y,Z,levels defined as before
            
            # start resampling stuff
            dN = 10 # use every dN'th element of the gridded input data
            my_slice = [slice(None,None,dN),slice(None,None,dN)]
            
            # downsampled data
            X2,Y2,Z2 = X[my_slice],Y[my_slice],Z[my_slice]
            # same as X2 = X[::dN,::dN] etc.
            
            # upsampling with griddata over original mesh
            Zsmooth = interp.griddata(np.array([X2.ravel(),Y2.ravel()]).T,Z2.ravel(),(X,Y),method='cubic')
            
            # plot
            plt.figure()
            CS = plt.contour(X, Y, Zsmooth, colors='b', linewidths=2, levels=levels)
            

            You can freely play around with the grids used for interpolation, in this case I just used the original mesh, as it was at hand. You can also play around with different kinds of interpolation: the default 'linear' one will be faster, but less smooth.
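            A minimal, self-contained version of this down/upsampling round trip on a synthetic smooth field (the grid sizes and test function are made up for illustration):

```python
import numpy as np
from scipy.interpolate import griddata

# synthetic smooth field on a fine grid
x = np.linspace(0, 1, 41)
X, Y = np.meshgrid(x, x)
Z = np.sin(2 * np.pi * X) * np.cos(2 * np.pi * Y)

# keep every 4th sample (41 points, so the endpoints are included and
# no extrapolation gaps appear), then interpolate back onto the full grid
dN = 4
X2, Y2, Z2 = X[::dN, ::dN], Y[::dN, ::dN], Z[::dN, ::dN]
Zsmooth = griddata(np.column_stack([X2.ravel(), Y2.ravel()]),
                   Z2.ravel(), (X, Y), method='cubic')
```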

            Result after downsampling(left) and upsampling(right):

            [Images: after downsample | after upsample]

            Of course you should still apply the small-line-removal algorithm after this resampling business, and keep in mind that this heavily distorts your input data (since if it wasn't distorted, then it wouldn't be smooth). Also, note that due to the crude method used in the downsampling step, we introduce some missing values near the top/right edges of the region under consideration. If this is a problem, you should consider doing the downsampling based on griddata as I've noted earlier.

            qid & accept id: (35636806, 35638099) query: Authentication to use for user notifications using Crossbar/Autobahn? soup:

            soup wrap:

            Use authorizer.

            See http://crossbar.io/docs/Authorization/#dynamic-authorization

            Register a dynamic authorizer for the user role that session was assigned when joining/authenticating:

                       {
                          "name": "authorizer",
                          "permissions": [
                            {
                              "uri": "com.example.authorize",
                              "register": true
                            }
                          ]
                        },
                        {
                          "name": "authenticator",
                          "permissions": [
                            {
                              "uri": "com.example.authenticate",
                              "register": true
                            }
                          ]
                        },
                        {
                          "name": "user",
                          "authorizer": "com.example.authorize"
                        },
            ...
            "components": [
                {
                  "type": "class",
                  "classname": "example.AuthenticatorSession",
                  "realm": "realm1",
                  "role": "authenticator",
                  "extra": {
                    "backend_base_url": "http://localhost:8080/ws"
                  }
                },
                {
                  "type": "class",
                  "classname": "example.AuthorizerSession",
                  "realm": "realm1",
                  "role": "authorizer"
                }
              ]
            

            Write a class

            class AuthorizerSession(ApplicationSession):
                @inlineCallbacks
                def onJoin(self, details):
                    print("In AuthorizerSession.onJoin({})".format(details))
                    try:
                        yield self.register(self.authorize, 'com.example.authorize')
                        print("AuthorizerSession: authorizer registered")
                    except Exception as e:
                        print("AuthorizerSession: failed to register authorizer procedure ({})".format(e))
            
                def authorize(self, session, uri, action):
                    print("AuthorizerSession.authorize({}, {}, {})".format(session, uri, action))
                    if session['authrole'] == u'backend':  # backend can do whatever
                        return True
                    [Authorization logic here]
                    return authorized
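            The [Authorization logic here] placeholder could, for instance, restrict users to their own notification topic. A plain-Python sketch of such a rule (the topic naming scheme is my assumption, not something Crossbar prescribes):

```python
def authorize(session, uri, action):
    # backend sessions can do anything
    if session['authrole'] == 'backend':
        return True
    # users may only subscribe to their own notifications topic
    # (hypothetical scheme: com.example.user.<authid>.notifications)
    if action == 'subscribe':
        return uri == 'com.example.user.{}.notifications'.format(session['authid'])
    return False
```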
            
            qid & accept id: (35656186, 35657071) query: Sun Grid Engine, force one job per node soup:

            soup wrap:

            Step 1: Add a complex value to your cluster. Run

            qconf -mc
            

            Add a line like

            exclusive        excl      INT         <=    YES         YES        0        0
            

            Step 2: For each of your nodes, define a value for that complex value.

            qconf -rattr exechost complex_values exclusive=1 
            

            Here we set exclusive to 1. Then, when you launch jobs, request "1" of that resource. Eg.:

            qrsh -l exclusive=1 
            

            If you wanted to allow 2 jobs per node, you could set that value to 2 in step 2.
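            For batch jobs the same resource request can go into the submit script; a sketch (the job name and command are illustrative):

```shell
#!/bin/bash
#$ -N exclusive_job
#$ -l exclusive=1   # request the whole node via the new complex
./my_program
```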

            EDIT: This is how to configure it per node. You could have done it for the entire cluster in step 1 by setting the value into the "default" column to 1.

            qid & accept id: (35664103, 35664122) query: Iterator that supports pushback soup:
            soup wrap:
            class PushbackWrapper(object):
            
                def __init__(self, iterator):
                    self.__dict__['_iterator'] = iterator
                    self.__dict__['_pushed'] = []
            
                def next(self):
                    if len(self._pushed):
                        return self._pushed.pop()
                    else:
                        return self._iterator.next()
            
                def pushback(self, item):
                    self._pushed.append(item)
            
                def __getattr__(self, attr):
                    return getattr(self._iterator, attr)
            
                def __setattr__(self, attr, value):
                    return setattr(self._iterator, attr, value)
            

            To use it:

            pushback_enabled_iterator = PushbackWrapper(original_iterator)
            
            item = next(pushback_enabled_iterator)
            if went_too_far(item):
                pushback_enabled_iterator.pushback(item)
                break
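            On Python 3 the iterator protocol uses __next__ rather than next; a minimal Python 3 sketch of the same idea:

```python
class PushbackWrapper:
    def __init__(self, iterator):
        self.__dict__['_iterator'] = iterator
        self.__dict__['_pushed'] = []

    def __iter__(self):
        return self

    def __next__(self):
        # serve pushed-back items first, newest first
        if self._pushed:
            return self._pushed.pop()
        return next(self._iterator)

    def pushback(self, item):
        self._pushed.append(item)

it = PushbackWrapper(iter([1, 2, 3]))
assert next(it) == 1
it.pushback(1)
assert next(it) == 1 and next(it) == 2
```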
            
            qid & accept id: (35667931, 35667956) query: How to transform a pair of values into a sorted unique array? soup:

            soup wrap:

            Create the set union of all tuples, then sort the result:

            sorted(set().union(*input_list))
            

            Demo:

            >>> input_list = [(196, 128), (196, 128), (196, 128), (128, 196),
            ...  (196, 128), (128, 196), (128, 196), (196, 128),
            ...  (128, 196), (128, 196)]
            >>> sorted(set().union(*input_list))
            [128, 196]
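            For comparison, a set comprehension spells the same flatten-and-sort in one expression:

```python
input_list = [(196, 128), (128, 196), (196, 128), (128, 196)]

# flatten all tuples into one set of unique values, then sort
result = sorted({value for pair in input_list for value in pair})
assert result == [128, 196]
```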
            
            qid & accept id: (35668472, 35668575) query: How can i search a array from a large array by numpy soup:

            soup wrap:

            Assuming the inputs are NumPy arrays and that there are no duplicates within each row of A, here's an approach using np.in1d -

            A[np.in1d(A,B).reshape(A.shape).sum(1) == len(B)]
            

            Explanation -

            1. Get a mask of matches in A against any element in B with np.in1d(A,B). Note that this would be a 1D boolean array.

            2. Reshape the boolean array obtained from np.in1d(A,B) to A's shape and then look for rows that have n matches, where n is the number of elements in B. Since the elements within each row are unique, the rows with n matches are the rows we want in the final output.

            3. Therefore, sum the 2D reshaped boolean array along the rows and compare against n giving us a boolean mask, which when indexed into A would give us selective rows from it as the desired output.

            Sample run -

            In [23]: A
            Out[23]: 
            array([['03', '04', '18', '22', '25', '29', '30'],
                   ['02', '04', '07', '09', '14', '29', '30'],
                   ['06', '08', '11', '13', '17', '19', '30'],
                   ['04', '08', '22', '23', '27', '29', '30'],
                   ['03', '05', '15', '22', '24', '25', '30']], 
                  dtype='|S2')
            
            In [24]: B
            Out[24]: 
            array(['04', '22'], 
                  dtype='|S2')
            
            In [25]: A[np.in1d(A,B).reshape(A.shape).sum(1) == len(B)]
            Out[25]: 
            array([['03', '04', '18', '22', '25', '29', '30'],
                   ['04', '08', '22', '23', '27', '29', '30']], 
                  dtype='|S2')
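            The same expression works on numeric arrays too; a quick self-contained check with made-up integer data:

```python
import numpy as np

A = np.array([[3, 4, 18, 22],
              [2, 4,  7,  9],
              [4, 8, 22, 23]])
B = np.array([4, 22])

# keep rows of A that contain every element of B
mask = np.in1d(A, B).reshape(A.shape).sum(1) == len(B)
out = A[mask]
```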
            
            qid & accept id: (35678083, 35678287) query: Pandas: Delete rows of a DataFrame if total count of a particular column occurs only 1 time soup:

            soup wrap:

            You can do this by creating a boolean list/array using either a list comprehension or the DataFrame's string methods.

            The list comprehension approach is:

            vc = df['Series'].value_counts()
            u  = [i not in set(vc[vc==1].index) for i in df['Series']]
            df = df[u]
            

            The other approach is to use the str.contains method to check whether the values of the Series column contain a given string or match a given regular expression (used in this case as we are using multiple strings):

            vc  = df['Series'].value_counts()
            pat = r'|'.join(vc[vc==1].index)          #Regular expression
            df  = df[~df['Series'].str.contains(pat)] #Tilde is to negate boolean
            

            Using this regular-expression approach is a bit more hackish and may require some extra processing (character escaping, etc.) on pat if the strings you want to filter out contain regex metacharacters (which requires some basic regex knowledge). However, it's worth noting this approach is about 4x faster than the list comprehension approach (tested on the data provided in the question).
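            If the values can contain regex metacharacters, escaping each one with re.escape before joining keeps the pattern literal. A minimal sketch with hypothetical data (vc and pat mirror the snippet above):

```python
import re

import pandas as pd

df = pd.DataFrame({'Series': ['a.b', 'a.b', 'c+d']})  # hypothetical data

vc = df['Series'].value_counts()
# Escape each value so '.', '+', etc. are matched literally.
pat = r'|'.join(map(re.escape, vc[vc == 1].index))
df = df[~df['Series'].str.contains(pat)]
```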

            As a side note, I recommend avoiding using the word Series as a column name as that's the name of a pandas object.

            qid & accept id: (35707475, 35707677) query: Sum of calculation in a variable soup:

            soup wrap:

            This can help you:

            def calc(x=0, y=0, z=0):
                expression = raw_input('Enter an expression: ')
            
                return eval(expression, None, locals())
            

            Example:

            >>> calc()
            Enter an expression: 8 + 5 - 7
            6
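            The same idea can be exercised non-interactively by passing the expression in as a parameter (a sketch; note that eval runs arbitrary code, so it should only ever see trusted input):

```python
def calc(expression, x=0, y=0, z=0):
    # locals() exposes x, y and z to the evaluated expression.
    return eval(expression, None, locals())

print(calc('8 + 5 - 7'))                  # 6
print(calc('x * y + z', x=2, y=3, z=4))   # 10
```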
            
            qid & accept id: (35720234, 35721656) query: Pivotting via Python and Pandas soup:

            soup wrap:

            You can use concat with str.get_dummies:

            print pd.concat([df['ID'], df['Word'].str.get_dummies()], axis=1)
               ID  and  it  long  road  take  the  walk
            0   1    0   0     0     0     1    0     0
            1   2    0   0     0     0     0    1     0
            2   3    0   0     1     0     0    0     0
            3   4    0   0     1     0     0    0     0
            4   5    0   0     0     1     0    0     0
            5   6    1   0     0     0     0    0     0
            6   7    0   0     0     0     0    0     1
            7   8    0   1     0     0     0    0     0
            8   9    0   0     0     0     0    0     1
            9  10    0   1     0     0     0    0     0
            

            Or as Edchum mentioned in comments - pd.get_dummies:

            print pd.concat([df['ID'], pd.get_dummies(df['Word'])], axis=1)
               ID  and  it  long  road  take  the  walk
            0   1    0   0     0     0     1    0     0
            1   2    0   0     0     0     0    1     0
            2   3    0   0     1     0     0    0     0
            3   4    0   0     1     0     0    0     0
            4   5    0   0     0     1     0    0     0
            5   6    1   0     0     0     0    0     0
            6   7    0   0     0     0     0    0     1
            7   8    0   1     0     0     0    0     0
            8   9    0   0     0     0     0    0     1
            9  10    0   1     0     0     0    0     0
            
            qid & accept id: (35720330, 35720457) query: Getting specific field from chosen Row in Pyspark DataFrame soup:

            soup wrap:

            Just filter and select:

            result = users_df.where(users_df._id == chosen_user).select("gender")
            

            or with col:

            from pyspark.sql.functions import col
            
            result = users_df.where(col("_id") == chosen_user).select(col("gender"))
            

            Finally, a PySpark Row is just a tuple with some extensions, so you can for example flatMap:

            result.rdd.flatMap(list).first()
            

            or map with something like this:

            result.rdd.map(lambda x: x.gender).first()
            
            qid & accept id: (35734026, 35735195) query: Numpy drawing from urn soup:

            soup wrap:

            What you want is an implementation of the multivariate hypergeometric distribution. I don't know of one in numpy or scipy, but it might already exist out there somewhere.

            You can implement it using repeated calls to numpy.random.hypergeometric. Whether that will be more efficient than your implementation depends on how many colors there are and how many balls of each color.

            For example, here's a script that prints the result of drawing from an urn containing three colors (red, green and blue):

            from __future__ import print_function
            
            import numpy as np
            
            
            nred = 12
            ngreen = 4
            nblue = 18
            
            m = 15
            
            red = np.random.hypergeometric(nred, ngreen + nblue, m)
            green = np.random.hypergeometric(ngreen, nblue, m - red)
            blue = m - (red + green)
            
            print("red:   %2i" % red)
            print("green: %2i" % green)
            print("blue:  %2i" % blue)
            

            Sample output:

            red:    6
            green:  1
            blue:   8
            

            The following function generalizes that to choosing m balls given an array colors holding the number of each color:

            def sample(m, colors):
                """
                Parameters
                ----------
                m : number of balls to draw from the urn
                colors : one-dimensional array of the number of balls of each color in the urn
            
                Returns
                -------
                One-dimensional array with the same length as `colors` containing the
                number of balls of each color in a random sample.
                """
            
                remaining = np.cumsum(colors[::-1])[::-1]
                result = np.zeros(len(colors), dtype=int)  # np.int was removed in modern NumPy
                for i in range(len(colors)-1):
                    if m < 1:
                        break
                    result[i] = np.random.hypergeometric(colors[i], remaining[i+1], m)
                    m -= result[i]
                result[-1] = m
                return result
            

            For example,

            >>> sample(10, [2, 4, 8, 16])
            array([2, 3, 1, 4])
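            A quick sanity check on sample (the function is reproduced here so the snippet stands alone): the per-color counts should always total m and never exceed the supply of that color.

```python
import numpy as np

def sample(m, colors):
    # Draw m balls without replacement; colors[i] is the count of color i.
    remaining = np.cumsum(colors[::-1])[::-1]
    result = np.zeros(len(colors), dtype=int)
    for i in range(len(colors) - 1):
        if m < 1:
            break
        result[i] = np.random.hypergeometric(colors[i], remaining[i + 1], m)
        m -= result[i]
    result[-1] = m
    return result

colors = [2, 4, 8, 16]
counts = sample(10, colors)
assert counts.sum() == 10            # every draw is accounted for
assert np.all(counts <= colors)      # never more balls than available
```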
            
            qid & accept id: (35774261, 35779309) query: Read a dense matrix from a file directly into a sparse numpy array? soup:

            soup wrap:

            loadtxt works with an open file, or any iterable that gives it lines.

            So one option is to open the file, and perform loadtxt on blocks of lines. Then convert the resulting array to sparse. Collect those sparse matrices into a list, and use the block format to assemble them into one matrix.

            I haven't used the block format much, but I think it will handle this task correctly. Under the covers, bmat collects the coo attributes (data, row, col) of each block and joins them into three master coo arrays.

            Under the covers, loadtxt just reads each line, parses it into an array or list, collects all those lines into a list, and finally passes that nested list to np.array().

            So you could read each line, parse it into a list or array of values, find the nonzero values, and assemble relevant coo arrays.

            Large sparse matrices are often created by assembling the data, i, j 1-d arrays and then calling coo_matrix((data, (i, j)), ...). One way or another, that's what you need to do with this CSV data.
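            That assembly strategy can be sketched directly: scan the CSV lines, record only the nonzero entries, and hand the three arrays to coo_matrix (the function and variable names here are illustrative):

```python
import numpy as np
from scipy import sparse

def csv_to_coo(lines, sep=','):
    # Collect (value, row, col) triples for the nonzero entries only.
    data, rows, cols = [], [], []
    nrows = ncols = 0
    for r, line in enumerate(lines):
        vals = [float(v) for v in line.split(sep)]
        nrows, ncols = r + 1, max(ncols, len(vals))
        for c, v in enumerate(vals):
            if v != 0:
                data.append(v)
                rows.append(r)
                cols.append(c)
    return sparse.coo_matrix((data, (rows, cols)), shape=(nrows, ncols))

M = csv_to_coo(["1,0,0,2,3", "0,0,0,0,0", "4,0,0,0,0", "0,0,0,3,0"])
```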


            Here's a line by line approach, which is equivalent to using loadtxt on 1 line chunks:

            A test text list, equivalent to a file:

            In [840]: txt=b"""1,0,0,2,3
            0,0,0,0,0
            4,0,0,0,0
            0,0,0,3,0
            """.splitlines()
            In [841]: 
            In [841]: np.loadtxt(txt,delimiter=',',dtype=int)
            Out[841]: 
            array([[1, 0, 0, 2, 3],
                   [0, 0, 0, 0, 0],
                   [4, 0, 0, 0, 0],
                   [0, 0, 0, 3, 0]])
            

            Process it line by line

            In [842]: ll=[]
            In [843]: for line in txt:
                ll.append(np.loadtxt([line],delimiter=','))
               .....:     
            In [844]: ll
            Out[844]: 
            [array([ 1.,  0.,  0.,  2.,  3.]),
             array([ 0.,  0.,  0.,  0.,  0.]),
             array([ 4.,  0.,  0.,  0.,  0.]),
             array([ 0.,  0.,  0.,  3.,  0.])]
            

            Now turn each array into a coo matrix:

            In [845]: lc=[[sparse.coo_matrix(l)] for l in ll]
            In [846]: lc
            Out[846]: 
            [[<1x5 sparse matrix of type ''
                with 3 stored elements in COOrdinate format>],
             [<1x5 sparse matrix of type ''
                with 0 stored elements in COOrdinate format>],
             [<1x5 sparse matrix of type ''
                with 1 stored elements in COOrdinate format>],
             [<1x5 sparse matrix of type ''
                with 1 stored elements in COOrdinate format>]]
            

            and assemble the list with bmat (which concatenates the blocks' coo attributes into one sparse matrix):

            In [847]: B=sparse.bmat(lc)
            In [848]: B
            Out[848]: 
            <4x5 sparse matrix of type ''
                with 5 stored elements in COOrdinate format>
            In [849]: B.A
            Out[849]: 
            array([[ 1.,  0.,  0.,  2.,  3.],
                   [ 0.,  0.,  0.,  0.,  0.],
                   [ 4.,  0.,  0.,  0.,  0.],
                   [ 0.,  0.,  0.,  3.,  0.]])
            

            sparse.coo_matrix(l) is just an easy way of compressing each line to bmat compatible objects.

            To process the text in 2 line chunks:

            In [874]: ld=[]
            In [875]: for i in range(0,4,2):
                arr = np.loadtxt(txt[i:i+2],delimiter=',')
                ld.append([sparse.coo_matrix(arr)])
               .....:     
            In [876]: ld
            Out[876]: 
            [[<2x5 sparse matrix of type ''
                with 3 stored elements in COOrdinate format>],
             [<2x5 sparse matrix of type ''
                with 2 stored elements in COOrdinate format>]]
            

            which feeds sparse.bmat just like before.

            qid & accept id: (35775207, 35777386) query: Remove unnecessary whitespace from Jinja rendered template soup:

            soup wrap:

            Jinja has multiple ways to control whitespace. It does not have a way to prettify output, you have to manually make sure everything looks "nice".

            The broadest solution is to set trim_blocks and lstrip_blocks on the env.

            app.jinja_env.trim_blocks = True
            app.jinja_env.lstrip_blocks = True
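            For example, the effect can be sketched with a plain Environment (the template and data here are hypothetical; app.jinja_env above is the same Environment object):

```python
from jinja2 import Environment

tpl = "<ul>\n{% for x in items %}\n    <li>{{ x }}</li>\n{% endfor %}\n</ul>"

# Without the options, the lines holding {% ... %} leave blank lines behind.
plain = Environment().from_string(tpl).render(items=[1])
# trim_blocks drops the newline after a block tag; lstrip_blocks drops the
# leading whitespace before one.
tidy = Environment(trim_blocks=True,
                   lstrip_blocks=True).from_string(tpl).render(items=[1])
```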
            

            If you want to keep a newline at the end of the file, set keep_trailing_newline = True on the environment.

            You can use control characters to modify how the whitespace around a block works. - always removes whitespace, + always preserves it, overriding the env settings for that block. The character can go at the beginning or end (or both) of a block to control the whitespace in that direction.

            {%- if ... %} strips before
            {% if ... +%} preserves after
            {%+ if ... -%} preserves before and strips after
            remember that `{% endif %}` is treated separately
            

            Note that the control characters only apply to templates you write. If you include a template or use a macro from a 3rd party, however they wrote the template will apply to that part.

            qid & accept id: (35781083, 35781208) query: python- combining list and making them a dictionary soup:

            soup wrap:

            A solution that should work for an arbitrary length of one:

            d = {}
            
            # Create a 1-to-1 mapping for the first n-1 items in `one`
            for i in one[:-1]:
                d[i] = [elements.pop(0)]
            
            # Append the remainder of `elements`
            d[one[-1]] = [elements]
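            With hypothetical inputs, the loop above produces the following (note that elements is consumed by pop):

```python
one = ['a', 'b', 'c']
elements = [1, 2, 3, 4, 5]

d = {}
for i in one[:-1]:
    d[i] = [elements.pop(0)]   # one value per key for all but the last
d[one[-1]] = [elements]        # last key gets the remainder, wrapped in a list

print(d)  # {'a': [1], 'b': [2], 'c': [[3, 4, 5]]}
```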
            

            Or with a dict-comp:

            d = {i: [elements.pop(0)] for i in one[:-1]}
            d[one[-1]] = [elements]
            

            Or, since dict.update returns None and can't be chained, as a single expression on Python 3.5+:

            d = {**{i: [elements.pop(0)] for i in one[:-1]}, one[-1]: [elements]}
            
            qid & accept id: (35787294, 35791136) query: expose C++ function to python soup:

            soup wrap:

            To expose the constructor, it needs to be passed to class_ instead of def:

            class_<A>("A", init<int, int>())

            def is to be used for additional constructors; see the docs.

            To expose std::vector<A>, use vector_indexing_suite.

            Complete example:

            #include <vector>
            #include <boost/python.hpp>
            #include <boost/python/suite/indexing/vector_indexing_suite.hpp>
            
            struct A {
                A(int x, int y) : a(x), b(y) {}
                int a;
                int b;
            
                bool operator==(const A& data)
                {
                    return this->a == data.a && this->b == data.b;
                }
            };
            std::vector<A> get_a(const A& a1, const A& a2)
            {
                const std::vector<A> ret = { a1, a2 };
                return ret;
            }
            
            BOOST_PYTHON_MODULE(hello)
            {
                using namespace boost::python;
            
                class_<std::vector<A> >("vecA")
                    .def(vector_indexing_suite<std::vector<A> >())
                    ;
            
                class_<A>("A", init<int, int>())
                    .def_readwrite("a", &A::a)
                    .def_readwrite("b", &A::b);
                def("get_a", get_a);
            }
            

            Test script:

            import hello
            
            a1 = hello.A(1,2)
            a2 = hello.A(3,4)
            ret = hello.get_a(a1, a2)
            print "size:", len(ret)
            print "values:"
            for x in ret:
              print x.a, x.b 
            

            Output:

            size: 2
            values:
            1 2
            3 4
            
            qid & accept id: (35805891, 35806234) query: How to get only even numbers from list soup:

            soup wrap:

            I would split it into two functions: one which checks if a list contains only even numbers, and the other one is your main function (I renamed it to get_even_lists()), which gets all the even lists from a list of lists:

            def only_even_elements(l):
                """ (list of int) -> bool
            
                Return whether a list contains only even integers.
            
                >>> only_even_elements([1, 2, 4])  # 1 is not even
                False
                """
                for e in l:
                    if e % 2 == 1:
                        return False
                return True
            
            def get_even_lists(lst):
                """ (list of list of int) -> list of list of int
            
                Return a list of the lists in lst that contain only even integers. 
            
                >>> get_even_lists([[1, 2, 4], [4, 0, 6], [22, 4, 3], [2]])
                [[4, 0, 6], [2]]
                """
                # return [l for l in lst if only_even_elements(l)]
                even_lists = []
                for sublist in lst:
                    if only_even_elements(sublist):
                        even_lists.append(sublist)
                return even_lists
            

            Although, this could be done with for/else:

            def get_even_lists(lst):
                """ (list of list of int) -> list of list of int
            
                Return a list of the lists in lst that contain only even integers. 
            
                >>> get_even_lists([[1, 2, 4], [4, 0, 6], [22, 4, 3], [2]])
                [[4, 0, 6], [2]]
                """
                even_lists = []
                for sublist in lst:
                    for i in sublist:
                        if i % 2 == 1:
                            break
                    else:
                        even_lists.append(sublist)
                return even_lists
            

            Or as others have suggested, a one-liner:

            def get_even_lists(lst):
                """ (list of list of int) -> list of list of int
            
                Return a list of the lists in lst that contain only even integers. 
            
                >>> get_even_lists([[1, 2, 4], [4, 0, 6], [22, 4, 3], [2]])
                [[4, 0, 6], [2]]
                """
                return [sublst for sublst in lst if all(i % 2 == 0 for i in sublst)]
            

            But let's be honest here: while it's arguable that using two functions might be a bit longer and not as "cool" as the other two solutions, it's reusable, easy to read and understand, and it's maintainable. I'd argue it's much better than any other option out there.
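            For completeness, a quick check of the behaviour (data from the docstring, plus an edge case):

```python
def get_even_lists(lst):
    # Keep sublists whose elements are all even; all() is True for [].
    return [sublst for sublst in lst if all(i % 2 == 0 for i in sublst)]

assert get_even_lists([[1, 2, 4], [4, 0, 6], [22, 4, 3], [2]]) == [[4, 0, 6], [2]]
assert get_even_lists([[], [1]]) == [[]]   # an empty sublist counts as all-even
```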

            qid & accept id: (35840403, 35840585) query: Python - filling a list of tuples with zeros in places of missing indexes soup:

            soup wrap:

            Just a straight for loop is probably easier than a list comprehension:

            data = [(0.0, 287999.70000000007),
            (1.0, 161123.23000000001),
            (2.0, 93724.140000000014),
            (3.0, 60347.309999999983),
            (4.0, 55687.239999999998),
            (5.0, 29501.349999999999),
            (6.0, 14993.920000000002),
            (7.0, 14941.970000000001),
            (8.0, 13066.229999999998),
            (9.0, 10101.040000000001),
            (10.0, 4151.6900000000005),
            (11.0, 2998.8899999999999),
            (12.0, 1548.9300000000001),
            (15.0, 1595.54),
            (16.0, 1435.98),
            (17.0, 1383.01)]
            
            result = []
            last = 0.0
            for d in data:
                while last < d[0]:
                    result.append((last, 0))
                    last += 1
                result.append(d)
                last = d[0]+1
            

            Slightly shorter (and including a list comprehension):

            result, last = [], 0.0
            for d in data:
                result.extend((r,0) for r in range(int(last), int(d[0])))
                result.append(d)
                last = d[0]+1
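A third sketch (my own, not from the original answer): since the first tuple element is just an index, a dict lookup can fill the gaps in a single comprehension. The sample data here is made up and shorter than the original.

```python
data = [(0.0, 10.5), (1.0, 7.25), (3.0, 1.5)]  # index 2 is missing

lookup = dict(data)                 # first element -> second element
last_index = int(max(lookup))
result = [(float(i), lookup.get(float(i), 0)) for i in range(last_index + 1)]
```

This trades the running `last` bookkeeping for one extra pass to build the dict.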
            
            qid & accept id: (35847865, 35847951) query: Retaining category order when charting/plotting ordered categorical Series soup:

            soup wrap:

            You can pass the parameter sort=False to value_counts:

            print df.value_counts()
            awful    2
            good     2
            bad      2
            ok       1
            dtype: int64
            
            print df.value_counts(sort=False)
            bad      2
            ok       1
            good     2
            awful    2
            dtype: int64
            
            print df.value_counts(sort=False).plot.bar()
            

            (bar chart of the counts in category order)

            Or add sort_index:

            print df.value_counts().sort_index()
            bad      2
            ok       1
            good     2
            awful    2
            dtype: int64
            
            print df.value_counts().sort_index().plot.bar()
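A small self-contained sketch of the sort=False behaviour (data and category order are my own, and the syntax is Python 3): on an ordered categorical Series, value_counts(sort=False) keeps the declared category order rather than sorting by frequency.

```python
import pandas as pd

order = ["bad", "ok", "good", "awful"]
s = pd.Series(pd.Categorical(
    ["awful", "good", "bad", "ok", "good", "awful", "bad"],
    categories=order, ordered=True))

counts = s.value_counts(sort=False)  # index follows the declared category order
```

Chaining .plot.bar() on counts would then chart the bars in that same order.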
            
            qid & accept id: (35867650, 35868084) query: Python multiprocessing and shared numpy array soup:

            soup wrap:

            A simple way to parallelize that code would be to use a Pool of processes:

            pool = multiprocessing.Pool()
            results = pool.starmap(get_sub_matrix_C, ((i, other_args) for i in range(10)))
            
            for i, res in enumerate(results):
                C[i*10:(i+1)*10,:10] = res
            

            I've used starmap since the get_sub_matrix_C function has more than one argument (starmap(f, [(x1, ..., xN)]) calls f(x1, ..., xN)).

            Note however that serialization/deserialization may take significant time and space, so you may have to use a more low-level solution to avoid that overhead.


            It looks like you are running an outdated version of python. You can replace starmap with plain map but then you have to provide a function that takes a single parameter:

            def f(args):
                return get_sub_matrix_C(*args)
            
            pool = multiprocessing.Pool()
            results = pool.map(f, ((i, other_args) for i in range(10)))
            
            for i, res in enumerate(results):
                C[i*10:(i+1)*10,:10] = res
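A minimal runnable sketch of the starmap pattern (function names are mine, standing in for get_sub_matrix_C). I use the thread-backed multiprocessing.dummy.Pool here only so the sketch runs safely anywhere; it exposes the same starmap API, and swapping in multiprocessing.Pool gives real processes.

```python
from multiprocessing.dummy import Pool  # thread-backed; same API as multiprocessing.Pool

def scaled(i, scale):
    # stand-in for get_sub_matrix_C: any function of several arguments works
    return i * scale

def run_parallel(n, scale):
    with Pool(2) as pool:
        # starmap unpacks each (i, scale) tuple into scaled(i, scale)
        return pool.starmap(scaled, ((i, scale) for i in range(n)))
```

Results come back in submission order, which is what lets the original answer write each res into its C[i*10:(i+1)*10,:10] slice by index.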
            
            qid & accept id: (35890697, 35890966) query: Python sorting array according to date soup:

            soup wrap:

            Write a function to extract the date from your string with a regular expression, and use that as key to sorted:

            import re
            
            l = ['',
                 'q//Attachments/Swoop_coverletter_311386_20120103.doc',
                 'q//Attachments/Swoop_RESUME_311386_20091012.doc',
                 'q//Attachments/Swoop_Resume_311386_20100901.doc',
                 'q//Attachments/Swoop_reSume_311386_20120103.doc',
                 'q//Attachments/Swoop_coverletter_311386_20100901.doc',
                 'q//Attachments/Swoop_coverletter_311386_20091012.doc']
            
            def get_date(line):
                pattern = r'.*_(\d{8})\.doc'
                m = re.match(pattern, line)
                if m:
                    return int(m.group(1))
                else:
                    return -1 # or do something else with lines that contain no date
            
            
            print sorted(l, key=get_date, reverse=True)
            

            prints:

            ['q//Attachments/Swoop_coverletter_311386_20120103.doc', 
             'q//Attachments/Swoop_reSume_311386_20120103.doc', 
             'q//Attachments/Swoop_Resume_311386_20100901.doc', 
             'q//Attachments/Swoop_coverletter_311386_20100901.doc', 
             'q//Attachments/Swoop_RESUME_311386_20091012.doc', 
             'q//Attachments/Swoop_coverletter_311386_20091012.doc', 
             '']
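A variant sketch (my own) that parses the digits into a real datetime instead of comparing them as integers; the two orderings coincide for YYYYMMDD strings, but a datetime key also validates the date. The filenames below are made up.

```python
import re
from datetime import datetime

def get_date_key(name):
    m = re.search(r'_(\d{8})\.doc$', name)
    # entries without a date sort last when reverse=True
    return datetime.strptime(m.group(1), '%Y%m%d') if m else datetime.min

files = ['a_20091012.doc', 'b_20120103.doc', 'c_20100901.doc', 'nodate.txt']
ordered = sorted(files, key=get_date_key, reverse=True)
```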
            
            qid & accept id: (35913509, 35913981) query: Array from interpolated plot in python soup:

            soup wrap:

            Yes it is possible. The interpolation returns a callable so you can just call it with your wanted grid:

            from scipy.interpolate import interp2d
            import numpy as np
            
            x = 50
            y = 150
            
            a = np.random.uniform(0,10,(y, x))
            b = interp2d(np.arange(x), np.arange(y), a)
            

            That was just to create some test data. To get what you want you need to evaluate the interpolation b on a grid of your choosing:

            upsample_factor = 2
            c = b(np.linspace(0,x,x*upsample_factor), np.linspace(0,y,y*upsample_factor))
            

            That creates a grid with 2 times the number of elements that your original array had.

            Or you could just define some box around your region of interest and call it with np.linspace with the required number of elements.
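The same evaluate-on-a-denser-grid idea in one dimension, using only numpy (a sketch of mine, with made-up sample data). Note that interp2d has since been deprecated and removed in recent SciPy releases; RegularGridInterpolator is the suggested replacement for the 2-D case.

```python
import numpy as np

x = np.arange(5, dtype=float)      # coarse grid: 0..4
y = x ** 2                         # samples on the coarse grid

upsample_factor = 2
x_fine = np.linspace(0, 4, 4 * upsample_factor + 1)   # twice as dense
y_fine = np.interp(x_fine, x, y)   # piecewise-linear interpolant on the fine grid
```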

            qid & accept id: (35952815, 35952915) query: Python: Binning one coordinate and averaging another based on these bins soup:

            soup wrap:

            I suppose you are using Python 2; if not, you should change the division used to calculate the step to // (floor division), otherwise numpy will complain that it cannot interpret a float as a step.

            binwidth = numpy.max(rev_count)//10 # Changed this to floor division
            revbin = range(0, numpy.max(rev_count), binwidth)
            revbinnedstars = [None]*len(revbin)
            
            for i in range(0, len(revbin)-1):
                # I actually don't know what you wanted to do but I guess you wanted the
                # "logical and" combination in that bin (you don't need to use np.where here)
                # You can put that all in one statement but it gets crowded so I'll split it:
                index1 = revbin[i]-binwidth/2 < rev_count
                index2 = rev_count < revbin[i]+binwidth/2
                revbinnedstars[i] = numpy.mean(stars[np.logical_and(index1, index2)])
            

            That at least should work and gives the right results. It will be very inefficient if you have huge datasets and want more than 10 bins.

            One very important takeaway:

            • Don't use np.argwhere if you want to index an array. That result is just supposed to be human readable. If you really want the coordinates use np.where. That can be used as index but isn't that pretty to read if you have multidimensional inputs.

            The numpy documentation supports me on that point:

            The output of argwhere is not suitable for indexing arrays. For this purpose use where(a) instead.

            That's also the reason why your code was so slow. It tried to do something you don't want it to do and which can be very expensive in memory and cpu usage. Without giving you the right result.

            What I have done here is called boolean masks. It's shorter to write than np.where(condition) and involves one less calculation.


            A completely vectorized approach could be used by defining a grid that knows which stars are in which bin:

            bins = 10
            binwidth = numpy.max(rev_count)//bins
            revbin = np.arange(0, np.max(rev_count)+binwidth+1, binwidth)
            

            An even better approach for defining the bins follows. Beware that you have to add one to the maximum, since you want to include it, and one to the number of bins, because you are interested in the bin start- and end-points, not the centers of the bins:

            number_of_bins = 10
            revbin = np.linspace(np.min(rev_count), np.max(rev_count)+1, number_of_bins+1)
            

            and then you can setup the grid:

            grid = np.logical_and(rev_count[None, :] >= revbin[:-1, None], rev_count[None, :] < revbin[1:, None])
            

            The grid is bins x rev_count big (because of the broadcasting, I increased the dimensions of each of those arrays by one, but along different axes). It essentially checks whether a point is at or above the lower bin edge and below the upper bin edge (hence the [:-1] and [1:] indices). This is done multidimensionally, with the counts in the second dimension (numpy axis=1) and the bins in the first dimension (numpy axis=0).

            So we can get the Y coordinates of the stars in the appropriate bin by just multiplying these with this grid:

            stars * grid
            

            To calculate the mean we need the sum of the coordinates in this bin and divide it by the number of stars in that bin (bins are along the axis=1, stars that are not in this bin only have a value of zero along this axis):

            revbinnedstars = np.sum(stars * grid, axis=1) / np.sum(grid, axis=1)
            

            I actually don't know if that's more efficient. It'll be a lot more expensive in memory but maybe a bit less expensive in CPU.
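Putting the grid steps above together on a small hand-checkable dataset (the sample values are mine):

```python
import numpy as np

rev_count = np.array([1, 2, 2, 5, 7, 8, 9, 9])
stars     = np.array([4.0, 3.0, 5.0, 2.0, 1.0, 4.0, 5.0, 3.0])

number_of_bins = 3
revbin = np.linspace(rev_count.min(), rev_count.max() + 1, number_of_bins + 1)
# revbin edges: [1, 4, 7, 10]

# bins x stars boolean mask: True where a star's count falls into that bin
grid = np.logical_and(rev_count[None, :] >= revbin[:-1, None],
                      rev_count[None, :] <  revbin[1:, None])

# per-bin mean: sum of masked stars divided by the number of stars in the bin
revbinnedstars = np.sum(stars * grid, axis=1) / np.sum(grid, axis=1)
```

By hand: bin [1,4) holds stars 4, 3, 5 (mean 4.0); bin [4,7) holds 2.0; bin [7,10) holds 1, 4, 5, 3 (mean 3.25).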

            qid & accept id: (35962295, 35962359) query: How to create a Dictionary in Python with 2 string keys to access an integer? soup:

            soup wrap:

            Here are two ways to create this. First, the traditional dict

            dic = {}
            dic['New York'] = {}
            dic['New York']['Chicago'] = 25
            

            or using a defaultdict:

            from collections import defaultdict
            dic2 = defaultdict(dict)
            dic2['New York']['Chicago'] = 25
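A third layout worth knowing (a sketch of mine, not from the original answer): a flat dict keyed by a (from, to) tuple, which avoids nesting entirely when you always look up both keys together.

```python
from collections import defaultdict

# the nested layout from the answer
distances = defaultdict(dict)
distances['New York']['Chicago'] = 25
distances['New York']['Boston'] = 4

# flat alternative: one dict keyed by a tuple of both strings
flat = {('New York', 'Chicago'): 25}
```

The nested form makes "all destinations from New York" cheap; the tuple form makes single lookups simpler.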
            
            qid & accept id: (35966940, 35967279) query: finding the max of a column in an array soup:

            soup wrap:

            An example with a random input array, showing that you can take the max in either axis easily with one command.

            import numpy as np
            
            aa= np.random.random([4,3]) 
            print aa
            print
            print np.max(aa,axis=0)
            print
            print np.max(aa,axis=1)
            

            Output:

            [[ 0.51972266  0.35930957  0.60381998]
             [ 0.34577217  0.27908173  0.52146593]
             [ 0.12101346  0.52268843  0.41704152]
             [ 0.24181773  0.40747905  0.14980534]]
            
            [ 0.51972266  0.52268843  0.60381998]
            
            [ 0.60381998  0.52146593  0.52268843  0.40747905]
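A deterministic companion to the random example (values mine), making the axis semantics easy to check by eye: axis=0 collapses the rows (one max per column), axis=1 collapses the columns (one max per row).

```python
import numpy as np

aa = np.array([[1, 9, 2],
               [7, 3, 8]])

col_max = aa.max(axis=0)  # max over rows   -> one value per column
row_max = aa.max(axis=1)  # max over columns -> one value per row
```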
            
            qid & accept id: (36017497, 36017855) query: Binary search of a number within a list in Python soup:

            soup wrap:

            Apart from bisect (as listed in the discussion) you can create your own function. Below is one possible way this can be done.

            If you use integer division within Python, it will take care of both even- and odd-length lists, e.g. 10 // 3 = 3 and 9 // 3 = 3.

            Sample Code

            import random
            def binarySearch(alist, item):
                    first = 0
                    last = len(alist) - 1
                    found = False
            
                    while first<=last and not found:
                        midpoint = (first + last)//2            
                        if alist[midpoint] == item:
                            found = True
                        else:
                            if item < alist[midpoint]:
                                last = midpoint-1
                            else:
                                first = midpoint+1  
                    return found
            
            def findThisNum(mynum):
            
                testlist = [x for x in range(listlength)]
            
                print "testlist = ", testlist
                print "finding number ", mynum
            
                if binarySearch(testlist, mynum):
                    print "found %d" %mynum
                else:
                    print "Not found %d" %mynum
            
            
            
            
            #### Main_Function ####
            
            if __name__ == "__main__":
                #
            
                #Search 1 [ Even numbered list ]
                listlength = 10    
                findnum = random.randrange(0,listlength)
                findThisNum(findnum)     
            
                #Search 2 [ [ Odd numbered list ]
                listlength = 13    
                findnum = random.randrange(0,listlength)
                findThisNum(findnum)
            
                #search 3  [ find item not in the list ]
            
                listlength = 13    
                findnum = random.randrange(0,listlength) + listlength
                findThisNum(findnum)
            

            Output

            Python 2.7.9 (default, Dec 10 2014, 12:24:55) [MSC v.1500 32 bit (Intel)] on win32
            Type "copyright", "credits" or "license()" for more information.
            >>> ================================ RESTART ================================
            >>> 
            testlist =  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
            finding number  4
            found 4
            testlist =  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
            finding number  9
            found 9
            testlist =  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
            finding number  21
            Not found 21
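The bisect route mentioned at the top can be sketched in a few lines (this helper is mine, built on the standard-library bisect module):

```python
import bisect

def contains(sorted_list, item):
    """Binary search via the standard library; sorted_list must already be sorted."""
    i = bisect.bisect_left(sorted_list, item)
    # bisect_left returns the insertion point; the item is present
    # only if that position holds an equal value
    return i < len(sorted_list) and sorted_list[i] == item
```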
            
            qid & accept id: (36041797, 36041844) query: Python list to txt soup:

            soup wrap:

            The second option of open, w, will overwrite the file's contents. Use a to append instead, but also put a newline character \n after each line.

            def random_grid(file):
                grid = []
                num_rows = raw_input("How many raws would you like in your grid? ")
                num_columns = raw_input("How many columns would you like in your grid? ")
                min_range = raw_input("What is the minimum number you would like in your grid? ")
                max_range = raw_input("what is the maximum number you would like in your grid? ")
                for row in range(int(num_rows)):
                    grid.append([])
                    for column in range(int(num_columns)):
                        grid[row].append(random.randint((int(min_range)),(int(max_range))))         
                for row in grid:
                    x = (' '.join([str(x) for x in row])) 
                    print x
            
                    with open(r"test.txt", 'a') as text_file:
                        text_file.write(x)
                        text_file.write("\n")
            

            The other, more efficient way to do it is to move your file-writing code outside the loop, like this:

            def random_grid(file):
                grid = []
                num_rows = raw_input("How many raws would you like in your grid? ")
                num_columns = raw_input("How many columns would you like in your grid? ")
                min_range = raw_input("What is the minimum number you would like in your grid? ")
                max_range = raw_input("what is the maximum number you would like in your grid? ")
                for row in range(int(num_rows)):
                    grid.append([])
                    for column in range(int(num_columns)):
                        grid[row].append(random.randint((int(min_range)),(int(max_range))))    
                x = ""
                for row in grid:
                x += ' '.join(str(v) for v in row) + "\n"
                    print x
            
                with open(r"test.txt", 'w') as text_file:
                    text_file.write(x)
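The build-the-string-then-write-once idea reduces to a join over joined rows; here is a compact sketch (the sample rows and the temp-file path are mine, not from the question):

```python
import os
import tempfile

rows = [[1, 2, 3], [4, 5, 6]]
lines = [' '.join(str(v) for v in row) for row in rows]

path = os.path.join(tempfile.gettempdir(), 'grid.txt')  # hypothetical output path
with open(path, 'w') as f:
    f.write('\n'.join(lines) + '\n')   # one write, newline after each row
```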
            
            qid & accept id: (36050713, 36051027) query: Using Py_BuildValue() to create a list of tuples in C soup:

            soup wrap:

            You can use PyList_New(), PyTuple_New(), PyList_Append(), and PyTuple_SetItem() to accomplish this...

            const Py_ssize_t tuple_length = 4;
            const unsigned some_limit = 4;
            
            PyObject *my_list = PyList_New(0);
            if(my_list == NULL) {
                // ...
            }
            
            for(unsigned i = 0; i < some_limit; i++) {
                PyObject *the_tuple = PyTuple_New(tuple_length);
                if(the_tuple == NULL) {
                    // ...
                }
            
                for(Py_ssize_t j = 0; j < tuple_length; j++) {
                    PyObject *the_object = PyLong_FromSsize_t(i * tuple_length + j);
                    if(the_object == NULL) {
                        // ...
                    }
            
                    PyTuple_SET_ITEM(the_tuple, j, the_object);
                }
            
                if(PyList_Append(my_list, the_tuple) == -1) {
                    // ...
                }

                Py_DECREF(the_tuple);  // PyList_Append does not steal the reference
            }
            

            That will create a list of the form...

            [(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11), (12, 13, 14, 15)]
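For reference, the structure the C code builds corresponds to this one-line pure-Python equivalent (my sketch, using the same tuple_length and limit as above):

```python
tuple_length, some_limit = 4, 4

# i-th tuple holds the integers i*4 .. i*4+3, matching PyLong_FromSsize_t above
my_list = [tuple(range(i * tuple_length, (i + 1) * tuple_length))
           for i in range(some_limit)]
```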
            
            qid & accept id: (36061608, 36061711) query: Erasing list of phrases from list of texts in python soup:

            soup wrap:

            You can use list comprehension:

            def find_words_and_remove(words, strings):
                return [" ".join(word for word in string.split() if word not in words) for string in strings]
            
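            As a quick check, here is the list-comprehension version run on made-up sample data (the word list and texts are assumptions for illustration):

```python
def find_words_and_remove(words, strings):
    # drop every whitespace-separated token that appears in `words`
    return [" ".join(word for word in string.split() if word not in words)
            for string in strings]

texts = ["the quick brown fox", "over the lazy dog"]
print(find_words_and_remove(["the", "over"], texts))
# → ['quick brown fox', 'lazy dog']
```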

            That will work only when there are single words in b, but because of your edit and comment, I now know that you really do need _find_word_and_remove(). Your recursion way isn't really too bad, but if you don't want recursion, do this:

            def find_words_and_remove(words, strings):
                strings_copy = strings[:]
                for word in words:
                    for i, string in enumerate(strings_copy):
                        strings_copy[i] = _find_word_and_remove(word, string)
                return strings_copy
            
            qid & accept id: (36066726, 36067528) query: python - numpy: read csv into numpy with proper value type soup:

            soup wrap:

            Since the first field on each line is a string, you'll have to use a more flexible numpy dtype called "object". Try this function and see if it does what you are looking for:

                import csv
                import numpy

                def readCSVToNumpyArray(dataset):
                    values = [[]]
                    with open(dataset) as f:
                        counter = 0
                        for i in csv.reader(f):
                            for j in i:
                                try:
                                    values[counter].append(float(j))
                                except ValueError:
                                    values[counter].append(j)
                            counter = counter + 1
                            values.append([])
            
                    data = numpy.array(values[:-1],dtype='object')
            
                    return data
            
                numpyArray = readCSVToNumpyArray('test_data.csv')
                print(numpyArray)
            
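            The same try/except parsing can be exercised without a file on disk by feeding csv.reader an in-memory buffer (a sketch; the sample rows are made up):

```python
import csv
import io

import numpy

buffer = io.StringIO("A,1,2\nB,3,4\n")
values = []
for row in csv.reader(buffer):
    parsed = []
    for field in row:
        try:
            parsed.append(float(field))  # numeric fields become floats
        except ValueError:
            parsed.append(field)         # everything else stays a string
    values.append(parsed)

data = numpy.array(values, dtype='object')
print(data)
```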

            The results are:

                [['A' 1.0 2.0 3.0 4.0 5.0]
                 ['B' 6.0 7.0 8.0 9.0 10.0]
                 ['C' 11.0 12.0 13.0 14.0 15.0]
                 ['A' 16.0 17.0 18.0 19.0 20.0]]
            
            qid & accept id: (36071592, 36072568) query: Find difference between two multi dimensional lists soup:

            soup wrap:

            You can use two dicts and find the set difference of the keys, where the keys are each first element:

            a = [["greg", 1.2, 400, 234], ["top", 9.0, 5.1, 2300], ["file", 5.7, 2.2, 900], ["stop", 1.6, 6.7, 200]]
            
            b = [["hall", 5.2, 460, 234], ["line", 5.3, 5.91, 100], ["file", 2.7, 3.3, 6.4], ["stop", 6.6, 5.7, 230]]
            
            d1 = {sub[0]: sub for sub in a}
            d2 = {sub[0]: sub for sub in b}
            
            print([d2[k] for k in d2.keys() - d1])
            print([d1[k] for k in d1.keys() - d2])
            

            Output:

            [['hall', 5.2, 460, 234], ['line', 5.3, 5.91, 100]]
            [['top', 9.0, 5.1, 2300], ['greg', 1.2, 400, 234]]
            
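            If you want the rows unique to either side in a single expression, the symmetric difference of the key views works too (a small addition, not part of the original answer):

```python
a = [["greg", 1.2, 400, 234], ["top", 9.0, 5.1, 2300], ["file", 5.7, 2.2, 900], ["stop", 1.6, 6.7, 200]]
b = [["hall", 5.2, 460, 234], ["line", 5.3, 5.91, 100], ["file", 2.7, 3.3, 6.4], ["stop", 6.6, 5.7, 230]]

d1 = {sub[0]: sub for sub in a}
d2 = {sub[0]: sub for sub in b}

# keys that appear in exactly one of the two dicts
unique_keys = d1.keys() ^ d2.keys()
print(sorted(unique_keys))  # → ['greg', 'hall', 'line', 'top']
```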

            Note that the correct output is [['top', 9.0, 5.1, 2300], ['greg', 1.2, 400, 234]], not just [['greg', 1.2, 400, 234]] as given in the expected output in your question.

            The equivalent Python 2 code would need to use viewkeys:

            print([d2[k] for k in d2.viewkeys() - d1])
            print([d1[k] for k in d1.viewkeys() - d2])
            
            qid & accept id: (36086075, 36088812) query: Unique duplicate rows with range soup:

            soup wrap:

            It doesn't make sense to use a set here, because you would have to break each line into tokens, which makes the elements awkward to manage. I would use a pair of two-dimensional arrays, one for your candidate lines and one for the results.

            I would read the whole file into a candidates array and create an empty results array. Then I would traverse the candidates array and look for matches in the results array. If I didn't find a match in the results array I would copy the candidate into the results array.

            Something like:

            candidates = []
            results = []
            for line in my_file:
                candidates.append(line.split('\t'))
            for line in candidates:
                seen = False
                for possible_match in results:
                    if matching_line(possible_match, line):
                        seen = True
                        break
                if not seen:
                    results.append(line)
            

            Then you need a function to decide if two arrays match:

            def matching_line(array1, array2):
                # compare whichever fields determine a duplicate, e.g.
                return array1[0] == array2[0]  # ...and so on for the other fields
            
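            Putting the pieces together, here is a minimal runnable sketch of the whole dedupe loop (the sample rows and the match-on-first-two-fields rule are assumptions for illustration):

```python
def matching_line(array1, array2):
    # hypothetical rule: two rows are duplicates when their
    # first two fields agree
    return array1[0] == array2[0] and array1[1] == array2[1]

candidates = [["x", "1", "foo"], ["x", "1", "bar"], ["y", "2", "baz"]]
results = []
for line in candidates:
    if not any(matching_line(seen, line) for seen in results):
        results.append(line)

print(results)  # → [['x', '1', 'foo'], ['y', '2', 'baz']]
```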
            qid & accept id: (36144303, 36144349) query: Python - split list of lists by value soup:

            soup wrap:

            You're better off making a dictionary. If you really want to make a bunch of variables, you'll have to use globals(), which isn't really recommended.

            a = [["aa", 1, 3],
                 ["aa", 3, 3],
                 ["sdsd", 1, 3],
                 ["sdsd", 6, 0],
                 ["sdsd", 2, 5],
                 ["fffffff", 1, 3]]
            
            d = {}
            for sub in a:
                key = sub[0]
                if key not in d: d[key] = []
                d[key].append(sub)
            

            OR

            import collections
            
            d = collections.defaultdict(list)
            for sub in a:
                d[sub[0]].append(sub)
            
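            Either way, a quick self-contained run shows the grouping (using the sample list with the missing commas added):

```python
import collections

a = [["aa", 1, 3], ["aa", 3, 3], ["sdsd", 1, 3],
     ["sdsd", 6, 0], ["sdsd", 2, 5], ["fffffff", 1, 3]]

d = collections.defaultdict(list)
for sub in a:
    d[sub[0]].append(sub)

print(d["aa"])    # → [['aa', 1, 3], ['aa', 3, 3]]
print(sorted(d))  # → ['aa', 'fffffff', 'sdsd']
```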
            qid & accept id: (36149707, 36149943) query: Modify a python script with bash and execute it with the changes soup:

            soup wrap:

            As multiple comments have already suggested, don't do that. Change the Python script so that it accepts a second parameter instead. Modifying your code on the fly is brittle and complex, and the standard solution to that is to parametrize the things you want to change.

            QUERY = 'www.foo.com' + '/bar?' \
                    + '&title=%(title)s' \
                    + '&start=%(start)i' \
                    + '&num=%(num)s'
            

            Then run the loop something like

            start=0
            for ... in ...
            do
                echo "Foo:$foo"
                echo "Bar:$bar"
                ./pythonScript.py --argument1 "arg" --start "$start"
                ((start += 20))  # bash only
            done    
            
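            On the Python side, a minimal argparse sketch shows how the script could accept such a --start parameter (the option names and query values are assumptions, not taken from the original script):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--argument1")
parser.add_argument("--start", type=int, default=0)

# simulate `./pythonScript.py --argument1 arg --start 20`
args = parser.parse_args(["--argument1", "arg", "--start", "20"])

QUERY = "www.foo.com/bar?&title=%(title)s&start=%(start)i&num=%(num)s" % {
    "title": "foo", "start": args.start, "num": "10"}
print(QUERY)  # → www.foo.com/bar?&title=foo&start=20&num=10
```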
            qid & accept id: (36165854, 36165908) query: BeautifulSoup scraping information from multiple divs using loops into JSON soup:

            soup wrap:

            Iterate over every track and make context specific searches:

            from pprint import pprint
            
            from bs4 import BeautifulSoup
            
            # sample markup (reconstructed to match the parsed output below)
            data = """
            <div class="audioBoxWrap clearBoth">
                <h3>Title 1</h3>
                <p>Description 1</p>
                <a href="link1.mp3"></a>
            </div>
            <div class="audioBoxWrap clearBoth">
                <h3>Title 2</h3>
                <p>Description 2</p>
                <a href="link2.mp3"></a>
            </div>
            """
            
            soup = BeautifulSoup(data, "html.parser")
            
            tracks = soup.find_all('div', {'class': "audioBoxWrap clearBoth"})
            result = {
                "podcasts": [
                    {
                        "title": track.h3.get_text(strip=True),
                        "description": track.p.get_text(strip=True),
                        "link": track.a["href"]
                    }
                    for track in tracks
                ]
            }
            pprint(result)

            Prints:

            {'podcasts': [{'description': 'Description 1',
                           'link': 'link1.mp3',
                           'title': 'Title 1'},
                          {'description': 'Description 2',
                           'link': 'link2.mp3',
                           'title': 'Title 2'}]}
            
            qid & accept id: (36170581, 36171535) query: How to find the all text files from the path and combine all the lines in that text files to one text file soup:

            soup wrap:

            You can read in all the text files like this:

            import os
            
            file_contents = []
            for file in os.listdir("directory_to_search"):
                if file.endswith(".txt"):
                    with open(os.path.join("directory_to_search", file)) as f:
                        file_contents.append(" ".join(line.strip() for line in f))
            

            This will populate file_contents with the contents of each file, so now you can write them all to the output file:

            with open('output.txt', 'w') as f1:
                all_files_as_one_string = ' '.join(file_contents)
                f1.write(all_files_as_one_string)
            

            Note that if there is more than one word in each of your files, you will need to loop through the file_contents list and join all the lines up before you make the single big string from them.

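            A more compact variant of the same read-then-write idea uses glob (the directory and file names are placeholders):

```python
import glob

# collect the contents of every .txt file under the placeholder directory
file_contents = []
for path in sorted(glob.glob("directory_to_search/*.txt")):
    with open(path) as f:
        file_contents.append(" ".join(line.strip() for line in f))

# write everything out as one space-joined string
with open("output.txt", "w") as out:
    out.write(" ".join(file_contents))
```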
            qid & accept id: (36186624, 36188326) query: parse list of tuple in python and eliminate doubles soup:

            soup wrap:

            I have found the desired solution. I used:

                apt_pkg.version_compare(a, b)
            

            Thank you all.

            Function :

                def comparePackages(package_dictionary):
                 #loop in keys and values of package_dictionary
                    for package_name, list_versions in zip(package_dictionary.keys(), package_dictionary.values()) :
                        #loop on each sublist
                        for position in xrange(len(list_versions)) :
                            a = str(list_versions[position])
                            b = str(list_versions[position-1])
                            #the only way it worked was by using a and b
                            vc = apt_pkg.version_compare(a,b)
                            if vc > 0:
                                #a>b
                                max_version = a
                            elif vc == 0:
                                #a==b
                                max_version = a         
                            elif vc < 0:
                                #a<b
                                max_version = b

            output :

                lib32c-dev : Not Specified
                libc6-x32 : 2.16
                libc6-i386 : 2.16
                libncurses5-dev : 5.9+20150516-2ubuntu1
                libc6-dev : Not Specified
                libc-dev : Not Specified
                libncursesw5-dev : 5.9+20150516-2ubuntu1
                libc6-dev-x32 : Not Specified
            
            qid & accept id: (36193225, 36193445) query: Numpy Array Rank All Elements soup:

            soup wrap:

            That's routine work for np.unique with its optional argument return_inverse, which tags each element based on its uniqueness among the other elements, like so -

            _,id = np.unique(anArray,return_inverse=True)
            out = (id.max() - id + 1).reshape(anArray.shape)
            

            Sample run -

            In [17]: anArray
            Out[17]: 
            array([[ 18.5,  25.9,   7.4,  11.1,  11.1],
                   [ 33.3,  37. ,  14.8,  22.2,  25.9],
                   [ 29.6,  29.6,  11.1,  14.8,  11.1],
                   [ 25.9,  25.9,  14.8,  14.8,  11.1],
                   [ 29.6,  25.9,  14.8,  11.1,   7.4]])
            
            In [18]: _,id = np.unique(anArray,return_inverse=True)
            
            In [19]: (id.max() - id + 1).reshape(anArray.shape)
            Out[19]: 
            array([[6, 4, 9, 8, 8],
                   [2, 1, 7, 5, 4],
                   [3, 3, 8, 7, 8],
                   [4, 4, 7, 7, 8],
                   [3, 4, 7, 8, 9]])
            
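            On a tiny array the ranking is easy to verify by hand (a quick sketch):

```python
import numpy as np

anArray = np.array([[3.0, 1.0],
                    [2.0, 3.0]])

# dense ranks: the largest value gets rank 1, ties share a rank
_, inv = np.unique(anArray, return_inverse=True)
out = (inv.max() - inv + 1).reshape(anArray.shape)
print(out)
```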
            qid & accept id: (36226959, 36227005) query: Collect values of pandas dataframe column A if column B is NaN (Python) soup:

            soup wrap:

            It should be pretty straightforward:

            In [10]: df
            Out[10]:
                 a  b  c
            0  NaN  9  7
            1  1.0  7  6
            2  5.0  9  1
            3  7.0  4  0
            4  NaN  2  3
            5  2.0  4  6
            6  6.0  3  6
            7  0.0  2  7
            8  9.0  1  4
            9  2.0  9  3
            
            In [11]: df.loc[df['a'].isnull(), 'b']
            Out[11]:
            0    9
            4    2
            Name: b, dtype: int32
            
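            For a self-contained check of that first selection, a tiny frame built by hand behaves the same way (a sketch; the sample values are made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [np.nan, 1.0, 5.0, np.nan],
                   "b": [9, 7, 9, 2],
                   "c": [7, 6, 1, 3]})

# values of b on the rows where a is NaN
collected = df.loc[df["a"].isnull(), "b"]
print(collected.tolist())  # → [9, 2]
```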

            UPDATE:

            In [166]: df
            Out[166]:
                 a    b  c
            0  NaN  5.0  3
            1  7.0  NaN  8
            2  4.0  9.0  7
            3  8.0  NaN  9
            4  3.0  0.0  5
            5  NaN  3.0  5
            6  9.0  0.0  3
            7  0.0  2.0  6
            8  7.0  8.0  7
            9  1.0  7.0  6
            
            
            In [163]: df[['a','b']].isnull().any(axis=1)
            Out[163]:
            0     True
            1     True
            2    False
            3     True
            4    False
            5     True
            6    False
            7    False
            8    False
            9    False
            dtype: bool
            
            In [164]: df.loc[df[['a','b']].isnull().any(axis=1)]
            Out[164]:
                 a    b  c
            0  NaN  5.0  3
            1  7.0  NaN  8
            3  8.0  NaN  9
            5  NaN  3.0  5
            
            In [165]: df.loc[df[['a','b']].isnull().any(axis=1), 'c']
            Out[165]:
            0    3
            1    8
            3    9
            5    5
            Name: c, dtype: int32
            
            qid & accept id: (36241474, 36241738) query: How to plot real-time graph, with both axis dependent on time? soup:

            soup wrap:

            Answer based on original question

            You need to use a generator to produce your y data. This works:

            import numpy as np
            from matplotlib import pyplot as plt
            from matplotlib import animation
            
            # First set up the figure, the axis, and the plot element we want to animate
            fig = plt.figure()
            ax = plt.axes(xlim=(0, 2), ylim=(-2, 2))
            line, = ax.plot([], [], ' o', lw=2)
            g = 9.81
            h = 2
            tc = 200
            xs = [1] # the vertical position is fixed on x-axis
            ys = [h, h]
            
            
            # animation function.  This is called sequentially
            def animate(y):
                ys[-1] = y
                line.set_data(xs, ys)
                return line,
            
            def get_y():
              for step in range(tc):
                t = step / 100.0
                y = -0.5*g*t**2 + h  # the equation of diver's displacement on y axis
                yield y
            
            # call the animator.  blit=True means only re-draw the parts that have changed.
            anim = animation.FuncAnimation(fig, animate, frames=get_y, interval=100)
            
            plt.show()
            


            Integrated answer

            This should work:

            # -*- coding: utf-8 -*-
            
            from math import *
            import numpy as np
            from matplotlib import pyplot as plt
            from matplotlib import animation
            
            
            def Plongeon():
                h = float(input("height = "))
                g = 9.81
            
                #calculate air time, Tc
                Tc = sqrt(2 * h / g)
            
                # First set up the figure, the axis, and the plot element we want to animate
                fig = plt.figure()
                ax = plt.axes(xlim=(0, 2), ylim=(-2, h+1))  #ymax : initial height+1
                line, = ax.plot([], [], ' o', lw=2)
            
                step = 0.01  # animation step
                xs = [1]  # the vertical position is fixed on x-axis
                ys = [h]
            
            
                # animation function.  This is called sequentially
                def animate(y):
                    ys[-1] = y
                    line.set_data(xs, ys)
                    return line,
            
                def get_y():
                    t = 0
                    while t <= Tc:
                        y = -0.5 * g * t**2 + h  # the equation of diver's displacement on y axis
                        yield y
                        t += step
            
                # call the animator.  blit=True means only re-draw the parts that have changed.
                anim = animation.FuncAnimation(fig, animate, frames=get_y, interval=100)
            
                plt.show()
            Plongeon()
            

            I removed unneeded lines. There is no need for global, and mass was never used anywhere in the program.

            This is the most important part:

            def get_y():
                 t = 0
                 while t <= Tc:
                     y = -0.5 * g * t**2 + h 
                     yield y
                     t += step
            

            You need to advance your time by an increment.

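            The generator at the heart of this can be checked on its own, without any matplotlib machinery (a sketch using sample constants):

```python
from math import sqrt

g = 9.81
h = 2.0
Tc = sqrt(2 * h / g)  # air time
step = 0.01

def get_y():
    t = 0.0
    while t <= Tc:
        yield -0.5 * g * t**2 + h  # diver's height at time t
        t += step

ys = list(get_y())
print(ys[0])   # starts at the initial height h
print(ys[-1])  # still above the water just before impact
```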
            qid & accept id: (36242061, 36242533) query: Most efficient way to delete needless newlines in Python soup:

            soup wrap:

            The nearest equivalent to the tcl string map would be str.translate, but unfortunately it can only map single characters. So it would be necessary to use a regexp to get a similarly compact example. This can be done with look-behind/look-ahead assertions, but the \r's have to be replaced first:

            import re
            
            oldtext = """\
            This would keep paragraphs separated.
            This would keep paragraphs separated.
            
            This would keep paragraphs separated.
            \tThis would keep paragraphs separated.
            
            \rWhen, in the course
            of human events,
            it becomes necessary
            \rfor one people
            """
            
            newtext = re.sub(r'(?

            output:

            This would keep paragraphs separated. This would keep paragraphs separated.
            
            This would keep paragraphs separated.
                This would keep paragraphs separated.
            
            When, in the course of human events, it becomes necessary for one people
            

            I doubt whether this is as efficient as the tcl code, though.
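            The regex in the snippet above was cut off by HTML escaping. A look-around substitution of the kind described (my reconstruction, not necessarily the original pattern) could look like this:

```python
import re

text = "one line\nwrapped line\n\nnew paragraph\n\tindented line\n"
# Drop \r first, then join a line to the next unless either side is a
# newline or tab, preserving blank lines and indented lines:
unwrapped = re.sub(r'(?<=[^\n\t])\n(?=[^\n\t])', ' ', text.replace('\r', ''))
print(unwrapped)
```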

            UPDATE:

            I did a little test using this Project Gutenberg EBook of War and Peace (Plain Text UTF-8, 3.1 MB). Here's my tcl script:

            set fp [open "gutenberg.txt" r]
            set oldtext [read $fp]
            close $fp
            
            set newtext [string map "{\r} {} {\n\n} {\n\n} {\n\t} {\n\t} {\n} { }" $oldtext]
            
            puts $newtext
            

            and my python equivalent:

            import re
            
            with open('gutenberg.txt') as stream:
                oldtext = stream.read()
            
                newtext = re.sub(r'(?

            Crude performance test:

            $ /usr/bin/time -f '%E' tclsh gutenberg.tcl > output1.txt
            0:00.18
            $ /usr/bin/time -f '%E' python gutenberg.py > output2.txt
            0:00.30
            

            So, as expected, the tcl version is more efficient. However, the output from the python version seems somewhat cleaner (no extra spaces inserted at the beginning of lines).

            qid & accept id: (36282772, 36282977) query: How to perform a 'one-liner' assignment on all elements of a list of lists in python soup:

            I would not change your own approach but to answer your question:

            \n
            lol = [[1,3],[3,4]]\nfrom operator import setitem\n\nmap(lambda x: setitem(x, 1, -2), lol)\nprint(lol)\n[[1, -2], [3, -2]]\n
            \n

            It does the assignment in place but you are basically using map for side effects and creating a list of None's:

            \n
            In [1]: lol = [[1, 3], [3, 4]]\n\n\nIn [2]: from operator import setitem\n\nIn [3]: map(lambda x: setitem(x, 1, -2), lol)\nOut[3]: [None, None]\n\nIn [4]: lol\nOut[4]: [[1, -2], [3, -2]]\n
            \n

            So really stick to your own for loop logic.

            \n

            They simple loop is also the more performant:

            \n
            In [13]: %%timeit                                          \nlol = [[1,2,3,4,5,6,7,8] for _ in range(100000)]\nmap(lambda x: setitem(x, 1, -2), lol)\n   ....: \n\n10 loops, best of 3: 45.4 ms per loop\n\nIn [14]: \n\nIn [14]: %%timeit                                          \nlol = [[1,2,3,4,5,6,7,8] for _ in range(100000)]\nfor sub in lol:\n    sub[1] = -2\n   ....: \n10 loops, best of 3: 31.7 ms per \n
            \n

            The only time map. filter etc.. really do well is if you can call them with a builtin function or method i.e map(str.strip, iterable), once you include a lambda the performance will usually take a big hit.

            \n soup wrap:

            I would not change your own approach but to answer your question:

            lol = [[1,3],[3,4]]
            from operator import setitem
            
            map(lambda x: setitem(x, 1, -2), lol)
            print(lol)
            [[1, -2], [3, -2]]
            

            It does the assignment in place but you are basically using map for side effects and creating a list of None's:

            In [1]: lol = [[1, 3], [3, 4]]
            
            
            In [2]: from operator import setitem
            
            In [3]: map(lambda x: setitem(x, 1, -2), lol)
            Out[3]: [None, None]
            
            In [4]: lol
            Out[4]: [[1, -2], [3, -2]]
            

            So really stick to your own for loop logic.

            The simple loop is also the more performant:

            In [13]: %%timeit                                          
            lol = [[1,2,3,4,5,6,7,8] for _ in range(100000)]
            map(lambda x: setitem(x, 1, -2), lol)
               ....: 
            
            10 loops, best of 3: 45.4 ms per loop
            
            In [14]: 
            
            In [14]: %%timeit                                          
            lol = [[1,2,3,4,5,6,7,8] for _ in range(100000)]
            for sub in lol:
                sub[1] = -2
               ....: 
            10 loops, best of 3: 31.7 ms per loop
            

            The only time map, filter, etc. really do well is when you can call them with a builtin function or method, e.g. map(str.strip, iterable); once you include a lambda, the performance will usually take a big hit.
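            Note also that under Python 3 map() is lazy, so the setitem trick would not even run until the map object is consumed. A minimal sketch of both versions (assuming Python 3):

```python
from operator import setitem

lol = [[1, 3], [3, 4]]
# In Python 3 the side effects only happen once the lazy map object
# is drained, e.g. by list():
list(map(lambda x: setitem(x, 1, -2), lol))

# The plain loop needs no such trick and stays the clearest option:
lol2 = [[1, 3], [3, 4]]
for sub in lol2:
    sub[1] = -2
```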

            qid & accept id: (36336637, 36339413) query: how to know the type of sql query result before it is executed in sqlalchemy soup:

            You can get column types from column_descriptions

            \n
            [c['type'] for c in query.column_descriptions]\n
            \n

            Or if you need to know Python types:

            \n
            [c['type'].python_type for c in query.column_descriptions]\n
            \n soup wrap:

            You can get column types from column_descriptions

            [c['type'] for c in query.column_descriptions]
            

            Or if you need to know Python types:

            [c['type'].python_type for c in query.column_descriptions]
            
            qid & accept id: (36344619, 36344860) query: Getting the key and value of br.forms() in Mechanize soup:

            If you just want a particular value and you know the key:

            \n
            In [18]: response = br.open("http://www.w3schools.com/html/html_forms.asp")\n\nIn [19]: f = list(br.forms())\n\nIn [20]: f[0].get_value("firstname")\nOut[20]: 'Mickey'\nIn [21]: f[0].get_value("lastname")\nOut[21]: 'Mouse'\n
            \n

            You can access all pairs with f._pairs():

            \n
            for f in br.forms():\n    print(f._pairs())\n\nresponse = br.open("http://www.w3schools.com/html/html_forms.asp")\nfor f in br.forms():\n    print(f)\n    print(f._pairs())\n
            \n

            You see it gives you key,value pairs:

            \n
            \n  \n  =Submit) (readonly)>>\n[('firstname', 'Mickey'), ('lastname', 'Mouse')]\n\n  \n  =Submit) (readonly)>>\n[('firstname', 'Mickey'), ('lastname', 'Mouse')]\n\n  \n  \n  =)>>\n[('err_email', ''), ('err_desc', '')]\n
            \n soup wrap:

            If you just want a particular value and you know the key:

            In [18]: response = br.open("http://www.w3schools.com/html/html_forms.asp")
            
            In [19]: f = list(br.forms())
            
            In [20]: f[0].get_value("firstname")
            Out[20]: 'Mickey'
            In [21]: f[0].get_value("lastname")
            Out[21]: 'Mouse'
            

            You can access all pairs with f._pairs():

            for f in br.forms():
                print(f._pairs())
            
            response = br.open("http://www.w3schools.com/html/html_forms.asp")
            for f in br.forms():
                print(f)
                print(f._pairs())
            

            You see it gives you key/value pairs:

            
              
            [('firstname', 'Mickey'), ('lastname', 'Mouse')]
            [('firstname', 'Mickey'), ('lastname', 'Mouse')]
            [('err_email', ''), ('err_desc', '')]
            
            qid & accept id: (36364188, 36364234) query: Python - dataframe conditional index value selection soup:

            try this:

            \n
            In [334]: df\nOut[334]:\n               close_price  short_lower_band  long_lower_band\nEquity(8554)        180.53        184.235603       183.964306\nEquity(2174)        166.83        157.450404       157.160282\nEquity(23921)       124.67        127.243468       126.072039\nEquity(26807)       117.91        108.761587       107.190081\nEquity(42950)       108.07         97.491851        96.868036\nEquity(4151)         97.38         98.954371        98.335786\n\nIn [335]:\n\nIn [335]: df[(df.close_price < df.short_lower_band) & \\n   .....:    (df.close_price < df.long_lower_band)].index.values\nOut[335]: array(['Equity(8554)', 'Equity(23921)', 'Equity(4151)'], dtype=object)\n
            \n

            or if you need a plain list instead of numpy array:

            \n
            In [336]: df[(df.close_price < df.short_lower_band) & \\n   .....:    (df.close_price < df.long_lower_band)].index.tolist()\nOut[336]: ['Equity(8554)', 'Equity(23921)', 'Equity(4151)']\n
            \n soup wrap:

            try this:

            In [334]: df
            Out[334]:
                           close_price  short_lower_band  long_lower_band
            Equity(8554)        180.53        184.235603       183.964306
            Equity(2174)        166.83        157.450404       157.160282
            Equity(23921)       124.67        127.243468       126.072039
            Equity(26807)       117.91        108.761587       107.190081
            Equity(42950)       108.07         97.491851        96.868036
            Equity(4151)         97.38         98.954371        98.335786
            
            In [335]:
            
            In [335]: df[(df.close_price < df.short_lower_band) & \
               .....:    (df.close_price < df.long_lower_band)].index.values
            Out[335]: array(['Equity(8554)', 'Equity(23921)', 'Equity(4151)'], dtype=object)
            

            or if you need a plain list instead of numpy array:

            In [336]: df[(df.close_price < df.short_lower_band) & \
               .....:    (df.close_price < df.long_lower_band)].index.tolist()
            Out[336]: ['Equity(8554)', 'Equity(23921)', 'Equity(4151)']
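            A self-contained two-row version of the same mask (assuming pandas is available; values taken from the frame above):

```python
import pandas as pd

df = pd.DataFrame(
    {'close_price': [180.53, 166.83],
     'short_lower_band': [184.235603, 157.450404],
     'long_lower_band': [183.964306, 157.160282]},
    index=['Equity(8554)', 'Equity(2174)'])

# keep only rows whose close is below both bands
mask = (df.close_price < df.short_lower_band) & \
       (df.close_price < df.long_lower_band)
selected = df[mask].index.tolist()
```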
            
            qid & accept id: (36364512, 36364631) query: alternate for multiple constructors soup:

            There are a few ways that you can do something like this. One way would be to have the options have a default value that indicates that you want the default. This could look like this:

            \n
            class MyClass:\n    def __init__(self, options=None):\n        if options is None:\n            options = create_default_parser()\n        self.options = options\n\n    def create_default_parser(self):\n        parser = argparse.ArgumentParser(description='something')\n        parser.add_argument('-v', '--victor', dest='vic', default="winning")\n        options = parser.parse_args()\n        return options\n
            \n

            initializing the default would then look like

            \n
            default = MyClass()\n
            \n

            Another method would be to use a class method like this:

            \n
            class MyClass:\n    def __init__(self, options):\n        self.options = options\n\n    @classmethod\n    def create_default_parser(cls):\n        parser = argparse.ArgumentParser(description='something')\n        parser.add_argument('-v', '--victor', dest='vic', default="winning")\n        options = parser.parse_args()\n        return cls(options)\n
            \n

            and the default would be created like this:

            \n
            default = MyClass.create_default_parser()\n
            \n soup wrap:

            There are a few ways that you can do something like this. One way would be to have the options have a default value that indicates that you want the default. This could look like this:

            class MyClass:
                def __init__(self, options=None):
                    if options is None:
                        options = self.create_default_parser()
                    self.options = options
            
                def create_default_parser(self):
                    parser = argparse.ArgumentParser(description='something')
                    parser.add_argument('-v', '--victor', dest='vic', default="winning")
                    options = parser.parse_args()
                    return options
            

            initializing the default would then look like

            default = MyClass()
            

            Another method would be to use a class method like this:

            class MyClass:
                def __init__(self, options):
                    self.options = options
            
                @classmethod
                def create_default_parser(cls):
                    parser = argparse.ArgumentParser(description='something')
                    parser.add_argument('-v', '--victor', dest='vic', default="winning")
                    options = parser.parse_args()
                    return cls(options)
            

            and the default would be created like this:

            default = MyClass.create_default_parser()
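            A runnable variation of the classmethod pattern; passing argv explicitly (my addition, not in the code above) lets it run without touching sys.argv:

```python
import argparse

class MyClass:
    def __init__(self, options):
        self.options = options

    @classmethod
    def create_default_parser(cls, argv=None):
        # parse_args([]) falls back to the declared defaults
        parser = argparse.ArgumentParser(description='something')
        parser.add_argument('-v', '--victor', dest='vic', default='winning')
        return cls(parser.parse_args(argv))

default = MyClass.create_default_parser([])
```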
            
            qid & accept id: (36408096, 36408347) query: is there a way to change the return value of a function without changing the function's body? soup:

            You are looking for a function wrapping another function. This can be done in Python using decorators.

            \n

            Given your function f(x), let's say you'd like to receive the negative function value. And f(x) might be any function with any number of arguments. And possibly you don't really know f(x) at all.

            \n

            Python's standard library comes with functools.wraps, which can be really handy in this case:

            \n
            def g(func):\n    @wraps(func)\n    def wrapper(*args, **kwargs):\n        func_value = func(*args, **kwargs)\n        return -func_value\n    return wrapper\n
            \n

            Now the function g(func) returns a wrapper wrapping func post-processing its output:

            \n
            >>> new_func = g(f)  # your original f(x)\n>>> print(new_func(1))\n0.5\n
            \n

            This works with any function func with any number of positional or keyword arguments.

            \n soup wrap:

            You are looking for a function wrapping another function. This can be done in Python using decorators.

            Given your function f(x), let's say you'd like to receive the negative function value. And f(x) might be any function with any number of arguments. And possibly you don't really know f(x) at all.

            Python's standard library comes with functools.wraps, which can be really handy in this case:

            from functools import wraps
            
            def g(func):
                @wraps(func)
                def wrapper(*args, **kwargs):
                    func_value = func(*args, **kwargs)
                    return -func_value
                return wrapper
            

            Now the function g(func) returns a wrapper wrapping func post-processing its output:

            >>> new_func = g(f)  # your original f(x)
            >>> print(new_func(1))
            0.5
            

            This works with any function func with any number of positional or keyword arguments.
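            The same wrapper can also be applied with decorator syntax; a complete sketch (f here is a stand-in function, and the import is what makes wraps available):

```python
from functools import wraps

def g(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        return -func(*args, **kwargs)
    return wrapper

@g  # applies the wrapper at definition time
def f(x, offset=0):
    return x * x + offset
```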

            qid & accept id: (36409213, 36410187) query: Modifying a recursive function that counts no. of paths, to get sequence of all paths soup:

            The idea is exactly the same as your function, except that you return a tuple of coordinates instead of increasing a counter by 1 when you reach the bottom. By making it a generator you only create the paths as you need them.

            \n
            def generate_paths(depth, x=0, y=0):\n    if x == depth:\n        yield ((x, y),)\n    else:\n        for path in generate_paths(depth, x+1, y):\n            yield ((x, y),) + path\n        for path in generate_paths(depth, x+1, y+1):\n            yield ((x, y),) + path\n
            \n

            Examples.

            \n
            >>> for path in generate_paths(3):\n...    print(path)\n\n((0, 0), (1, 0), (2, 0), (3, 0))\n((0, 0), (1, 0), (2, 0), (3, 1))\n((0, 0), (1, 0), (2, 1), (3, 1))\n((0, 0), (1, 0), (2, 1), (3, 2))\n((0, 0), (1, 1), (2, 1), (3, 1))\n((0, 0), (1, 1), (2, 1), (3, 2))\n((0, 0), (1, 1), (2, 2), (3, 2))\n((0, 0), (1, 1), (2, 2), (3, 3))\n>>> print(len(tuple(generate_paths(14))))\n16384\n
            \n

            This generates all the paths in less than a second. However, just as the problem suggests, you're encouraged to find a more efficient because the complexity is exponential and for longer depths this will be infeasible.

            \n soup wrap:

            The idea is exactly the same as your function, except that you return a tuple of coordinates instead of increasing a counter by 1 when you reach the bottom. By making it a generator you only create the paths as you need them.

            def generate_paths(depth, x=0, y=0):
                if x == depth:
                    yield ((x, y),)
                else:
                    for path in generate_paths(depth, x+1, y):
                        yield ((x, y),) + path
                    for path in generate_paths(depth, x+1, y+1):
                        yield ((x, y),) + path
            

            Examples.

            >>> for path in generate_paths(3):
            ...    print(path)
            
            ((0, 0), (1, 0), (2, 0), (3, 0))
            ((0, 0), (1, 0), (2, 0), (3, 1))
            ((0, 0), (1, 0), (2, 1), (3, 1))
            ((0, 0), (1, 0), (2, 1), (3, 2))
            ((0, 0), (1, 1), (2, 1), (3, 1))
            ((0, 0), (1, 1), (2, 1), (3, 2))
            ((0, 0), (1, 1), (2, 2), (3, 2))
            ((0, 0), (1, 1), (2, 2), (3, 3))
            >>> print(len(tuple(generate_paths(14))))
            16384
            

            This generates all the paths in less than a second. However, just as the problem suggests, you're encouraged to find a more efficient approach, because the complexity is exponential and for longer depths this will be infeasible.
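            For the count alone there is a closed form: each of the depth steps goes either straight down or diagonally, so there are 2**depth paths. A one-line sketch, matching the 16384 shown above for depth 14:

```python
def count_paths(depth):
    # each of the `depth` steps has two choices, so 2**depth paths total
    return 2 ** depth
```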

            qid & accept id: (36436065, 36436123) query: Parsing through a file soup:

            You should be splitting on your delimiter to get each column's value for that row:

            \n
            for line in fileinput.readlines():\n    a, b, c = line.split('\t')      # Variable unpacking; assumes each line has three columns\n    if a == '?':\n        function_a()\n    if b == '?':\n        function_b()\n    if c == '?':\n        function_c()\n
            \n

            Or if you're a fan of obnoxious and ugly one-liners:

            \n
            [(function_a, function_b, function_c)[line.split('\t').index('?')]() for line in fileinput.readlines()]\n
            \n soup wrap:

            You should be splitting on your delimiter to get each column's value for that row:

            for line in fileinput.readlines():
                a, b, c = line.rstrip('\n').split('\t')  # variable unpacking; assumes three columns (rstrip drops the trailing newline)
                if a == '?':
                    function_a()
                if b == '?':
                    function_b()
                if c == '?':
                    function_c()
            

            Or if you're a fan of obnoxious and ugly one-liners:

            [(function_a, function_b, function_c)[line.split('\t').index('?')]() for line in fileinput.readlines()]
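            If the file really is tab-separated, the stdlib csv module handles the splitting for you (and avoids the trailing-newline pitfall of a bare line.split('\t')). A sketch on an in-memory sample (the sample data is made up):

```python
import csv
import io

# hypothetical three-column sample standing in for the real file
sample = io.StringIO("?\tb2\tc3\na1\t?\tc3\n")
hits = []
for a, b, c in csv.reader(sample, delimiter='\t'):
    # record which column held the marker on each row
    hits.append([a, b, c].index('?'))
```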
            
            qid & accept id: (36436953, 36436999) query: Removing word and replacing character in a column of strings soup:

            Try this:

            \n
            In [175]: df.replace({'DSFS': {r'(\d+)\s*\-\s*(\d+)': r'\1_\2'}}, regex=True)\nOut[175]:\n   MemberID Year         DSFS DrugCount\n0  48925661   Y2  9_10 months        7+\n1  90764620   Y3   8_9 months         3\n2  61221204   Y1   2_3 months         1\n
            \n

            In place:

            \n
            In [176]: df\nOut[176]:\n   MemberID Year         DSFS DrugCount\n0  48925661   Y2  9-10 months        7+\n1  90764620   Y3  8- 9 months         3\n2  61221204   Y1  2- 3 months         1\n\nIn [177]: df.replace({'DSFS': {r'(\d+)\s*\-\s*(\d+)': r'\1_\2'}}, regex=True, inplace=True)\n\nIn [178]: df\nOut[178]:\n   MemberID Year         DSFS DrugCount\n0  48925661   Y2  9_10 months        7+\n1  90764620   Y3   8_9 months         3\n2  61221204   Y1   2_3 months         1\n
            \n

            If you want to preserve only numbers you can do it this way:

            \n
            In [183]: df.replace({'DSFS': {r'(\d+)\s*\-\s*(\d+).*': r'\1_\2'}}, regex=True)\nOut[183]:\n   MemberID Year  DSFS DrugCount\n0  48925661   Y2  9_10        7+\n1  90764620   Y3   8_9         3\n2  61221204   Y1   2_3         1\n
            \n soup wrap:

            Try this:

            In [175]: df.replace({'DSFS': {r'(\d+)\s*\-\s*(\d+)': r'\1_\2'}}, regex=True)
            Out[175]:
               MemberID Year         DSFS DrugCount
            0  48925661   Y2  9_10 months        7+
            1  90764620   Y3   8_9 months         3
            2  61221204   Y1   2_3 months         1
            

            In place:

            In [176]: df
            Out[176]:
               MemberID Year         DSFS DrugCount
            0  48925661   Y2  9-10 months        7+
            1  90764620   Y3  8- 9 months         3
            2  61221204   Y1  2- 3 months         1
            
            In [177]: df.replace({'DSFS': {r'(\d+)\s*\-\s*(\d+)': r'\1_\2'}}, regex=True, inplace=True)
            
            In [178]: df
            Out[178]:
               MemberID Year         DSFS DrugCount
            0  48925661   Y2  9_10 months        7+
            1  90764620   Y3   8_9 months         3
            2  61221204   Y1   2_3 months         1
            

            If you want to preserve only numbers you can do it this way:

            In [183]: df.replace({'DSFS': {r'(\d+)\s*\-\s*(\d+).*': r'\1_\2'}}, regex=True)
            Out[183]:
               MemberID Year  DSFS DrugCount
            0  48925661   Y2  9_10        7+
            1  90764620   Y3   8_9         3
            2  61221204   Y1   2_3         1
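            The substitution itself can be sanity-checked with plain re before applying it to the frame:

```python
import re

rows = ['9-10 months', '8- 9 months', '2- 3 months']
# same pattern as the DataFrame.replace call above
fixed = [re.sub(r'(\d+)\s*-\s*(\d+)', r'\1_\2', s) for s in rows]
```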
            
            qid & accept id: (36445193, 36445821) query: splitting one csv into multiple files in python soup:

            I suggest you not inventing a wheel. There is existing solution. Source here

            \n
            import os\n\n\ndef split(filehandler, delimiter=',', row_limit=1000,\n          output_name_template='output_%s.csv', output_path='.', keep_headers=True):\n    import csv\n    reader = csv.reader(filehandler, delimiter=delimiter)\n    current_piece = 1\n    current_out_path = os.path.join(\n        output_path,\n        output_name_template % current_piece\n    )\n    current_out_writer = csv.writer(open(current_out_path, 'w'), delimiter=delimiter)\n    current_limit = row_limit\n    if keep_headers:\n        headers = reader.next()\n        current_out_writer.writerow(headers)\n    for i, row in enumerate(reader):\n        if i + 1 > current_limit:\n            current_piece += 1\n            current_limit = row_limit * current_piece\n            current_out_path = os.path.join(\n                output_path,\n                output_name_template % current_piece\n            )\n            current_out_writer = csv.writer(open(current_out_path, 'w'), delimiter=delimiter)\n            if keep_headers:\n                current_out_writer.writerow(headers)\n        current_out_writer.writerow(row)\n
            \n

            Use it like:

            \n
            split(open('/your/pat/input.csv', 'r'));\n
            \n soup wrap:

            I suggest not reinventing the wheel. There is an existing solution. Source here

            import os
            
            
            def split(filehandler, delimiter=',', row_limit=1000,
                      output_name_template='output_%s.csv', output_path='.', keep_headers=True):
                import csv
                reader = csv.reader(filehandler, delimiter=delimiter)
                current_piece = 1
                current_out_path = os.path.join(
                    output_path,
                    output_name_template % current_piece
                )
                current_out_writer = csv.writer(open(current_out_path, 'w'), delimiter=delimiter)
                current_limit = row_limit
                if keep_headers:
                    headers = next(reader)  # reader.next() in Python 2
                    current_out_writer.writerow(headers)
                for i, row in enumerate(reader):
                    if i + 1 > current_limit:
                        current_piece += 1
                        current_limit = row_limit * current_piece
                        current_out_path = os.path.join(
                            output_path,
                            output_name_template % current_piece
                        )
                        current_out_writer = csv.writer(open(current_out_path, 'w'), delimiter=delimiter)
                        if keep_headers:
                            current_out_writer.writerow(headers)
                    current_out_writer.writerow(row)
            

            Use it like:

            split(open('/your/pat/input.csv', 'r'))
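            The same chunking idea can be written more compactly in Python 3 with itertools.islice (split_rows is my name, not part of the linked recipe):

```python
import csv
import io
import itertools

def split_rows(reader, row_limit):
    """Yield lists of at most row_limit rows each (sketch)."""
    while True:
        chunk = list(itertools.islice(reader, row_limit))
        if not chunk:
            return
        yield chunk

data = io.StringIO("h1,h2\na,1\nb,2\nc,3\n")
reader = csv.reader(data)
header = next(reader)  # Python 3 spelling of reader.next()
chunks = list(split_rows(reader, 2))
```

Each chunk can then be written to its own output file with the headers prepended.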
            
            qid & accept id: (36458482, 36459665) query: How to not render a entire string with jinja2 soup:

            Look at docs

            \n

            Jinja2 has truncate filter truncate(s, length=255, killwords=False, end='...'). Example usage

            \n
            {{ blogpost.text|truncate }}
            \n
            \n

            Or

            \n
            {{ blogpost.text|truncate(1024, True) }}
            \n
            \n soup wrap:

            Look at the docs.

            Jinja2 has a truncate filter: truncate(s, length=255, killwords=False, end='...'). Example usage:

            {{ blogpost.text|truncate }}

            Or

            {{ blogpost.text|truncate(1024, True) }}
            qid & accept id: (36459148, 36460184) query: Pandas: Collapse first n rows in each group by aggregation soup:

            You can start by setting the grp_idx:

            \n
            df["grp_idx"] = np.where(df.groupby("id").cumcount()<3, 0, df["grp_idx"])\n
            \n

            Now id and grp_idx create the grouping you want:

            \n
            df.groupby(["id", "type", "grp_idx"]).sum().reset_index()\n\n    id  type    grp_idx col_1   col_2   flag\n0   283 A       0       12      18      0\n1   283 A       4       8       12      0\n2   283 A       5       10      15      0\n3   283 A       6       12      18      0\n4   283 A       7       14      21      1\n5   756 X       0       30      6       1\n
            \n

            I assumed the type cannot be different for the same id as you didn't give any conditions for that column. I also assumed the df is sorted by id. If not, you can first sort it for grp_idx to be correct.

            \n soup wrap:

            You can start by setting the grp_idx:

            df["grp_idx"] = np.where(df.groupby("id").cumcount()<3, 0, df["grp_idx"])
            

            Now id and grp_idx create the grouping you want:

            df.groupby(["id", "type", "grp_idx"]).sum().reset_index()
            
                id  type    grp_idx col_1   col_2   flag
            0   283 A       0       12      18      0
            1   283 A       4       8       12      0
            2   283 A       5       10      15      0
            3   283 A       6       12      18      0
            4   283 A       7       14      21      1
            5   756 X       0       30      6       1
            

            I assumed the type cannot be different for the same id as you didn't give any conditions for that column. I also assumed the df is sorted by id. If not, you can first sort it for grp_idx to be correct.
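            The cumcount logic can be checked on a tiny frame (assuming pandas and numpy are available; the values here are made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [283] * 5 + [756] * 3,
                   'grp_idx': [1, 2, 3, 4, 5, 1, 2, 3]})
# cumcount numbers rows 0,1,2,... within each id, so < 3 marks the
# first three rows of each group for collapsing into group 0
df['grp_idx'] = np.where(df.groupby('id').cumcount() < 3, 0, df['grp_idx'])
```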

            qid & accept id: (36464357, 36674182) query: Matplotlib in Pyside with Qt designer (PySide) soup:

            I'm not an expert on that matter so it might not be the cleanest way of doing this, but here is some working code to get you started:

            \n
              \n
            • I think the easiest way to add widgets is via a QxxxxLayout
            • \n
            • then I just made your Plotter inherit from FigureCanvas
            • \n
            • and told matplotlib to work with PySide
            • \n
            \n

            ui.py:

            \n
            from PySide import QtCore, QtGui\n\nclass Ui_Form(object):\n    def setupUi(self, Form):\n        Form.setObjectName("Form")\n        Form.resize(533, 497)\n        self.mplvl = QtGui.QWidget(Form)\n        self.mplvl.setGeometry(QtCore.QRect(150, 150, 251, 231))\n        self.mplvl.setObjectName("mplvl")\n        self.vLayout = QtGui.QVBoxLayout()\n        self.mplvl.setLayout(self.vLayout)\n        self.retranslateUi(Form)\n        QtCore.QMetaObject.connectSlotsByName(Form)\n\n    def retranslateUi(self, Form):\n        Form.setWindowTitle(QtGui.QApplication.translate("Form", "Form", None, QtGui.QApplication.UnicodeUTF8))\n
            \n

            For this you just have to add the canvas to mplvl in QtDesigner

            \n

            main.py:

            \n
            import matplotlib\nmatplotlib.use('Qt4Agg')\nmatplotlib.rcParams['backend.qt4'] = 'PySide'\nfrom matplotlib.backends.backend_qt4agg import (\n    FigureCanvasQTAgg as FigureCanvas,\n    NavigationToolbar2QT as NavigationToolbar)\nfrom matplotlib.figure import Figure\nfrom PySide import QtGui, QtCore\nimport random\n\nfrom weakref import proxy\nfrom ui import Ui_Form\n\n\nclass Plotter(FigureCanvas):\n    def __init__(self, parent):\n        ''' plot some random stuff '''\n        self.parent = proxy(parent)\n        # random data\n        data = [random.random() for i in range(10)]\n        fig = Figure()\n        super(Plotter,self).__init__(fig)\n        # create an axis\n        self.axes = fig.add_subplot(111)\n        # discards the old graph\n        self.axes.hold(False)\n        # plot data\n        self.axes.plot(data, '*-')\n\n    def binding_plotter_with_ui(self):\n        self.parent.vLayout.insertWidget(1, self)\n\nif __name__ == "__main__":\n    import sys\n    app = QtGui.QApplication(sys.argv)\n    Form = QtGui.QWidget()\n    ui = Ui_Form()\n    ui.setupUi(Form)\n    # plotter logic and binding needs to be added here\n    plotter = Plotter(ui)\n    plotter.binding_plotter_with_ui()\n    plotter2 = Plotter(ui)\n    plotter2.binding_plotter_with_ui()\n    Form.show()\n    sys.exit(app.exec_())\n
            \n

            Now what's left is probably to tweak the FigureCanvas to make it the right size and proportions, so you should be able to get what you want looking at this example or the other.

            \n

            Good luck!

            \n soup wrap:

            I'm not an expert on that matter so it might not be the cleanest way of doing this, but here is some working code to get you started:

            • I think the easiest way to add widgets is via a QxxxxLayout
            • then I just made your Plotter inherit from FigureCanvas
            • and told matplotlib to work with PySide

            ui.py:

            from PySide import QtCore, QtGui
            
            class Ui_Form(object):
                def setupUi(self, Form):
                    Form.setObjectName("Form")
                    Form.resize(533, 497)
                    self.mplvl = QtGui.QWidget(Form)
                    self.mplvl.setGeometry(QtCore.QRect(150, 150, 251, 231))
                    self.mplvl.setObjectName("mplvl")
                    self.vLayout = QtGui.QVBoxLayout()
                    self.mplvl.setLayout(self.vLayout)
                    self.retranslateUi(Form)
                    QtCore.QMetaObject.connectSlotsByName(Form)
            
                def retranslateUi(self, Form):
                    Form.setWindowTitle(QtGui.QApplication.translate("Form", "Form", None, QtGui.QApplication.UnicodeUTF8))
            

            For this you just have to add the canvas to mplvl in QtDesigner

            main.py:

            import matplotlib
            matplotlib.use('Qt4Agg')
            matplotlib.rcParams['backend.qt4'] = 'PySide'
            from matplotlib.backends.backend_qt4agg import (
                FigureCanvasQTAgg as FigureCanvas,
                NavigationToolbar2QT as NavigationToolbar)
            from matplotlib.figure import Figure
            from PySide import QtGui, QtCore
            import random
            
            from weakref import proxy
            from ui import Ui_Form
            
            
            class Plotter(FigureCanvas):
                def __init__(self, parent):
                    ''' plot some random stuff '''
                    self.parent = proxy(parent)
                    # random data
                    data = [random.random() for i in range(10)]
                    fig = Figure()
                    super(Plotter,self).__init__(fig)
                    # create an axis
                    self.axes = fig.add_subplot(111)
                    # discards the old graph
                    self.axes.hold(False)
                    # plot data
                    self.axes.plot(data, '*-')
            
                def binding_plotter_with_ui(self):
                    self.parent.vLayout.insertWidget(1, self)
            
            if __name__ == "__main__":
                import sys
                app = QtGui.QApplication(sys.argv)
                Form = QtGui.QWidget()
                ui = Ui_Form()
                ui.setupUi(Form)
                # plotter logic and binding needs to be added here
                plotter = Plotter(ui)
                plotter.binding_plotter_with_ui()
                plotter2 = Plotter(ui)
                plotter2.binding_plotter_with_ui()
                Form.show()
                sys.exit(app.exec_())
            

            Now what's left is probably to tweak the FigureCanvas to the right size and proportions; you should be able to get what you want by looking at this example or the other.

            Good luck!

            qid & accept id: (36479374, 36479702) query: Python identify in which interval the numbers are soup:

            numpy has nice support for this without having to write a for loop:

            \n
            import numpy as np\n\ndata = np.array([0.2, 6.4, 3.0, 1.6])\nbins = np.array([0.0, 1.0, 2.5, 4.0, 10.0])\ncats = np.digitize(data, bins)\ncats\n# array([1, 4, 3, 2])\n
            \n

            If you insist on a for loop, just iterate over the elements to bin, and the bins:

            \n
            data = [0.2, 6.4, 3.0]\nbins = [(0.0, 1.0), (1.0, 4.0), (4.0, 10.0)]  # assumed (lower, upper] format\ncats = []\n\nfor elem in data:\n    for idx, bounds in enumerate(bins, start=1):\n        if bounds[0] < elem <= bounds[1]:\n            cats.append(idx)\n            break\n    else:\n        raise ValueError('No bin for {}'.format(elem))\n
            \n

            The above uses tuples to specify the bin ranges (like your example), but that's not technically necessary (see the numpy code). You could store just the cutoffs and compare against adjacent pairs, e.g. zip(cutoffs[:-1], cutoffs[1:]).

            \n soup wrap:

            numpy has nice support for this without having to write a for loop:

            import numpy as np
            
            data = np.array([0.2, 6.4, 3.0, 1.6])
            bins = np.array([0.0, 1.0, 2.5, 4.0, 10.0])
            cats = np.digitize(data, bins)
            cats
            # array([1, 4, 3, 2])
            

            If you insist on a for loop, just iterate over the elements to bin, and the bins:

            data = [0.2, 6.4, 3.0]
            bins = [(0.0, 1.0), (1.0, 4.0), (4.0, 10.0)]  # assumed (lower, upper] format
            cats = []
            
            for elem in data:
                for idx, bounds in enumerate(bins, start=1):
                    if bounds[0] < elem <= bounds[1]:
                        cats.append(idx)
                        break
                else:
                    raise ValueError('No bin for {}'.format(elem))
            

            The above uses tuples to specify the bin ranges (like your example), but that's not technically necessary (see the numpy code). You could store just the cutoffs and compare against adjacent pairs, e.g. zip(cutoffs[:-1], cutoffs[1:]).
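If you go the cutoffs route, the standard-library bisect module can find the bin index directly, without looping over the bins. A sketch (bin_index is a hypothetical helper; bisect_left matches the (lower, upper] convention used above):

```python
from bisect import bisect_left

def bin_index(cutoffs, x):
    """1-based bin index of x, treating bins as (lower, upper]."""
    idx = bisect_left(cutoffs, x)
    if idx == 0 or idx == len(cutoffs):
        raise ValueError('No bin for {}'.format(x))
    return idx

data = [0.2, 6.4, 3.0, 1.6]
cutoffs = [0.0, 1.0, 2.5, 4.0, 10.0]
cats = [bin_index(cutoffs, x) for x in data]
# cats == [1, 4, 3, 2], the same result as np.digitize gives here
```

bisect_left runs in O(log n) per lookup versus the O(n) inner loop above.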

            qid & accept id: (36495903, 36495964) query: Convert pandas datetime objects soup:

            If you just want a new string representation then use dt.strftime:

            \n
            In [7]:\ndf['Time'] = df['Date'].dt.strftime('%H:%M:%S')\ndf\n\nOut[7]:\n            Timestamp                    Date      Time\n0  20160208_095900.51 2016-02-08 09:59:00.510  09:59:00\n1  20160208_095901.51 2016-02-08 09:59:01.510  09:59:01\n2  20160208_095902.51 2016-02-08 09:59:02.510  09:59:02\n3  20160208_095903.51 2016-02-08 09:59:03.510  09:59:03\n4  20160208_095904.51 2016-02-08 09:59:04.510  09:59:04\n5  20160208_095905.51 2016-02-08 09:59:05.510  09:59:05\n6  20160208_095906.51 2016-02-08 09:59:06.510  09:59:06\n7  20160208_095907.51 2016-02-08 09:59:07.510  09:59:07\n8  20160208_095908.51 2016-02-08 09:59:08.510  09:59:08\n9  20160208_095909.51 2016-02-08 09:59:09.510  09:59:09\n
            \n

            If you want the datetime.time component then use dt.time:

            \n
            In [8]:\ndf['Time'] = df['Date'].dt.time\ndf\n\nOut[8]:\n            Timestamp                    Date             Time\n0  20160208_095900.51 2016-02-08 09:59:00.510  09:59:00.510000\n1  20160208_095901.51 2016-02-08 09:59:01.510  09:59:01.510000\n2  20160208_095902.51 2016-02-08 09:59:02.510  09:59:02.510000\n3  20160208_095903.51 2016-02-08 09:59:03.510  09:59:03.510000\n4  20160208_095904.51 2016-02-08 09:59:04.510  09:59:04.510000\n5  20160208_095905.51 2016-02-08 09:59:05.510  09:59:05.510000\n6  20160208_095906.51 2016-02-08 09:59:06.510  09:59:06.510000\n7  20160208_095907.51 2016-02-08 09:59:07.510  09:59:07.510000\n8  20160208_095908.51 2016-02-08 09:59:08.510  09:59:08.510000\n9  20160208_095909.51 2016-02-08 09:59:09.510  09:59:09.510000\n
            \n soup wrap:

            If you just want a new string representation then use dt.strftime:

            In [7]:
            df['Time'] = df['Date'].dt.strftime('%H:%M:%S')
            df
            
            Out[7]:
                        Timestamp                    Date      Time
            0  20160208_095900.51 2016-02-08 09:59:00.510  09:59:00
            1  20160208_095901.51 2016-02-08 09:59:01.510  09:59:01
            2  20160208_095902.51 2016-02-08 09:59:02.510  09:59:02
            3  20160208_095903.51 2016-02-08 09:59:03.510  09:59:03
            4  20160208_095904.51 2016-02-08 09:59:04.510  09:59:04
            5  20160208_095905.51 2016-02-08 09:59:05.510  09:59:05
            6  20160208_095906.51 2016-02-08 09:59:06.510  09:59:06
            7  20160208_095907.51 2016-02-08 09:59:07.510  09:59:07
            8  20160208_095908.51 2016-02-08 09:59:08.510  09:59:08
            9  20160208_095909.51 2016-02-08 09:59:09.510  09:59:09
            

            If you want the datetime.time component then use dt.time:

            In [8]:
            df['Time'] = df['Date'].dt.time
            df
            
            Out[8]:
                        Timestamp                    Date             Time
            0  20160208_095900.51 2016-02-08 09:59:00.510  09:59:00.510000
            1  20160208_095901.51 2016-02-08 09:59:01.510  09:59:01.510000
            2  20160208_095902.51 2016-02-08 09:59:02.510  09:59:02.510000
            3  20160208_095903.51 2016-02-08 09:59:03.510  09:59:03.510000
            4  20160208_095904.51 2016-02-08 09:59:04.510  09:59:04.510000
            5  20160208_095905.51 2016-02-08 09:59:05.510  09:59:05.510000
            6  20160208_095906.51 2016-02-08 09:59:06.510  09:59:06.510000
            7  20160208_095907.51 2016-02-08 09:59:07.510  09:59:07.510000
            8  20160208_095908.51 2016-02-08 09:59:08.510  09:59:08.510000
            9  20160208_095909.51 2016-02-08 09:59:09.510  09:59:09.510000
            
            qid & accept id: (36524627, 36524746) query: combining lists inside values in pyspark soup:

            The problem was that I treated each item from the result of the collect operation as a key-value pair, but it is actually a tuple with the key as the first entry and the value as the second. So I iterated over the results using the following function, and got the expected output.

            \n
            def append_values_inside(key, value):\n    temp = []\n    for v in value:\n        for entry in v:\n            temp.append(entry)\n    return (key, temp)\nfor entry in ratings_and_users.map(lambda a: append_values_inside(a[0], a[1])).collect() :\n        print(entry)\n
            \n

            Final result:

            \n
            (b'"20599"', [7.0, b'"349802972X"', 'bamberg, franken, germany', 'NULL'])\n(b'"120675"', [0.0, b'"0972189408"', 'crescent city, california, usa', 45])\n(b'"166487"', [6.0, b'"8422626993"', 'santander, n/a, spain', 103])\n(b'"166487"', [7.0, b'"8440639228"', 'santander, n/a, spain', 103])\n
            \n soup wrap:

            The problem was that I treated each item from the result of the collect operation as a key-value pair, but it is actually a tuple with the key as the first entry and the value as the second. So I iterated over the results using the following function, and got the expected output.

            def append_values_inside(key, value):
                temp = []
                for v in value:
                    for entry in v:
                        temp.append(entry)
                return (key, temp)
            for entry in ratings_and_users.map(lambda a: append_values_inside(a[0], a[1])).collect() :
                    print(entry)
            

            Final result:

            (b'"20599"', [7.0, b'"349802972X"', 'bamberg, franken, germany', 'NULL'])
            (b'"120675"', [0.0, b'"0972189408"', 'crescent city, california, usa', 45])
            (b'"166487"', [6.0, b'"8422626993"', 'santander, n/a, spain', 103])
            (b'"166487"', [7.0, b'"8440639228"', 'santander, n/a, spain', 103])
            
            qid & accept id: (36549666, 36549887) query: Pythonic way of comparing all adjacent elements in a list soup:

            zip lets you combine multiple iterators:

            \n
            for i,j in zip(range(0,len(A)-1), range(1,len(A))):\n    #some operation between A[i] and A[j]\n
            \n

            you can also use enumerate on a range object:

            \n
            for i,j in enumerate(range(1,len(A))):\n    #some operation between A[i] and A[j]\n
            \n

            Note that unlike the other answers this gives you access to the indices of A, not just the items. This is necessary if you want to assign to A[i] or A[j]; for example, here is a very basic bubble sort (sorting in descending order):

            \n
            A = list(range(10))\nfound1=True\nwhile found1:\n    found1=False\n    for i,j in enumerate(range(1,len(A))):\n        if A[i] < A[j]:\n            A[i],A[j] = A[j],A[i]\n            found1=True\nprint(A)\n
            \n

            this is only possible when you iterate over the indices of A.

            \n soup wrap:

            zip lets you combine multiple iterators:

            for i,j in zip(range(0,len(A)-1), range(1,len(A))):
                #some operation between A[i] and A[j]
            

            you can also use enumerate on a range object:

            for i,j in enumerate(range(1,len(A))):
                #some operation between A[i] and A[j]
            

            Note that unlike the other answers this gives you access to the indices of A, not just the items. This is necessary if you want to assign to A[i] or A[j]; for example, here is a very basic bubble sort (sorting in descending order):

            A = list(range(10))
            found1=True
            while found1:
                found1=False
                for i,j in enumerate(range(1,len(A))):
                    if A[i] < A[j]:
                        A[i],A[j] = A[j],A[i]
                        found1=True
            print(A)
            

            this is only possible when you iterate over the indices of A.
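For contrast, when you only need to read adjacent values (no assignment back into A), zipping the list against itself shifted by one is a common shorthand; a small sketch:

```python
A = [3, 1, 4, 1, 5, 9]

# pair each element with its successor; zip stops at the shorter sequence
diffs = [y - x for x, y in zip(A, A[1:])]
# diffs == [-2, 3, -3, 4, 4]
```

This is simpler, but it gives you the values only, so it cannot replace the index-based loop when you need to write back into A.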

            qid & accept id: (36550795, 36551972) query: Pythonic way of looping over variable that is either an element or a list soup:

            First, you might want to include different types (rather than list) in your check, and a quick way of doing that would be:

            \n
            def is_iterable(x):\n    return type(x) in [list, tuple] # or just isinstance(x, list)\n
            \n
            \n

            With that, I would probably end up doing something like:

            \n
            if is_iterable(test):\n    for x in test:\n        do_stuff(x)\nelse:\n    do_stuff(test)\n
            \n

            Or if you expect any return:

            \n
            if is_iterable(test):\n    return [do_stuff(x) for x in test]\nelse:\n    return [do_stuff(test)]\n
            \n

            I don't know whether this is the most Pythonic way of doing that, but for me it is the most readable one. If you really want to reduce space, your option is probably the way to go, as it is the best practical way of getting a one-liner. However, I don't think there is any performance improvement (maybe quite the opposite).

            \n

            Another last option, if your do_stuff is not defined as a function, and thus you don't want to copy-paste code (never do that), would be to just get the assignment out:

            \n
            test = test if is_iterable(test) else [test]\nfor x in test:\n    do_stuff\n    ...\n
            \n

            But this is in essence the same as what you already have. In my personal experience, it is usually useful to get all the preprocessing out of the calculation step and make sure all the parameters have valid types. Then just perform whatever operation you need to do on them.

            \n soup wrap:

            First, you might want to include different types (rather than list) in your check, and a quick way of doing that would be:

            def is_iterable(x):
                return type(x) in [list, tuple] # or just isinstance(x, list)
            

            With that, I would probably end up doing something like:

            if is_iterable(test):
                for x in test:
                    do_stuff(x)
            else:
                do_stuff(test)
            

            Or if you expect any return:

            if is_iterable(test):
                return [do_stuff(x) for x in test]
            else:
                return [do_stuff(test)]
            

            I don't know whether this is the most Pythonic way of doing that, but for me it is the most readable one. If you really want to reduce space, your option is probably the way to go, as it is the best practical way of getting a one-liner. However, I don't think there is any performance improvement (maybe quite the opposite).

            Another last option, if your do_stuff is not defined as a function, and thus you don't want to copy-paste code (never do that), would be to just get the assignment out:

            test = test if is_iterable(test) else [test]
            for x in test:
                do_stuff
                ...
            

            But this is in essence the same as what you already have. In my personal experience, it is usually useful to get all the preprocessing out of the calculation step and make sure all the parameters have valid types. Then just perform whatever operation you need to do on them.
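Following the preprocessing advice in the last paragraph, the scalar-or-list branch can be hidden behind a tiny helper so the calculation step never needs an if (a sketch; as_list, do_stuff, and process are hypothetical names):

```python
def as_list(value):
    """Wrap a bare element in a list; pass lists and tuples through."""
    return list(value) if isinstance(value, (list, tuple)) else [value]

def do_stuff(x):
    return x * 2  # stand-in for the real work

def process(test):
    # the calculation step always sees a list, no branching needed
    return [do_stuff(x) for x in as_list(test)]

# process(3) == [6]; process([1, 2]) == [2, 4]
```

Normalizing the input once at the boundary keeps every downstream function free of type checks.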

            qid & accept id: (36554940, 36556556) query: Gnuplot: use a function to transform a column of a data file and plot the transformed data and the function soup:

            Use

            \n
            plot "energy_vs_volume.dat" using 1:(P($1))\n
            \n

            to apply the function P(x) to your data. To write the calculated values to a file, wrap the plot command in a set/unset table pair:

            \n
            plot "energy_vs_volume.dat" using 1:(P($1))\nset table "output.dat"\nreplot\nunset table\n
            \n

            The generated file contains three columns: the values given by the using statement, plus a third column with a character indicating whether the values were in range (i), out of range (o), or undefined (u).

            \n soup wrap:

            Use

            plot "energy_vs_volume.dat" using 1:(P($1))
            

            to apply the function P(x) to your data. To write the calculated values to a file, wrap the plot command in a set/unset table pair:

            plot "energy_vs_volume.dat" using 1:(P($1))
            set table "output.dat"
            replot
            unset table
            

            The generated file contains three columns: the values given by the using statement, plus a third column with a character indicating whether the values were in range (i), out of range (o), or undefined (u).

            qid & accept id: (36558005, 36560401) query: Interpolating 3d data at a single point in space (Python 2.7) soup:

            A basic example. Note that the meshgrid is not needed for the interpolation, but only to make a fast ufunc to generate an example function A=f(x,y,z), here A=x+y+z.

            \n
            from scipy.interpolate import interpn\nimport numpy as np\n\n#make up a regular 3d grid \nX=np.linspace(-5,5,11)\nY=np.linspace(-5,5,11)\nZ=np.linspace(-5,5,11)\nxv,yv,zv = np.meshgrid(X,Y,Z)\n\n# make up a function   \n# see http://docs.scipy.org/doc/numpy/reference/ufuncs.html\nA = np.add(xv,np.add(yv,zv))   \n#this one is easy enough for us to know what to expect at (.5,.5,.5)\n\n# usage : interpn(points, values, xi, method='linear', bounds_error=True, fill_value=nan) \ninterpn((X,Y,Z),A,[0.5,0.5,0.5])\n
            \n

            Output:

            \n
            array([ 1.5])\n
            \n

            If you pass in an array of points of interest, it will give you multiple answers.

            \n soup wrap:

            A basic example. Note that the meshgrid is not needed for the interpolation, but only to make a fast ufunc to generate an example function A=f(x,y,z), here A=x+y+z.

            from scipy.interpolate import interpn
            import numpy as np
            
            #make up a regular 3d grid 
            X=np.linspace(-5,5,11)
            Y=np.linspace(-5,5,11)
            Z=np.linspace(-5,5,11)
            xv,yv,zv = np.meshgrid(X,Y,Z)
            
            # make up a function   
            # see http://docs.scipy.org/doc/numpy/reference/ufuncs.html
            A = np.add(xv,np.add(yv,zv))   
            #this one is easy enough for us to know what to expect at (.5,.5,.5)
            
            # usage : interpn(points, values, xi, method='linear', bounds_error=True, fill_value=nan) 
            interpn((X,Y,Z),A,[0.5,0.5,0.5])
            

            Output:

            array([ 1.5])
            

            If you pass in an array of points of interest, it will give you multiple answers.
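Continuing the example above, passing several query points at once returns one interpolated value per point. Since A = x + y + z is linear (and symmetric in x and y, so meshgrid's default 'xy' indexing does no harm for this particular function), the linear interpolation is exact:

```python
from scipy.interpolate import interpn
import numpy as np

# same regular grid and function as above
X = Y = Z = np.linspace(-5, 5, 11)
xv, yv, zv = np.meshgrid(X, Y, Z)
A = xv + yv + zv

# two points of interest -> two interpolated values
pts = np.array([[0.5, 0.5, 0.5],
                [1.0, 2.0, 3.0]])
vals = interpn((X, Y, Z), A, pts)
# vals is approximately [1.5, 6.0]
```

For a non-symmetric function you would want meshgrid(..., indexing='ij') so the array axes line up with the (X, Y, Z) point order that interpn expects.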

            qid & accept id: (36559053, 36559570) query: Creating an OrderedDict from a csv file soup:

            I don't know why you are trying to pop keys off of the dictionary and items off the list. It doesn't seem to serve your purpose of creating an OrderedDict.

            \n

            This is the solution I came to. It doesn't pop any items (again because I don't know exactly why you are doing that).

            \n
            import csv\nfrom collections import OrderedDict\n\nfile = open('example.csv', mode='r')\n\ncsvReader = csv.reader(file)\n\n# get rid of header row\nheader = next(csvReader)\n# print(header)\n\nodict = OrderedDict()\nfor row in csvReader:\n    odict[row[0]] = row[1:]\n    # print(row)\n\nprint(odict)\n
            \n

            This could be cleaner and more reusable if it is put into a function, like so:

            \n
            import csv\nfrom collections import OrderedDict\n\ndef parse_csv(filename):\n\n    file = open(filename, mode='r')\n\n    csvReader = csv.reader(file)\n\n    # get rid of header row\n    header = next(csvReader)\n    # print(header)\n\n    odict = OrderedDict()\n    for row in csvReader:\n        odict[row[0]] = row[1:]\n        # print(row)\n\n    return odict\n\nparse_csv('example.csv')\n
            \n soup wrap:

            I don't know why you are trying to pop keys off of the dictionary and items off the list. It doesn't seem to serve your purpose of creating an OrderedDict.

            This is the solution I came to. It doesn't pop any items (again because I don't know exactly why you are doing that).

            import csv
            from collections import OrderedDict
            
            file = open('example.csv', mode='r')
            
            csvReader = csv.reader(file)
            
            # get rid of header row
            header = next(csvReader)
            # print(header)
            
            odict = OrderedDict()
            for row in csvReader:
                odict[row[0]] = row[1:]
                # print(row)
            
            print(odict)
            

            This could be cleaner and more reusable if it is put into a function, like so:

            import csv
            from collections import OrderedDict
            
            def parse_csv(filename):
            
                file = open(filename, mode='r')
            
                csvReader = csv.reader(file)
            
                # get rid of header row
                header = next(csvReader)
                # print(header)
            
                odict = OrderedDict()
                for row in csvReader:
                    odict[row[0]] = row[1:]
                    # print(row)
            
                return odict
            
            parse_csv('example.csv')
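One caveat with the code above: the file is opened but never closed. A with statement handles cleanup automatically; here is the same function rewritten that way (behavior otherwise unchanged):

```python
import csv
from collections import OrderedDict

def parse_csv(filename):
    # the with statement closes the file even if an exception is raised
    with open(filename, mode='r', newline='') as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        return OrderedDict((row[0], row[1:]) for row in reader)
```

The newline='' argument is what the csv module's documentation recommends for file objects it reads from.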
            
            qid & accept id: (36572221, 36573116) query: How to find ngram frequency of a column in a pandas dataframe? soup:

            If your data is like

            \n
            import pandas as pd\ndf = pd.DataFrame([\n    'must watch. Good acting',\n    'average movie. Bad acting',\n    'good movie. Good acting',\n    'pathetic. Avoid',\n    'avoid'], columns=['description'])\n
            \n

            You could use the CountVectorizer of the package sklearn:

            \n
            from sklearn.feature_extraction.text import CountVectorizer\nword_vectorizer = CountVectorizer(ngram_range=(1,2), analyzer='word')\nsparse_matrix = word_vectorizer.fit_transform(df['description'])\nfrequencies = sum(sparse_matrix).toarray()[0]\npd.DataFrame(frequencies, index=word_vectorizer.get_feature_names(), columns=['frequency'])\n
            \n

            Which gives you :

            \n
                            frequency\ngood            3\npathetic        1\naverage movie   1\nmovie bad       2\nwatch           1\ngood movie      1\nwatch good      3\ngood acting     2\nmust            1\nmovie good      2\npathetic avoid  1\nbad acting      1\naverage         1\nmust watch      1\nacting          1\nbad             1\nmovie           1\navoid           1\n
            \n

            EDIT

            \n

            fit will just "train" your vectorizer: it will split the words of your corpus and build a vocabulary from them. Then transform can take a new document and create a frequency vector based on the vectorizer's vocabulary.

            \n

            Here your training set is the same as your output set, so you can do both at the same time (fit_transform). Because you have 5 documents, it will create a matrix of 5 vectors. You want a global vector, so you have to sum them.

            \n soup wrap:

            If your data is like

            import pandas as pd
            df = pd.DataFrame([
                'must watch. Good acting',
                'average movie. Bad acting',
                'good movie. Good acting',
                'pathetic. Avoid',
                'avoid'], columns=['description'])
            

            You could use the CountVectorizer of the package sklearn:

            from sklearn.feature_extraction.text import CountVectorizer
            word_vectorizer = CountVectorizer(ngram_range=(1,2), analyzer='word')
            sparse_matrix = word_vectorizer.fit_transform(df['description'])
            frequencies = sum(sparse_matrix).toarray()[0]
            pd.DataFrame(frequencies, index=word_vectorizer.get_feature_names(), columns=['frequency'])
            

            Which gives you :

                            frequency
            good            3
            pathetic        1
            average movie   1
            movie bad       2
            watch           1
            good movie      1
            watch good      3
            good acting     2
            must            1
            movie good      2
            pathetic avoid  1
            bad acting      1
            average         1
            must watch      1
            acting          1
            bad             1
            movie           1
            avoid           1
            

            EDIT

            fit will just "train" your vectorizer: it will split the words of your corpus and build a vocabulary from them. Then transform can take a new document and create a frequency vector based on the vectorizer's vocabulary.

            Here your training set is the same as your output set, so you can do both at the same time (fit_transform). Because you have 5 documents, it will create a matrix of 5 vectors. You want a global vector, so you have to sum them.
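To make the counting idea concrete without sklearn, here is the same unigram-plus-bigram tally sketched in plain Python with collections.Counter (assuming a simple lowercased \w+ tokenization, which only approximates CountVectorizer's default analyzer):

```python
import re
from collections import Counter

docs = ['must watch. Good acting',
        'average movie. Bad acting',
        'good movie. Good acting',
        'pathetic. Avoid',
        'avoid']

counts = Counter()
for doc in docs:
    words = re.findall(r'\w+', doc.lower())
    counts.update(words)                                             # unigrams
    counts.update(' '.join(pair) for pair in zip(words, words[1:]))  # bigrams

# e.g. counts['good'] == 3 and counts['good acting'] == 2,
# the global (summed over documents) frequencies
```

This is the per-document split-and-count that fit_transform performs, followed by the sum across documents.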

            qid & accept id: (36579996, 36582214) query: Python: Loop through all nested key-value pairs created by xmltodict soup:

            If you come across a list in the data then you just need to call myprint on every element of the list:

            \n
            def myprint(d):\n    if isinstance(d,dict): #check if it's a dict before using .iteritems()\n        for k, v in d.iteritems():\n            if isinstance(v, (list,dict)): #check for either list or dict\n                myprint(v)\n            else:\n                print "Key :{0},  Value: {1}".format(k, v)\n    elif isinstance(d,list): #allow for list input too\n        for item in d:\n            myprint(item)\n
            \n

            then you will get an output something like:

            \n
            ...\nKey :@name,  Value: Employee\nKey :@isMandotory,  Value: True\nKey :#text,  Value: Jake Roberts\nKey :@name,  Value: Section\nKey :@isOpen,  Value: True\nKey :@isMandotory,  Value: False\nKey :#text,  Value: 5\n...\n
            \n

            Although I'm not sure how useful this is since you have a lot of duplicate keys like @name, I'd like to offer a function I created a while ago to traverse nested JSON-like data of dicts and lists:

            \n
            def traverse(obj, prev_path = "obj", path_repr = "{}[{!r}]".format):\n    if isinstance(obj,dict):\n        it = obj.items()\n    elif isinstance(obj,list):\n        it = enumerate(obj)\n    else:\n        yield prev_path,obj\n        return\n    for k,v in it:\n        for data in traverse(v, path_repr(prev_path,k), path_repr):\n            yield data\n
            \n

            Then you can traverse the data with:

            \n
            for path,value in traverse(doc):\n    print("{} = {}".format(path,value))\n
            \n

            with the default values for prev_path and path_repr it gives output like this:

            \n
            obj[u'session'][u'@id'] = 2934\nobj[u'session'][u'@name'] = Valves\nobj[u'session'][u'@docVersion'] = 5.0.1\nobj[u'session'][u'docInfo'][u'field'][0][u'@name'] = Employee\nobj[u'session'][u'docInfo'][u'field'][0][u'@isMandotory'] = True\nobj[u'session'][u'docInfo'][u'field'][0]['#text'] = Jake Roberts\nobj[u'session'][u'docInfo'][u'field'][1][u'@name'] = Section\nobj[u'session'][u'docInfo'][u'field'][1][u'@isOpen'] = True\nobj[u'session'][u'docInfo'][u'field'][1][u'@isMandotory'] = False\nobj[u'session'][u'docInfo'][u'field'][1]['#text'] = 5\nobj[u'session'][u'docInfo'][u'field'][2][u'@name'] = Location\nobj[u'session'][u'docInfo'][u'field'][2][u'@isOpen'] = True\nobj[u'session'][u'docInfo'][u'field'][2][u'@isMandotory'] = False\nobj[u'session'][u'docInfo'][u'field'][2]['#text'] = Munchen\n
            \n

            You can also write your own function for path_repr; it takes the value of prev_path (built up by the recursive calls) and the new key. For example, if you want the path as a tuple of keys and indices, you could do this:

            \n
            def add_to_tuple(prev,new):\n    return prev+(new,) #prev is a tuple, add in the new element to the tuple\n\nfor path,value in traverse(doc,(),add_to_tuple): #prev_path is initially an empty tuple\n    print("{} = {}".format(path,value))\n
            \n

            then the output would be:

            \n
            ...\n(u'session', u'docInfo', u'field', 0, '#text') = Jake Roberts\n(u'session', u'docInfo', u'field', 1, u'@name') = Section\n(u'session', u'docInfo', u'field', 1, u'@isOpen') = True\n(u'session', u'docInfo', u'field', 1, u'@isMandotory') = False\n(u'session', u'docInfo', u'field', 1, '#text') = 5\n...\n
            \n

            I found this particularly useful when dealing with my json data but I'm not really sure what you want to do with your xml.

            \n soup wrap:

            If you come across a list in the data then you just need to call myprint on every element of the list:

            def myprint(d):
                if isinstance(d,dict): #check if it's a dict before using .iteritems()
                    for k, v in d.iteritems():
                        if isinstance(v, (list,dict)): #check for either list or dict
                            myprint(v)
                        else:
                            print "Key :{0},  Value: {1}".format(k, v)
                elif isinstance(d,list): #allow for list input too
                    for item in d:
                        myprint(item)
            

            then you will get an output something like:

            ...
            Key :@name,  Value: Employee
            Key :@isMandotory,  Value: True
            Key :#text,  Value: Jake Roberts
            Key :@name,  Value: Section
            Key :@isOpen,  Value: True
            Key :@isMandotory,  Value: False
            Key :#text,  Value: 5
            ...
            

            Although I'm not sure how useful this is since you have a lot of duplicate keys like @name, I'd like to offer a function I created a while ago to traverse nested JSON-like data of dicts and lists:

            def traverse(obj, prev_path = "obj", path_repr = "{}[{!r}]".format):
                if isinstance(obj,dict):
                    it = obj.items()
                elif isinstance(obj,list):
                    it = enumerate(obj)
                else:
                    yield prev_path,obj
                    return
                for k,v in it:
                    for data in traverse(v, path_repr(prev_path,k), path_repr):
                        yield data
            

            Then you can traverse the data with:

            for path,value in traverse(doc):
                print("{} = {}".format(path,value))
            

            with the default values for prev_path and path_repr it gives output like this:

            obj[u'session'][u'@id'] = 2934
            obj[u'session'][u'@name'] = Valves
            obj[u'session'][u'@docVersion'] = 5.0.1
            obj[u'session'][u'docInfo'][u'field'][0][u'@name'] = Employee
            obj[u'session'][u'docInfo'][u'field'][0][u'@isMandotory'] = True
            obj[u'session'][u'docInfo'][u'field'][0]['#text'] = Jake Roberts
            obj[u'session'][u'docInfo'][u'field'][1][u'@name'] = Section
            obj[u'session'][u'docInfo'][u'field'][1][u'@isOpen'] = True
            obj[u'session'][u'docInfo'][u'field'][1][u'@isMandotory'] = False
            obj[u'session'][u'docInfo'][u'field'][1]['#text'] = 5
            obj[u'session'][u'docInfo'][u'field'][2][u'@name'] = Location
            obj[u'session'][u'docInfo'][u'field'][2][u'@isOpen'] = True
            obj[u'session'][u'docInfo'][u'field'][2][u'@isMandotory'] = False
            obj[u'session'][u'docInfo'][u'field'][2]['#text'] = Munchen
            

            You can also write your own function for path_repr; it receives the value of prev_path (built up by the recursive calls) and the new key. For example, if you want the path as a tuple of keys/indices, you could do this:

            def add_to_tuple(prev,new):
                return prev+(new,) #prev is a tuple, add in the new element to the tuple
            
            for path,value in traverse(doc,(),add_to_tuple): #prev_path is initially an empty tuple
                print("{} = {}".format(path,value))
            

            then the output would be:

            ...
            (u'session', u'docInfo', u'field', 0, '#text') = Jake Roberts
            (u'session', u'docInfo', u'field', 1, u'@name') = Section
            (u'session', u'docInfo', u'field', 1, u'@isOpen') = True
            (u'session', u'docInfo', u'field', 1, u'@isMandotory') = False
            (u'session', u'docInfo', u'field', 1, '#text') = 5
            ...
            
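            A nice side effect of the tuple paths (an aside, not from the original answer) is that you can follow one back into the original structure with functools.reduce and operator.getitem; get_by_path below is a hypothetical helper name:

```python
from functools import reduce
from operator import getitem

def get_by_path(obj, path):
    # follow a tuple of keys/indices back down into the nested structure
    return reduce(getitem, path, obj)

doc = {"session": {"docInfo": {"field": [{"#text": "Jake Roberts"}]}}}
print(get_by_path(doc, ("session", "docInfo", "field", 0, "#text")))  # Jake Roberts
```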

            I found this particularly useful when dealing with my json data but I'm not really sure what you want to do with your xml.

            qid & accept id: (36597386, 36602277) query: Match C++ Strings and String Literals using regex in Python soup:


            You can grab all the string literals with the following regex:

            r'(?P<prefix>(?:\bu8|\b[LuU])?)(?:"(?P<dbl>[^"\\]*(?:\\.[^"\\]*)*)"|\'(?P<sngl>[^\'\\]*(?:\\.[^\'\\]*)*)\')|R"([^"(]*)\((?P<raw>.*?)\)\4"'
            

            See the regex demo

            Explanation:

            • (?P<prefix>(?:\bu8|\b[LuU])?) - (Group named "prefix") the optional prefix, either u8 (whole word) or L, u, U (as whole words)
            • (?:"(?P<dbl>[^"\\]*(?:\\.[^"\\]*)*)" - a double quoted string literal, with the contents between " captured into the group named "dbl". This part matches ", then 0+ characters other than \ and ", followed by any number (0+) of sequences of an escape sequence (\\.) followed by 0+ characters other than \ and " (it is an unrolled version of (?:[^"\\]|\\.)*)
            • | - or
            • \'(?P<sngl>[^\'\\]*(?:\\.[^\'\\]*)*)\') - a single quoted string literal, with the contents between ' captured into the group named "sngl". See details on how it works above.
            • | - or
            • R"([^"(]*)\((?P<raw>.*?)\)\4" - this is the raw string literal part, capturing the contents into the group named "raw". First, R is matched. Then " followed by 0+ characters other than " and ( while capturing the delimiter value into Group 4 (as all named groups also have numeric IDs), and then the inner contents are matched with a lazy construct (use re.S if the strings are multiline), up to the first ) followed by the contents of Group 4 (the raw string literal delimiter), and then the final ".
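            To see the Group 4 backreference at work, here is a trimmed-down sketch of just the raw-string alternative (not the full pattern; the delimiter captured in group 1 here must appear again before the closing quote):

```python
import re

# Only the raw-string branch: \1 re-matches the delimiter captured in group 1
raw = re.compile(r'R"([^"(]*)\((?P<raw>.*?)\)\1"')

m = raw.search('R"delim(text")"here)delim"')
print(m.group("raw"))  # text")"here
```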

            Sample Python demo:

            import re
            
            p = re.compile(r'(?P<prefix>(?:\bu8|\b[LuU])?)(?:"(?P<dbl>[^"\\]*(?:\\.[^"\\]*)*)"|\'(?P<sngl>[^\'\\]*(?:\\.[^\'\\]*)*)\')|R"([^"(]*)\((?P<raw>.*?)\)\4"')
            s = "\"text'\\\"here\"\nL'text\\'\"here'\nu8\"text'\\\"here\"\nu'text\\'\"here'\nU\"text'\\\"here\"\nR\"delimiter(text\"'\"here)delimiter\""
            print(s)
            print('--------- Regex works below ---------')
            for x in p.finditer(s):
                if x.group("dbl"):
                    print(x.group("dbl"))
                elif x.group("sngl"):
                    print(x.group("sngl"))
                else:
                    print(x.group("raw"))
            
            qid & accept id: (36633059, 36633297) query: Make a pandas series by running a function on all adjacent values soup:


            In response to your edit, we could try to use a similar .rolling method, but pandas does not currently support non-numeric types in rolling windows.

            So, we can use a list comprehension:

            [music21.interval.Interval(music21.note.Note(s1[i]),\
                                       music21.note.Note(s1[i + 1])).name\
             for i in range(len(s1)-1)]
            
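            The same adjacent-pairs pattern can be shown without the music21 dependency; in this sketch a toy '->' join stands in for the Interval call:

```python
import pandas as pd

s1 = pd.Series(['C4', 'E-4', 'G4', 'A-4'])

# zip the series against itself offset by one to get adjacent pairs
pairs = pd.Series([a + '->' + b for a, b in zip(s1, s1[1:])])
print(pairs.tolist())  # ['C4->E-4', 'E-4->G4', 'G4->A-4']
```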

            or, an apply:

            import music21
            import pandas as pd
            import numpy as np
            
            s1 = pd.Series(['C4', 'E-4', 'G4', 'A-4'])
            df = pd.DataFrame({0:s1, 1:s1.shift(1)})
            
            def myfunc(x):
                if not any([pd.isnull(x[0]), pd.isnull(x[1])]):
                    return music21.interval.Interval(music21.note.Note(x[0]),music21.note.Note(x[1])).name
            
            
            df.apply(myfunc, axis = 1)
            

            NB: I would be surprised if the apply were any faster than the comprehension.

            qid & accept id: (36672440, 36672537) query: How to force sympy to extract specific subexpressions? soup:


            A simple way is to solve for parameters which will cause the expressions to be equal at a number of points in time. Given that the forms are in fact the same, this will work fine:

            V_Ci, tau, V_Cf = symbols('V_Ci, tau, V_Cf')
            
            target = V_Ci*exp(-t/tau) + Heaviside(t)*V_Cf*(1 - exp(-t/tau))
            
            solve([(eqVc.rhs - target).subs(t, ti) for ti in [0, 1, 2]],
                  [V_Ci, tau, V_Cf], dict=True)
            

            The answer I get is

            [{V_Cf: R_S/(R_1 + R_S),
              tau: 1/log(exp((1/R_S + 1/R_1)/(C_1 + C_S))),
              V_Ci: k_1}]
            

            That log(exp()) is not simplified away because of the way the variables are defined. Defining everything as real (V_Ci, tau, V_Cf = symbols('V_Ci, tau, V_Cf', real=True), with similar modifications in your code) simplifies the solution to

            [{V_Ci: k_1, 
              V_Cf: R_S/(R_1 + R_S), 
               tau: R_1*R_S*(C_1 + C_S)/(R_1 + R_S)}]
            
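            The idea of matching at sample points also works by hand for simple forms. As an illustration (not from the original answer), if you know f(t) has the form A*exp(-t/tau), evaluating at t = 0 and t = 1 recovers both parameters:

```python
import math

def recover_params(f):
    # recover A and tau for f(t) = A*exp(-t/tau) from two sample points
    A = f(0)                       # f(0) = A
    tau = -1 / math.log(f(1) / A)  # f(1)/A = exp(-1/tau)
    return A, tau

f = lambda t: 2.5 * math.exp(-t / 3.0)
A, tau = recover_params(f)
print(A, tau)
```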
            qid & accept id: (36709837, 36710070) query: Slicing and arranging dataframe in pandas soup:


            I think you can first create a helper column cols with cumcount and then use pivot_table. Then find the number of non-null columns in each row (subtracting the first 2, id and channel) and group by this count. Finally, dropna the columns in each group:

            df['cols'] = 'p' + (df.groupby('id')['id'].cumcount() + 1).astype(str)
            
            df1 = df.pivot_table(index=['id', 'channel'], 
                                columns='cols', 
                                values='path', 
                                aggfunc='first').reset_index().rename_axis(None, axis=1)
            
            print df1
                 id channel    p1    p2     p3    p4
            0    15  direct    a1    a2     a3    a4
            1   213    paid    b2    b1   None  None
            2  2222  direct  as25  dw46    32q  None
            3  3111    paid  d32a  23ff  www32   2d2
            
            print df1.apply(lambda x: x.notnull().sum() - 2 , axis=1)
            0    4
            1    2
            2    3
            3    4
            dtype: int64
            
            for i, g in df1.groupby(df1.apply(lambda x: x.notnull().sum() - 2 , axis=1)):
                print i
                print g.dropna(axis=1)
            2
                id channel  p1  p2
            1  213    paid  b2  b1
            3
                 id channel    p1    p2   p3
            2  2222  direct  as25  dw46  32q
            4
                 id channel    p1    p2     p3   p4
            0    15  direct    a1    a2     a3   a4
            3  3111    paid  d32a  23ff  www32  2d2
            

            For storing you can use dictionary of DataFrames:

            dfs={i: g.dropna(axis=1)         
                for i, g in df1.groupby(df1.apply(lambda x: x.notnull().sum() - 2 , axis=1))}
            
            #select DataFrame with len=2    
            print dfs[2]
                id channel  p1  p2
            1  213    paid  b2  b1
            
            #select DataFrame with len=3       
            print dfs[3]
                 id channel    p1    p2   p3
            2  2222  direct  as25  dw46  32q
            
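            A self-contained sketch of the cumcount + pivot_table step on toy data (hypothetical sample rows, Python 3 print; the real answer runs the same steps on the question's frame):

```python
import pandas as pd

# toy frame standing in for the question's data
df = pd.DataFrame({'id': [15, 15, 213], 'path': ['a1', 'a2', 'b2']})

# number the rows within each id (0, 1, ...), shift to 1-based, prefix with 'p'
df['cols'] = 'p' + (df.groupby('id').cumcount() + 1).astype(str)

df1 = (df.pivot_table(index='id', columns='cols', values='path', aggfunc='first')
         .reset_index().rename_axis(None, axis=1))
print(df1)
```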
            qid & accept id: (36719792, 36720468) query: How can I pack images? -Pygame -PyInstaller soup:


            I don't know much about the fromstring and tostring methods, but you could always include the images as base64 data. Pygame seems to need an actual image file and not just a blob of binary data, so in the example below I've included 3 small icons as base64 strings that get written to files in a sub-folder named "data". The filenames are then passed to pygame.image.load().

            I tested this code with pyinstaller --onefile filename.py and it worked fine without any manual changes to settings/specs/paths etc.

            import os
            import hashlib
            import pygame
            import time
            import base64
            
            def create_assets(asset_dict, asset_dir):
            
                """ 
                hand this function a dictionary of assets (images, mp3s, whatever)
                and an absolute path to the data/asset folder. 
                The function creates the folder and files from the base64 strings
                if they don't exist. If the files exist, an md5 check is run
                instead to ensure integrity 
                """
            
                first_run = False
                if not os.path.isdir(asset_dir):
                    os.mkdir(asset_dir)
                    first_run = True
                for label in asset_dict:
                    asset = asset_dict[label]
                    filename = os.path.join(asset_dir, asset["filename"])
                    rewrite = False
                    # no need to check file if we just created the data folder
                    if not first_run:
                        if not os.path.isfile(filename):
                            # the file doesn't exist
                            rewrite = True
                        else:
                            # file exists - make sure it's intact via md5
                            with open(filename, "rb") as f:
                                if not hashlib.md5(f.read()).hexdigest() == asset["md5"]:
                                    # the filename exists but the contents is wrong
                                    rewrite = True
                    if first_run or rewrite:
                        # one of our checks failed or first run - write file
                        print ("Writing file: ",filename)
                        with open(filename, "wb") as f:
                            f.write(base64.b64decode(asset["data"]))
                    else:
                        print ("File exists: ",filename)
            
            
            """ 
            This is the data dictionary. It's very easy to save 
            the whole thing as json should you feel like it.
            The images are just small, random icons at the moment 
            
            """
            
            assets = {
                "background": {
                    "filename": "bg1.png", 
                    "data": "iVBORw0KGgoAAAANSUhEUgAAABQAAAAUCAYAAACNiR0NAAACOUlEQVQ4je2VQUhTcRzHP3szJyHiwbH5aCzpMCJ6NHIjWmzFtCBwKkEX7RYN2g4eAyF2jLRLzE5GoIegOgQNEarJCBwtYeQgPEQ0dC+fDNK5yFD3OuR7zL13WNCx7/ED3x8/vr8/379FVVWVA5WVZZY/TCEXMxxtdxLou082PU61sgpgYKI7hFzM4BD99A/PACBog9KpKPMvruvDwpEnlJUC4YFp2jtcpkzyxQBQ5BxKKQeANdj7LZHPTrK9VQTA7vTiD95lfe09+ewEtrZOTp29xbHjlwysyyFR3iiwvVXkR6XEiZPDtMjFjD5I8sfpdgV4OdtPeGCa/b2f9HgGWUhFqdV2DSwyMofkiyEXM/qWgt3pJRx5zJVrT+l2BQCoVlZ5++omojvIQipKZfOLKQPockiI7hAAy7kkltmkRzU7QLNq9ApmYQMIwhEkXxyr1aabG5mZ1/ow+SzRGPbnT8853XsbyR9HEFpZX1sEMLCLVx8ZDmUNnPma8J4bw9bWSY9nkHfzY/za+U5Z+YggtFJYmqJW2z14p4fZhrxEo9cym/So7R0uLlx+wOKbO3rYzarRKwDUarvs7+3om/yNGr3/j/IPjiK6Q1Qrq+QXJ4mMzDEaW9Hz6R+a0TM1Y6OxFYZuvKbLIXG+7x6iO4RgVkHN1tefGA5Xn0VVVTWdih4qSqWUo6wUyGcnkHxxRHeQ/b0dA9PKWJPd6aUFMFRQNj3eVH1peddXn0X7Auq3VOScIWyzA9QP0vQbENbTigXxO1gAAAAASUVORK5CYII=",
                    "md5": "12f7eb2eea8992a2644de649dfaf00b3"
                    },
                "player_sprite": {
                    "filename": "player_img.png", 
                    "data": "iVBORw0KGgoAAAANSUhEUgAAABQAAAAUCAYAAACNiR0NAAACh0lEQVQ4jZWUS2gTURiFv5lkmiZpk5KmNWmbIFratLWlCiq2blwILly4EFSsG3EjCK5EFITiQhFXbgRBXbXgouBe3GkRKTaYvtLWR01sa5M0j8nDpK9xEWZIwqStZ3nuzLmX/7v3CIE7HxR5KobvwUkaB9wApAIx7H1O5h5+Jv5pFYAjT05r3maygHfIR8OxZlTJs+uERoKI3iEf1kN2HKdc2mJoJEjme1ILK/WyP1IcvNFTFpZfy7H0coaUP4qgKIqSC6exeOrJr2YxuSxM3x1nM5Hn7+8MpTK31Wme/WgTnsud2Puc2npyMlIMVI3p++O0XDiM44SLVCDGt2d+8itZdpOt10n77X7MrXUAGEtnkPJH2c5uItlNLI8t7hlW19FA68V2alusmickv0aVSgD7lR4oUQ8AgGAU8Vz1IdaIVT09UIabg9eHo+9DbMkbZYGeK514h3wIkkjKH9X1CpEcciBGfjVL5F0IeXa9CEUPgFgj4rnWRXg0yE5+u6pXKUEOxpXwaJDExNp/za+ahI/n3ip7f7a36rscRShVd9oHFFWmZosGxWjrdSJPxYoLLguFPzkA2i51FHesNfDr1UxVD6AQybHw9AtmtxXD8/HXw/Y+J/LsOu23+tlKb5BfyZKZjyNIIuHRIMpWcSp6nirfveNkFpIY0/MJlscWkRpMxXtklUhMrLGzsVN2CkDXU+en/lv2ltUKUu9dNakAwm8WqKw+7S2rFZSei+8apgKwdTciGEV+vpgqqz7tLatKTka0naEclBbqsmB2W+l5NEhl9RnOLw0MpwIxbN0OJFsNtW4rB8560QOlSgXQdMaDZDcBEHw8gWSrwahXQdVAVQIonb1aff8A8A1zY9iTMCcAAAAASUVORK5CYII=",
                    "md5": "79f25f0784a7849415f9c3d0d9d05267"
                    },
                "weapon": {
                    "filename": "sword1.png", 
                    "data": "iVBORw0KGgoAAAANSUhEUgAAABQAAAAUCAYAAACNiR0NAAACzUlEQVQ4jZWVX0hTYRjGfydHZJb5J03SrIarrQaNEhKERgT25yaiIjEGQUWB0U03BUHWRVS3BUUOHGkUFg2LjW5bQkUjs6bNHcskJZRVplHTzs7bxdphy0Ot5+Ycnu99n3O+5/l4P+Q32u7FxO0JyOWOPvkbWjujouu68e72BOTqrX5j3UIG9mxdQXOTA4BI7AvB0Aca6itxOUoNruPBW5y2Yupc5RzauxoALakbGkrbvZgA1FQXUr++HEVR8PlVfH7VKGrcYeVoo51L3lcEQyNsql3CueMbjPWnL8eJDn0FIG9i7raWmuWF7G5YgaIoALgcpThtRWiacGSfnZ1bqlEUhYJ8Cw+7Rzm8dzXVSxcYglUVBYQjcW50DcLljj7Dk1zwt/rWzqgoIiL8gSs3+4m+myCiTmC3LqJ27WISM8lZXNpD750BDu5ZldqhiMjrgc9ysbVXevrjIiJy5Ey36Louw6NTouu6nLjwzJRL97o9AXnSM5ZKOTOAYGiExh1Wou++MjQyhXVZIeOffhDuiwOYcsHQB+NZ5yrHcmCXDaetiOCjERrqK9m4roxwJM7KqoWMf/pBeWk+LnsJ375rsziAhvpKgqFUL4Cph9duv2FweIpwXxyXvQSXo5TEtDaLO7DLZnje3ORAURTmpEW8dwZIa2tJITGjAZCYSaJpuimXxrH9a4wjh5mx/xtKZqAWM2NzDcUsUIuZsTXVhTmFYhaoEUqmsf8bShZaO6M5japc0N6lSt7kvO0tmqazfu3irA9FYl+44H2Fw1pEVUWBwfn8KgX5FirK5mfVd9wfxHs3Rt7JU6dbdBE0TYxGAJ9fRR2eZOZnks0bl+Lzq5y/3os6PMnD7lES00lqnamfCEfivB+dwmkrTh0bkdTEztx+T39c3J6APA5/NLjnr8fl7JUX8qRnzLCivUvNmthZgm5PIEv0X6OtvUuddQX8Ag5COzDf7kUwAAAAAElFTkSuQmCC",
                    "md5": "92485d36b8ac414cc758d9a6c6f28d23"
                    },
            }
            
            # get absolute path to asset directory
            asset_dir = "data"
            asset_dir_path = os.path.join(os.getcwd(), asset_dir)
            
            # create files in asset directory using the assets dictionary
            create_assets(assets, asset_dir_path)
            
            pygame.init()
            
            WIDTH = 800
            HEIGHT = 600
            
            SCREEN = pygame.display.set_mode((WIDTH, HEIGHT))
            
            loaded_images = {}
            
            # initialize/load all the newly created images
            for label in assets:
                file_path = os.path.join(asset_dir_path,assets[label]["filename"])
                loaded_images[label] = pygame.image.load(file_path)
            
            pos1 = 0
            pos2 = 0
            t_start = time.time()
            
            while time.time() - t_start < 5:
                for img in loaded_images:
                    SCREEN.blit(loaded_images[img], (pos1, pos2))
                    time.sleep(0.2)
                    pos1 += 20
                    pos2 += 20
                    pygame.display.update()
            

            I turned the images into base64 strings like so:

            import base64
            
            with open(img_input, "rb") as f:
                with open(img_output_b64, "wb") as f2:
                    f2.write(base64.b64encode(f.read()))
            
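            The md5 values in the assets dictionary can be produced the same way, using only the standard library. A small sketch (the example bytes are made up; the digest is taken over the decoded image bytes so it matches the file that gets written out):

```python
import base64
import hashlib

# the md5 in an asset entry is the digest of the *decoded* image bytes
data_b64 = base64.b64encode(b"not a real PNG, just example bytes")
digest = hashlib.md5(base64.b64decode(data_b64)).hexdigest()
print(digest)
```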
            qid & accept id: (36731365, 36732761) query: Check Type: How to check if something is a RDD or a dataframe? soup:


            isinstance will work just fine:

            from pyspark.sql import DataFrame
            from pyspark.rdd import RDD
            
            def foo(x):
                if isinstance(x, RDD):
                    return "RDD"
                if isinstance(x, DataFrame):
                    return "DataFrame"
            
            foo(sc.parallelize([]))
            ## 'RDD'
            foo(sc.parallelize([("foo", 1)]).toDF())
            ## 'DataFrame'
            

            but single dispatch is a much more elegant approach:

            from functools import singledispatch
            
            @singledispatch
            def bar(x):
                pass 
            
            @bar.register(RDD)
            def _(arg):
                return "RDD"
            
            @bar.register(DataFrame)
            def _(arg):
                return "DataFrame"
            
            bar(sc.parallelize([]))
            ## 'RDD'
            
            bar(sc.parallelize([("foo", 1)]).toDF())
            ## 'DataFrame'
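
            The singledispatch mechanics above don't depend on Spark itself. A minimal sketch (Python 3 syntax, with hypothetical stand-in classes since a live SparkContext is not assumed here) shows how dispatch picks the right handler:

```python
from functools import singledispatch

# Hypothetical stand-ins for RDD and DataFrame; no real SparkContext assumed.
class FakeRDD:
    pass

class FakeDataFrame:
    pass

@singledispatch
def kind(x):
    # fallback for unregistered types
    return "unknown"

@kind.register(FakeRDD)
def _(x):
    return "RDD"

@kind.register(FakeDataFrame)
def _(x):
    return "DataFrame"

print(kind(FakeRDD()))        # RDD
print(kind(FakeDataFrame()))  # DataFrame
print(kind(42))               # unknown
```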
            

            If you don't mind additional dependencies, multipledispatch is also an interesting option:

            from multipledispatch import dispatch
            
            @dispatch(RDD)
            def baz(x):
                return "RDD"
            
            @dispatch(DataFrame)
            def baz(x):
                return "DataFrame"
            
            baz(sc.parallelize([]))
            ## 'RDD'
            
            baz(sc.parallelize([("foo", 1)]).toDF())
            ## 'DataFrame'
            

            Finally, the most Pythonic approach is to simply check an interface:

            def foobar(x):
                if hasattr(x, "rdd"):
                    pass  # It is a DataFrame
                else:
                    pass  # It (probably) is an RDD
            
            qid & accept id: (36744627, 36769922) query: Network capturing with Selenium/PhantomJS soup:

            soup wrap:

            I am using a proxy for this:

            from selenium import webdriver
            from browsermobproxy import Server
            
            server = Server(environment.b_mob_proxy_path)
            server.start()
            proxy = server.create_proxy()
            service_args = ["--proxy-server=%s" % proxy.proxy]
            driver = webdriver.PhantomJS(service_args=service_args)
            
            proxy.new_har()
            driver.get('url_to_open')
            print proxy.har  # this is the archive
            # for example:
            all_requests = [entry['request']['url'] for entry in proxy.har['log']['entries']]
            

            The HAR (HTTP Archive) format has a lot of other information about the requests and responses; it's very useful to me.

            installing on Linux:

            pip install browsermob-proxy
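
            The HAR object is just nested JSON-style data. A minimal sketch (with made-up entries, not real browsermob output) of pulling URLs and response statuses out of a HAR-shaped dict:

```python
# A HAR-shaped dict with made-up entries; proxy.har returns this structure.
har = {
    "log": {
        "entries": [
            {"request": {"url": "http://example.com/"},
             "response": {"status": 200}},
            {"request": {"url": "http://example.com/app.js"},
             "response": {"status": 404}},
        ]
    }
}

# every requested URL
all_requests = [entry["request"]["url"] for entry in har["log"]["entries"]]

# only requests whose response indicates an error
failed = [entry["request"]["url"]
          for entry in har["log"]["entries"]
          if entry["response"]["status"] >= 400]

print(all_requests)  # ['http://example.com/', 'http://example.com/app.js']
print(failed)        # ['http://example.com/app.js']
```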
            
            qid & accept id: (36753799, 36753921) query: Join unique values into new data frame (python, pandas) soup:

            soup wrap:

            UPDATE:

            B. M.'s solution utilizing numpy is much faster - I would recommend using his approach:

            In [88]: %timeit pd.DataFrame({'col1':np.repeat(aa,bb.size),'col2':np.tile(bb,aa.size)})
            10 loops, best of 3: 25.4 ms per loop
            
            In [89]: %timeit pd.DataFrame(list(product(aa,bb)), columns=['col1', 'col2'])
            1 loop, best of 3: 1.28 s per loop
            
            In [90]: aa.size
            Out[90]: 1000
            
            In [91]: bb.size
            Out[91]: 1000
            

            try itertools.product:

            In [56]: a
            Out[56]:
            array(['a', 'b', 'c', 'd'],
                  dtype='
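
            As a sketch of the two approaches side by side (small made-up arrays, so the timing difference doesn't show), both produce the same cross join:

```python
import numpy as np
import pandas as pd
from itertools import product

aa = np.array(['a', 'b'])
bb = np.array([1, 2, 3])

# numpy approach: repeat each element of aa, tile bb alongside it
df1 = pd.DataFrame({'col1': np.repeat(aa, bb.size),
                    'col2': np.tile(bb, aa.size)})

# itertools approach: materialize the Cartesian product
df2 = pd.DataFrame(list(product(aa, bb)), columns=['col1', 'col2'])
```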
            qid & accept id: (36765952, 36766169) query: Python: Is there a shortcut to finding which substring(from a set of substrings) comes first in a string? soup:

            soup wrap:

            You could use generators to find all positions, and min() to locate the left-most:

            positions = ((s.find(sub), sub) for sub in (s1, s2, s3))
            leftmost = min((pos, sub) for pos, sub in positions if pos > -1)[1]
            

            This runs s.find() just once for each substring, filtering out any substring not present. If there are no substring matches at all, min() will throw a ValueError exception; you may want to catch that.
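
            Wrapped up as a small function (a sketch, under the assumption that a missing match should propagate the ValueError mentioned above):

```python
def first_substring(s, subs):
    # (position, substring) pairs for every candidate; unmatched ones are filtered out
    positions = ((s.find(sub), sub) for sub in subs)
    # min() orders by position first, so this picks the left-most match
    return min((pos, sub) for pos, sub in positions if pos > -1)[1]

print(first_substring("the quick brown fox", ("brown", "quick", "fox")))  # quick
```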

            This does scan the string 3 times; if the number of substrings tested is large enough, you'd want to build a trie structure instead, loop over indices into s and test if the characters at that position are present in the trie:

            def make_trie(*words):
                root = {}
                for word in words:
                    current = root
                    for letter in word:
                        current = current.setdefault(letter, {})
                    # insert sentinel at the end
                    current[None] = None
                return root
            
            def find_first(s, trie):
                for i in range(len(s)):
                    pos, current, found = i, trie, []
                    while pos < len(s) and s[pos] in current:
                        found.append(s[pos])
                        current = current[s[pos]]
                        if None in current:  # whole substring detected
                            return ''.join(found)
                        pos += 1
            
            leftmost = find_first(s, make_trie(s1, s2, s3))
            

            The trie can be re-used for multiple strings.

            qid & accept id: (36779891, 36780538) query: Insert values in lists following a pattern soup:

            soup wrap:

            I would just build a new list by constantly appending to it rather than inserting into an existing list. This should work:

            n = len(list_a)
            newList = []
            for i in range(0,n, 6):
                newList.extend(list_a[i:i+6])  # extend, not append, so the result stays a flat list of tuples
            
                newTuple1 = (newList[-1][1], newList[i][0])
                newList.append(newTuple1)
                try:
                    newTuple2 = (newTuple1[0] + 1, list_a[i+6][0])
                    newList.append(newTuple2)
                except IndexError:
                    print "There was no next tuple"
            
            print newList
            

            Output

            There was no next tuple
            [(1, 6), (6, 66), (66, 72), (72, 78), (78, 138), (138, 146), (146, 1), (147, 154), (154, 208), (208, 217), (217, 225), (225, 279), (279, 288), (300, 400), (400, 146)]
            

            Note that your example did not indicate what to do in case 2 if there are no additional tuples. Suppose there are 12 tuples in list_a; then when you get to the second group of 2, there is no next tuple.

            Hope that helps.

            qid & accept id: (36783166, 36783935) query: Use map over a list of 50 generated colours to count, using filter, and reduce, or len, the frequency of occurence soup:

            soup wrap:

            This one also uses zip so you have a reference of the color being counted:

            zip(colours, map(lambda x: len(filter(lambda y: y==x, c)), colours))
            

            The way to use reduce to count the elements was giving me some thought, the only way I found of doing it was this:

            map(lambda color: reduce(lambda x,y: x+y, map(lambda y: 1,filter(lambda x: x==color, c))), colours)
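
            For comparison, a sketch of the same count in Python 3 (where filter and map return iterators, so list() is needed before len()) alongside the idiomatic collections.Counter version; the colour lists here are made-up sample data:

```python
from collections import Counter

c = ['red', 'blue', 'red', 'green', 'blue', 'red']   # made-up sample data
colours = ['red', 'blue', 'green']

# Python 3 version of the zip/map/filter approach
counts = list(zip(colours,
                  map(lambda x: len(list(filter(lambda y: y == x, c))), colours)))

# idiomatic equivalent using Counter
counter = Counter(c)
counts_idiomatic = [(col, counter[col]) for col in colours]

print(counts)  # [('red', 3), ('blue', 2), ('green', 1)]
```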
            
            qid & accept id: (36785204, 36785653) query: Conditionally replace several columns with default values in Pandas soup:

            soup wrap:

            If I understood your question correctly, you just need .loc (ix would also work):

            df.loc[df.DEFAULT, special]
            Out[40]: 
                      A         D         G         I
            2  0.629427  0.532373  0.529779  0.274649
            4  0.226196  0.467896  0.851469  0.971351
            7  0.666459  0.351840  0.414972  0.451190
            8  0.238104  0.277630  0.943198  0.293356
            

            For assignment:

            df.loc[df.DEFAULT, special] = default
            
            df
            Out[44]: 
                      A         B         C         D         E         F         G  \
            0  0.513798  0.138073  0.685051  0.173045  0.964050  0.245352  0.360657   
            1  0.286920  0.464747  0.301910  0.857810  0.957686  0.684297  0.381671   
            2  1.000000  0.454802  0.707585  2.000000  0.777142  0.738670  3.000000   
            3  0.894643  0.987747  0.162569  0.430214  0.205933  0.651764  0.361578   
            4  1.000000  0.859582  0.014823  2.000000  0.658297  0.875474  3.000000   
            5  0.075581  0.848288  0.819145  0.429341  0.718035  0.275785  0.951492   
            6  0.984910  0.858093  0.665032  0.138201  0.006561  0.282801  0.050243   
            7  1.000000  0.215375  0.594164  2.000000  0.666909  0.598950  3.000000   
            8  1.000000  0.931840  0.568436  2.000000  0.911106  0.727052  3.000000   
            9  0.140491  0.181527  0.436082  0.617412  0.468370  0.496973  0.426825   
            
                      H         I         J DEFAULT  
            0  0.964239  0.422831  0.660515   False  
            1  0.650808  0.112612  0.897050   False  
            2  0.537366  4.000000  0.243392    True  
            3  0.377302  0.341089  0.488061   False  
            4  0.074656  4.000000  0.317079    True  
            5  0.990471  0.634703  0.141121   False  
            6  0.026650  0.731152  0.589984   False  
            7  0.570956  4.000000  0.762232    True  
            8  0.828288  4.000000  0.359620    True  
            9  0.701504  0.050273  0.427838   False  
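
            A minimal self-contained sketch of the same pattern, with a made-up small frame, two "special" columns, and one default value per column:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'D': [10, 20, 30, 40],
                   'DEFAULT': [False, True, False, True]})
special = ['A', 'D']
default = [100, 200]  # one default per special column

# rows where DEFAULT is True get the default values in the special columns
df.loc[df.DEFAULT, special] = default

print(df['A'].tolist())  # [1, 100, 3, 100]
print(df['D'].tolist())  # [10, 200, 30, 200]
```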
            
            qid & accept id: (36794619, 36794819) query: Customizing time of the datetime object in python soup:

            soup wrap:

            Python's datetime objects are immutable, i.e. date_obj.hour = 23 results in
            AttributeError: attribute 'hour' of 'datetime.datetime' objects is not writable.

            Instead we need to create a new datetime object. Consider this as a guide:

            from datetime import datetime
            from dateutil import relativedelta
            
            orig_start = datetime.now()
            orig_end = datetime.now() + relativedelta.relativedelta(months=1)
            
            print(orig_start)
            print(orig_end)
            
            mod_start = datetime(year=orig_start.year,
                                 month=orig_start.month,
                                 day=orig_start.day,
                                 hour=0, minute=0, second=0)
            
            mod_end = datetime(year=orig_end.year,
                               month=orig_end.month,
                               day=orig_end.day,
                               hour=23, minute=59, second=59)
            
            # or even better as suggested in the comments:
            mod_end = orig_end.replace(hour=23, minute=59, second=59, microsecond=0)
            
            print(mod_start)
            print(mod_end)
            

            outputs:

            2016-04-22 16:11:08.171845
            2016-05-22 16:11:08.171845
            2016-04-22 00:00:00
            2016-05-22 23:59:59
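
            The replace approach with a fixed timestamp (so the output is deterministic) looks like:

```python
from datetime import datetime

orig = datetime(2016, 4, 22, 16, 11, 8, 171845)

# start and end of the same day, built from the original timestamp
start_of_day = orig.replace(hour=0, minute=0, second=0, microsecond=0)
end_of_day = orig.replace(hour=23, minute=59, second=59, microsecond=0)

print(start_of_day)  # 2016-04-22 00:00:00
print(end_of_day)    # 2016-04-22 23:59:59
```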
            
            qid & accept id: (36798227, 36803072) query: Python CSVkit compare CSV files soup:

            soup wrap:

            I would recommend using pandas to achieve what you are looking for:

            Here is how simple it would be using pandas. Consider two CSV files like this:

            CSV1

            reference,name,house
            2348A,john,37
            5648R,bill,3
            RT48,kate,88
            76A,harry ,433
            

            CSV2

            reference
            2348A
            76A
            

            Code

            import pandas as pd
            df1 = pd.read_csv(r'd:\temp\data1.csv')
            df2 = pd.read_csv(r'd:\temp\data2.csv')
            df3 = pd.merge(df1,df2, on= 'reference', how='inner')
            df3.to_csv('outpt.csv')
            

            output.csv

            ,reference,name,house
            0,2348A,john,37
            1,76A,harry ,433
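
            The same inner join works on in-memory frames, without the CSV round-trip; a sketch using the sample data above:

```python
import pandas as pd

df1 = pd.DataFrame({'reference': ['2348A', '5648R', 'RT48', '76A'],
                    'name': ['john', 'bill', 'kate', 'harry'],
                    'house': [37, 3, 88, 433]})
df2 = pd.DataFrame({'reference': ['2348A', '76A']})

# inner join keeps only the references present in both frames
df3 = pd.merge(df1, df2, on='reference', how='inner')

print(df3['reference'].tolist())  # ['2348A', '76A']
```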
            
            qid & accept id: (36799190, 36819439) query: Update a Pyspark DF Column based on an Array in another column soup:

            soup wrap:

            Here is how I resolved this using explode:

            df = df.withColumn('temp', split(df.fieldList, ','))
            df = df.withColumn('cols', explode(df.temp))
            df = df.withColumn('col_value', split(df.cols, '='))
            df = df.withColumn('deltaCol', df.col_value[0]) \
                   .withColumn('deltaValue', df.col_value[1])
            

            Final Output of the above (after dropping irrelevant columns) resulted in this:

            +------+-----+--------+--------------------+--------+----------+
            |    id|table|    user|          changeDate|deltaCol|deltaValue|
            +------+-----+--------+--------------------+--------+----------+
            |555555| TAB2| user11 | 2016-01-24 19:10...| value2 |       100|
            |  1111| TAB1| user01 | 2015-12-31 13:12...|  value |      0.34|
            |  1111| TAB1| user01 | 2015-12-31 13:12...|   name | 'newName'|
            +------+-----+--------+--------------------+--------+----------+
            

            After this I registered it as a table and performed SQL operation to pivot the data:

            >>> res = sqlContext.sql("select id, table, user, changeDate, max(value2) as value2, max(value) as value, max(name) as name \
            ... from (select id, table, user, changeDate, case when trim(deltaCol) == 'value2' then deltaValue else Null end value2,\
            ... case when trim(deltaCol) == 'value' then deltaValue else Null end value,\
            ... case when trim(deltaCol) == 'name' then deltaValue else Null end name from delta) t group by id, table, user, changeDate")
            

            The result of this was:

            +------+-----+--------+--------------------+------+-----+----------+
            |    id|table|    user|          changeDate|value2|value|      name|
            +------+-----+--------+--------------------+------+-----+----------+
            |555555| TAB2| user11 | 2016-01-24 19:10...|   100| null|      null|
            |  1111| TAB1| user01 | 2015-12-31 13:12...|  null| 0.34| 'newName'|
            +------+-----+--------+--------------------+------+-----+----------+
            

            For usage of this code with different tables, I used the columns of the master DF(my eventual target table) to prepare a string of columns:

            >>> string = [(", max(" + c + ") as " + c) for c in masterDF.columns]
            >>> string = "".join(string)
            >>> string
            ', max(id) as id, max(value) as value, max(name) as name, max(value2) as value2'
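
            A sketch of building that column string from a plain list, with hypothetical column names standing in for masterDF.columns:

```python
columns = ['id', 'value', 'name']  # hypothetical stand-in for masterDF.columns

# one ", max(col) as col" fragment per column, joined into a single string
string = "".join(", max(" + c + ") as " + c for c in columns)

print(string)  # ', max(id) as id, max(value) as value, max(name) as name'
```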
            
            qid & accept id: (36804141, 36805412) query: Vectorized construction of DatetimeIndex in Pandas soup:

            soup wrap:

            You can use dtypes m8[Y], m8[M], m8[D] to make Timedeltas arrays, and add them together to the date: "0000-01-01":

            import pandas as pd
            import numpy as np
            
            year = np.arange(2010, 2020)
            months = np.arange(1, 13)
            days = np.arange(1, 29)
            
            y, m, d = map(np.ravel, np.broadcast_arrays(*np.ix_(year, months, days)))
            
            start = np.array(["0000-01-01"], dtype="M8[Y]")
            
            r1 = start + y.astype("m8[Y]") + (m - 1).astype("m8[M]") + (d-1).astype("m8[D]")
            
            def build_DatetimeIndex(*args):
                return pd.DatetimeIndex([pd.datetime(*tup)
                                         for tup in np.broadcast(*args)])
            
            r2 = build_DatetimeIndex(y, m, d)
            
            np.all(pd.DatetimeIndex(r1) == r2)
            

            To include hours, minutes, and seconds:

            import pandas as pd
            import numpy as np
            
            y = np.array([2012, 2013])
            m = np.array([1, 3])
            d = np.array([5, 20])
            H = np.array([10, 20])
            M = np.array([30, 40])
            S = np.array([0, 30])
            
            start = np.array(["0000-01-01"], dtype="M8[Y]")
            
            date = start + y.astype("m8[Y]") + (m - 1).astype("m8[M]") + (d-1).astype("m8[D]")
            datetime = date.astype("M8[s]") + H.astype("m8[h]") + M.astype("m8[m]") + S.astype("m8[s]")
            
            pd.Series(datetime)
            

            the result:

            0   2012-01-05 10:30:00
            1   2013-03-20 20:40:30
            dtype: datetime64[ns]
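
            The same unit arithmetic also works from the ordinary Unix epoch instead of year 0; a small sketch:

```python
import numpy as np

y = np.array([2012, 2013])
m = np.array([1, 3])
d = np.array([5, 20])

# years since 1970 as datetime64[Y], then month/day offsets as timedeltas
dates = ((y - 1970).astype("M8[Y]")
         + (m - 1).astype("m8[M]")
         + (d - 1).astype("m8[D]"))

print(dates)  # ['2012-01-05' '2013-03-20']
```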
            
            qid & accept id: (36804586, 36805734) query: Reordering same characters such that the characters are at least distance d from each other soup:

            soup wrap:

            I have tried an approach for solving this problem. It creates a list of empty slots and fills them with characters from the original string, in decreasing order of frequency. For example, for a string with three b's, two c's and one a (such as 'bbcbca') and a given distance of 2, it fills the slots in this order:

            most common:  ('b', 3)  # character b with frequency 3
            ['-', '-', '-', '-', '-', '-']
            updated o:  ['b', '-', '-', '-', '-', '-']
            ['b', '-', '-', '-', '-', '-']
            updated o:  ['b', '-', 'b', '-', '-', '-']
            ['b', '-', 'b', '-', '-', '-']
            updated o:  ['b', '-', 'b', '-', 'b', '-']
            most common:  ('c', 2)
            ['b', '-', 'b', '-', 'b', '-']
            updated o:  ['b', 'c', 'b', '-', 'b', '-']
            ['b', 'c', 'b', '-', 'b', '-']
            updated o:  ['b', 'c', 'b', 'c', 'b', '-']
            most common:  ('a', 1)
            ['b', 'c', 'b', 'c', 'b', '-']
            updated o:  ['b', 'c', 'b', 'c', 'b', 'a']
            

            to give the final string bcbcba.

            I have pasted the code below. It contains many comments about what each line does and why. Try running it. If you want detailed output showing how the list above gets filled, uncomment the lines containing print statements.

            Here is the code; hope it's useful:

            import collections
            import math
            
            def printMyString():
              # get inputs
              myStr = raw_input("enter string: ")
              dist = int(raw_input("enter dist: "))
            
              #create a dict, where each key is a character from myStr and corresponding value is its frequency
              counter = collections.Counter(list(myStr))
            
              # create an empty list where we will fill our characters to get final string
              o = ['-']*len(myStr)
            
              # get the most common character
              most_common_char_freq = counter.most_common(1)[0][1]
            
              # sep is the maximum distance at which repeated instances of the most frequent character m can be located from each other in the final string.
              sep = int(math.ceil(len(myStr)*1.0/most_common_char_freq))
            
              # if sep is less than given distance, then it is not possible to have such a string.
            
              if(sep < dist):
                print "such a string is not possible"
                return
              #print "sep", sep
            
            
              j = 0 # this marks index at which we can write into the list
            
              # while we still have characters left, we will continue to fill our output list o 
              while len(counter) > 0:
               current_most_common_char = counter.most_common(1)[0][0]       # get the most common character left in counter        
               current_most_common_char_freq = counter.most_common(1)[0][1]   
            
               #print "most common: ", current_most_common_char
               while o[j] != '-':  # Go to the next position in the output list where a character is yet to be written.
                 j += 1  
                 if(j == len(o)):  # We are out of places to write, this is bad!
                  # print "breaking, o = ", o
                  return
            
               for i in range(current_most_common_char_freq): # For multiple occurrences of the current most frequent char, we write them one after the other, a distance of 'sep' apart
                #print o
                if (j+i*sep) >= len(o): # If we have to go beyond the length of the output list/string to write a character, then such a string is not possible
                  #print "not possible, o, char is ", o, current_most_common_char
                  print "such a string is not possible"
                  return
                o[j+i*sep] = current_most_common_char # Write to the output list
                #print "updated o: ", o
            
               del counter[current_most_common_char] # remove the most common character. lets move on to next one in the loop.
               j += 1 # update before moving on
            
              print ''.join(o) # merge the characters in the output list to get final string
            
            printMyString()
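
            The feasibility check above reduces to one inequality: if the most frequent character occurs f times in a string of length n, the largest achievable minimum distance between its copies is ceil(n / f). A quick standalone check of that bound (written in Python 3, unlike the Python 2 code above; the function name is mine):

```python
import math
from collections import Counter

def max_separation(s):
    # Frequency of the most common character determines the best possible spacing
    f = Counter(s).most_common(1)[0][1]
    return int(math.ceil(len(s) / float(f)))

# Three b's in a string of length 6: copies of 'b' can be at most 2 apart,
# so a required distance of 2 is feasible (e.g. 'bcbcba'), but 3 is not.
print(max_separation('bbcbca'))  # 2
```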
            
            qid & accept id: (36806340, 36813485) query: Python3 Rename files in a directory importing the new names from a txt file soup:

soup wrap:

            The following code will do the job for your specific use case, though you can turn it into a more general-purpose renamer.

            import os # os is a library that gives us the ability to make OS changes
            
            def file_renamer(list_of_files, new_file_name_list):
                for file_name in list_of_files:
                    for (new_filename, barcode_infile) in new_file_name_list:
                        # as per the mentioned filename pattern -> xxxx.1.xxxx.[barcode]
                        barcode_current = file_name[12:19] # extracting the barcode from current filename
                        if barcode_current == barcode_infile:
                            os.rename(file_name, new_filename)  # renaming step
                            print 'Successfully renamed %s to %s ' % (file_name, new_filename)
            
            
            if __name__ == "__main__":
                path = os.getcwd()  # presuming that you'll be executing the script while in the files directory
                file_dir = os.path.abspath(path)
                newname_file = raw_input('enter file with new names - or the complete path: ')
                path_newname_file = os.path.join(file_dir, newname_file)
                new_file_name_list = []
                with open(path_newname_file) as file:
                    for line in file:
                        x = line.strip().split(',')
                        new_file_name_list.append(x)
            
                list_of_files = os.listdir(file_dir)
                file_renamer(list_of_files, new_file_name_list)
            

            Pre-assumptions: newnames.txt is comma-separated:

            0000.1.0000.1234567,1234567
            0000.1.0000.1234568,1234568
            0000.1.0000.1234569,1234569
            0000.1.0000.1234570,1234570
            0000.1.0000.1234571,1234571
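
            The barcode comparison in file_renamer relies on fixed character positions in the name; for the xxxx.1.xxxx.[barcode] pattern above, the slice picks out exactly the trailing barcode:

```python
# Sanity check of the fixed-position slice used in file_renamer
name = '1111.1.0000.1234567'
print(name[12:19])  # 1234567
```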
            

            Files

            1111.1.0000.1234567
            1111.1.0000.1234568
            1111.1.0000.1234569 
            

            were renamed to

            0000.1.0000.1234567
            0000.1.0000.1234568
            0000.1.0000.1234569
            

            The terminal output:

            >python file_renamer.py
            enter file with new names: newnames.txt
            The list of files -  ['.git', '.idea', '1111.1.0000.1234567', '1111.1.0000.1234568', '1111.1.0000.1234569', 'file_renamer.py', 'newnames.txt.txt']
            Successfully renamed 1111.1.0000.1234567 to 0000.1.0000.1234567
            Successfully renamed 1111.1.0000.1234568 to 0000.1.0000.1234568
            Successfully renamed 1111.1.0000.1234569 to 0000.1.0000.1234569
            
            qid & accept id: (36835793, 36836375) query: Pandas - group by consecutive ranges soup:

soup wrap:

            One way to do that:

            df = pd.DataFrame([[1,3,10], [4,10,7], [11,17,6], [18,26, 12],
            [27,30, 15], [31,40,6], [41, 42, 6]], columns=['start','end', 'height'])
            

            Use cut to make groups:

            df['groups']=pd.cut(df.height,[-1,0,5,10,15,1000])
            

            Find break points:

            df['categories']=(df.groups!=df.groups.shift()).cumsum()
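
To see why this marks consecutive runs, here is the same shift-and-compare trick on a toy Series: a new group starts wherever the value differs from the previous row, and cumsum turns those True flags into increasing group ids.

```python
import pandas as pd

s = pd.Series(['a', 'a', 'b', 'b', 'a'])
# The first element differs from NaN, so the numbering starts at 1
print((s != s.shift()).cumsum().tolist())  # [1, 1, 2, 2, 3]
```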
            

            Then df is:

            """
               start  end  height    groups  categories
            0      1    3      10   (5, 10]           1
            1      4   10       7   (5, 10]           1
            2     11   17       6   (5, 10]           1
            3     18   26      12  (10, 15]           2
            4     27   30      15  (10, 15]           2
            5     31   40       6   (5, 10]           3
            6     41   42       6   (5, 10]           3
            """
            

            Define the columns of interest:

            f = {'start':['first'],'end':['last'], 'groups':['first']}
            

            And use the groupby.agg function:

            df.groupby('categories').agg(f)
            """
                          groups  end start
                           first last first
            categories                     
            1            (5, 10]   17     1
            2           (10, 15]   30    18
            3            (5, 10]   42    31
            """
            
            qid & accept id: (36845683, 36846070) query: Adding 'n' values in list using for-loop and step-loop for that 'n' values in python soup:

soup wrap:

            I guess you want to do something like this:

            data_copy = list(data)  # you can replace any appearance of data_copy with data if you don't care if it is changed
            while data_copy:  # this is equivalent to: while len(data_copy) != 0:
                to = min(10, len(data_copy))  # If there are fewer than 10 entries left, the length will be smaller than ten, so 'to' is either 10 or the (smaller) length. This is the amount of data that's processed
                f(data_copy[:to])  # make the function call with any value up to 'to'
                del data_copy[:to]  # delete the data, because we already processed it
            

            This:

            def f(x): print(x)
            data = list(range(53))  # list from 0 (included) to 52 (included)
            # here is the top part
            

            yields the expected output of

            [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
            [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
            [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
            [30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
            [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
            [50, 51, 52]
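
An equivalent way to chunk without mutating (or copying) the list is to step over the indices with range; slicing past the end is safe in Python, so no min() is needed:

```python
def process_in_chunks(data, f, size=10):
    # range(0, len, size) yields 0, 10, 20, ...; the final slice is simply shorter
    for i in range(0, len(data), size):
        f(data[i:i + size])

process_in_chunks(list(range(53)), print)
```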
            
            qid & accept id: (36849151, 36849258) query: Compare rows then take rows out if neccessary soup:

soup wrap:

            UPDATE2: here is ayhan's solution which will work properly:

            In [135]: df[df.Distance.astype("int64")>=df.Distance.astype("int64").cummax()]
            Out[135]:
              Area  Distance
            0    1  19626207
            1    2  20174412
            2    3  20174412
            7    8  20195112
            8    9  21127633
            

            UPDATE:

            the following solution will NOT always work properly, because it will remove ALL duplicates. So if you will have duplicated values in your original DF they will disappear.

            Here is an example:

            In [122]: df
            Out[122]:
              Area  Distance
            0    1  19626207
            1    2  20174412  # duplicates
            2    3  20174412  # they should BOTH be in the result set
            3    4  19396352
            4    5  19391124
            5    6  19851396
            6    7  19221462
            7    8  20195112
            8    9  21127633
            9   10  19989793
            
            In [123]: df.loc[df.Distance.cummax().drop_duplicates().index]
            Out[123]:
              Area  Distance
            0    1  19626207
            1    2  20174412  # one duplicate has been dropped
            7    8  20195112
            8    9  21127633
            

            PS I'll try to find a working solution

            OLD answer:

            I'm not sure whether it's the most efficient method, but it works:

            In [94]: df.loc[df.Distance.cummax().drop_duplicates().index]
            Out[94]:
              Area  Distance
            0    1  19626207
            1    2  20174412
            2    3  20175112
            7    8  20195112
            8    9  21127633
            

            Explanation:

            In [98]: df.Distance.cummax()
            Out[98]:
            0    19626207
            1    20174412
            2    20175112
            3    20175112
            4    20175112
            5    20175112
            6    20175112
            7    20195112
            8    21127633
            9    21127633
            Name: Distance, dtype: object
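
The reason ayhan's filter keeps both duplicates is that a row survives exactly when its value ties or beats the running maximum; a small illustration:

```python
import pandas as pd

s = pd.Series([3, 5, 5, 2, 7])
# cummax is [3, 5, 5, 5, 7]; ties with the running max are kept, drops are not
print(s[s >= s.cummax()].tolist())  # [3, 5, 5, 7]
```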
            
            qid & accept id: (36921573, 36921908) query: Given two numpy arrays of same size, how to apply a function two each pair of elements at identical position? soup:

soup wrap:

            The first argument to stats.binom_test may be an array, but the second argument to stats.binom_test must be an integer, not an array.

            So unless x+y (the values passed as the second arguments) contains a lot of repeated values, there is no way to reduce the number of calls to stats.binom_test. In general, you just have to call it once for each element in x and x+y.

            However, NumPy does have a helper function, np.vectorize, which can make the syntax prettier. np.vectorize returns a function which can take arrays as input and return an array as output. np.vectorize is mainly "for convenience, not for performance". Under the hood it performs a for-loop much like the one you wrote. Thus, the explicit for-loop can be replaced by

            binom_test = np.vectorize(stats.binom_test)
            result = binom_test(x, x+y)
            

            import numpy as np
            from scipy import stats
            np.random.seed(2016)
            h, w = 3, 4
            
            x = np.random.randint(4, 10, (h, w))  # random_integers(4, 9, ...) is deprecated; randint's upper bound is exclusive
            y = np.random.randint(4, 10, (h, w))
            
            result = np.ones((h,w))
            for row in range(h):
                result[row,:] = np.array([stats.binom_test(x[row,_], x[row,_]+y[row,_]) 
                                          for _ in range(w)])
            
            binom_test = np.vectorize(stats.binom_test)
            result2 = binom_test(x, x+y)
            
            assert np.allclose(result, result2)
            print(result2)
            

            yields

            [[ 1.          0.75390625  0.77441406  0.60723877]
             [ 1.          0.79052734  0.77441406  0.77441406]
             [ 1.          1.          1.          1.        ]]
            
            qid & accept id: (36923865, 36924053) query: Uploading files using Django Admin soup:

soup wrap:

            Nothing special to do; the Django admin handles this already.

            models.py

            class Router(models.Model):
                specifications = models.FileField(upload_to='router_specifications')
            

            admin.py

            from django.contrib import admin
            from my_app import models
            
            admin.site.register(models.Router)
            

            You'll see a file upload field in your model's admin now. If a file has already been uploaded for the model instance, you'll see a link to it as well.

            When rendering a view to a user, pass the model(s) in the view's context and use the field's url property to link to the file.

            Download PDF
            
            qid & accept id: (36928577, 36928676) query: How can I get a list of package locations from a PIP requirements file? soup:

soup wrap:

            You can look up packages on PyPI using the XML-RPC API (note that PyPI has since restricted its XML-RPC search endpoint, so the search call below may be rejected):

            try:
                import xmlrpclib  # Python 2
            except ImportError:
                import xmlrpc.client as xmlrpclib  # Python 3
            
            pypi = xmlrpclib.ServerProxy('http://pypi.python.org/pypi')
            
            package_name = "Flask-Login"
            
            packages = pypi.search({"name": package_name})
            package = next(package for package in packages if package["name"] == package_name)
            release_data = pypi.release_data(package_name, package["version"])
            
            print(package_name)
            print(package["version"])
            print(release_data["summary"])
            print(release_data["home_page"])
            

            Prints:

            Flask-Login
            0.3.0
            User session management for Flask
            https://github.com/maxcountryman/flask-login
            
            qid & accept id: (36935617, 36935678) query: Remove following duplicates in a tuple soup:
soup wrap:
            In [12]: seen = set()
            
            In [13]: [x if x not in seen and not seen.add(x) else '' for x in ax]
            Out[13]: ['0', '1', '', '', '2', '', '', '3']
            

            This is a slightly modified version of a uniquifier suggested by Dave Kirby, here.


            seen.add(x) adds x to the set seen. The seen.add method returns None. So in a boolean context, (since bool(None) is False), not seen.add(x) is always True. Therefore the condition

            x not in seen and not seen.add(x)
            

            has a boolean value equal to

            x not in seen and True
            

            which is equivalent to

            x not in seen
            

            So the conditional expression

            x if x not in seen and not seen.add(x) else ''
            

            returns x if x is not already in seen and returns '' if x is already in seen (and x then gets added to seen). If x not in seen is False (that is, if x is already in seen) then seen.add(x) is not called because Python's and short-circuits -- any expression of the form False and something is automatically False without one having to evaluate something.
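
A tiny demonstration of that short-circuiting, showing that add only runs when the membership test on the left succeeds:

```python
seen = {1}
# 1 is already in seen: the left side is False, so seen.add(1) is never evaluated
print(1 not in seen and not seen.add(1), seen)  # False {1}
# 2 is new: add() runs (returning None), and the whole expression is True
print(2 not in seen and not seen.add(2), seen)  # True {1, 2}
```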


            This could also be written, not as succinctly, but without the complexity, as

            def replace_dupes(ax):
                result = []
                seen = set()
                for x in ax:
                    if x in seen:
                        result.append('')
                    else:
                        seen.add(x)
                        result.append(x)
                return result
            
            ax = ('0','1','1','1','2','2','2','3')
            print(replace_dupes(ax))
            # ['0', '1', '', '', '2', '', '', '3']
            
            qid & accept id: (36939122, 36939477) query: Average of key values in a list of dictionaries soup:

soup wrap:

            You can use zip and numpy functions mean and round for this task:

            In [8]: import numpy as np
            
            In [9]:  [dict(zip(d.keys(), [int(np.round(np.mean(d.values())))])) for d in L]
            
            #Out[9]: [{'Eva': 5}, {'Ana': 53}, {'Ada': 12}]
            

            A version with fewer parentheses:

            [dict(zip(d.keys(), [np.array(d.values()).mean().round().astype(int)])) for d in L]
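
For what it's worth, the same result needs no numpy in Python 3, assuming (as I read the question) that each dict maps a single name to a list of numbers:

```python
# Hypothetical input shaped to match the output shown above
L = [{'Eva': [3, 7, 5]}, {'Ana': [53]}, {'Ada': [10, 14]}]
averages = [{k: round(sum(v) / len(v)) for k, v in d.items()} for d in L]
print(averages)  # [{'Eva': 5}, {'Ana': 53}, {'Ada': 12}]
```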
            
            qid & accept id: (36949277, 36949412) query: Ordering a nested dictionary by the frequency of the nested value soup:

soup wrap:

            You could use Counter to order the key pairs based on their frequency. It also provides an easy way to get x most frequent items:

            from collections import Counter
            
            d = {
                'KEY1': {
                    'key2_1': 5,
                    'key2_2': 1,
                    'key2_3': 3
                },
                'KEY2': {
                    'key2_1': 2,
                    'key2_2': 3,
                    'key2_3': 4
                }
            }
            
            c = Counter()
            for k, v in d.iteritems():
                c.update({(k, k1): v1 for k1, v1 in v.iteritems()})
            
            print c.most_common(3)
            

            Output:

            [(('KEY1', 'key2_1'), 5), (('KEY2', 'key2_3'), 4), (('KEY2', 'key2_2'), 3)]
            

            If you only care about the most common key pairs and have no other reason to build nested dictionary you could just use the following code:

            from collections import Counter
            
            l = ['foobar', 'foofoo', 'foobar', 'barfoo']
            D = Counter((v[:3], v[3:]) for v in l)
            print D.most_common() # [(('foo', 'bar'), 2), (('foo', 'foo'), 1), (('bar', 'foo'), 1)]
            

            Short explanation: ((v[:3], v[3:]) for v in l) is a generator expression that will generate tuples where first item is the same as top level key in your original dict and second item is the same as key in nested dict.

            >>> x = list((v[:3], v[3:]) for v in l)
            >>> x
            [('foo', 'bar'), ('foo', 'foo'), ('foo', 'bar'), ('bar', 'foo')]
            

            Counter is a subclass of dict. It accepts an iterable as an argument and each unique element in iterable will be used as key and value is the count of element in the iterable.

            >>> c = Counter(x)
            >>> c
            Counter({('foo', 'bar'): 2, ('foo', 'foo'): 1, ('bar', 'foo'): 1})
            

            Since generator expression is an iterable there's no need to convert it to list in between so construction can simply be done with Counter((v[:3], v[3:]) for v in l).

            The if statements you asked about are checking if the key exists in dict:

            >>> d = {1: 'foo'}
            >>> 1 in d
            True
            >>> 2 in d
            False
            

            So the following code will check if key with value of id exists in dict D and if it doesn't it will assign empty dict there.

            if id not in D:
                D[id] = {}
            

            The second if does exactly the same for nested dictionaries.
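
            The answer above is written for Python 2 (iteritems, the print statement). A sketch of the same Counter approach in Python 3:

            ```python
            from collections import Counter

            # Same nested dict as in the answer above
            d = {
                'KEY1': {'key2_1': 5, 'key2_2': 1, 'key2_3': 3},
                'KEY2': {'key2_1': 2, 'key2_2': 3, 'key2_3': 4},
            }

            c = Counter()
            for k, v in d.items():  # .items() replaces Python 2's .iteritems()
                c.update({(k, k1): v1 for k1, v1 in v.items()})

            print(c.most_common(2))
            # [(('KEY1', 'key2_1'), 5), (('KEY2', 'key2_3'), 4)]
            ```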

            qid & accept id: (36950503, 36952191) query: How to find the average of previous sales at each time in python soup:

            soup wrap:

            It was quite tricky for me, but it works. I expect others will have a more elegant solution.

            import pandas as pd
            import datetime
            
            dateparse = lambda x: pd.datetime.strptime(x, '%m/%d/%Y')
            df = pd.read_csv('Sample.csv',index_col='date', parse_dates=[0], date_parser=dateparse)
            
            expd_gb = df.reset_index().groupby(['wholesaler', 'product'])['sales'].apply(pd.Series.expanding)
            idx = df.reset_index().groupby(['wholesaler', 'product', 'date'])['sales'].count().index
            
            cnct = pd.concat([expd_gb.iloc[n].mean().shift(1) for n in range(len(expd_gb))])
            cnct.index = idx
            
            cnct.to_csv('TotalAvg.csv')
            

            Result,

            wholesaler  product  date      
            11209       UME24    2013-12-31     NaN
            13131       UPE55    2012-12-31     NaN
                                 2013-02-23     1.0
                                 2013-04-24     578.5
            52237       UPE54    2013-12-18     NaN
                                 2013-12-31     9.0
            53929       UME24    2013-12-19     NaN
                        UPE54    2012-12-31     NaN
            82204       UPE55    2012-12-31     NaN
            83389       UPE54    2013-12-01     NaN
                                 2013-12-17     9.0
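
            The core computation here — for each row, the mean of all strictly earlier sales within the same (wholesaler, product) group, i.e. an expanding mean shifted by one — can be sketched without pandas. The rows below are a hypothetical subset shaped like Sample.csv:

            ```python
            from collections import defaultdict

            # Hypothetical rows, pre-sorted by date within each group
            rows = [
                (13131, 'UPE55', '2012-12-31', 1),
                (13131, 'UPE55', '2013-02-23', 1156),
                (13131, 'UPE55', '2013-04-24', 7),
                (52237, 'UPE54', '2013-12-18', 9),
                (52237, 'UPE54', '2013-12-31', 4),
            ]

            history = defaultdict(list)  # sales seen so far per (wholesaler, product)
            avg_prev = []
            for wholesaler, product, date, sales in rows:
                past = history[(wholesaler, product)]
                # expanding mean shifted by one: average of strictly earlier sales
                avg_prev.append(sum(past) / len(past) if past else None)
                past.append(sales)

            print(avg_prev)  # [None, 1.0, 578.5, None, 9.0]
            ```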
            
            qid & accept id: (36967883, 36968772) query: Sorting data from a csv alphabetically, highest to lowest and average soup:

            soup wrap:

            The first step would be to break down the problem into small steps:

            1. How to open and handle the file (using the with statement at the bottom of that section)
            2. How to traverse a csv file
            3. How to sort the entries
            4. How to sort by the second value of each row
            5. How to print each element of a list on a separate line
            6. How to count total scores

            Expanding on the last one you can total up the scores as well as the number of entries for each name like this:

            import csv
            import collections
            ...
            with open(path) as f:
                entries = collections.Counter()
                total_scores = collections.Counter()
                for name,score in csv.reader(f):
                    total_scores[name] += int(score)
                    entries[name] += 1
            

            Then you can calculate the average score for each person with total_scores[name] / entries[name]

            for name in sorted(entries):
                ave_score = total_scores[name] / entries[name]
                print(name,ave_score) #sep=", ")
            

            The other two actions are quite simple using a few of the steps listed above.

            import csv
            import collections
            from operator import itemgetter
            
            ...
            
            if sort_int == 1:
                with open(path) as f:
                    reader = csv.reader(f)
                    for name, score in sorted(reader):
                        print(name,score)
            
            elif sort_int == 2:
                with open(path) as f:
                    entries = sorted(csv.reader(f), 
                                     key=itemgetter(1), 
                                     reverse=True)
                    for name,score in entries:
                        print(name,score)
            
            elif sort_int == 3:
                with open(path) as f:
                    entries = collections.Counter()
                    total_scores = collections.Counter()
                    for name,score in csv.reader(f):
                        score = int(score)
                        total_scores[name] += score
                        entries[name] += 1
            
                    for name in sorted(entries):
                        ave_score = total_scores[name] / entries[name]
                        print(name,ave_score)
            

            If you want to apply the highest to lowest to the average scores then you will need to make a reference to all the averages such as a dict:

            ave_scores = {}
            for name in sorted(entries):
                ave_score = total_scores[name] / entries[name]
                ave_scores[name] = ave_score
            
            for name,ave_score in sorted(ave_scores.items(), key=itemgetter(1), reverse=True):
                print(name,ave_score)
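
            Putting the totalling, averaging, and highest-to-lowest sorting together on inline data (the names and scores here are made up; io.StringIO stands in for the opened file):

            ```python
            import csv
            import collections
            import io
            from operator import itemgetter

            # Inline stand-in for the csv file; names and scores are hypothetical
            data = "alice,10\nbob,4\nalice,6\nbob,8\n"

            entries = collections.Counter()
            total_scores = collections.Counter()
            for name, score in csv.reader(io.StringIO(data)):
                total_scores[name] += int(score)
                entries[name] += 1

            ave_scores = {name: total_scores[name] / entries[name] for name in entries}
            for name, ave in sorted(ave_scores.items(), key=itemgetter(1), reverse=True):
                print(name, ave)
            # alice 8.0
            # bob 6.0
            ```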
            
            qid & accept id: (36971201, 36972823) query: map array of numbers to rank efficiently in Python soup:

            soup wrap:

            Here is an efficient solution and a comparison with the solution using index. (The index solution is also incorrect under the added (edit 3) restriction in the question.)

            import numpy as np
            
            def rank1(x):
                # Sort values i = 0, 1, 2, .. using x[i] as key
                y = sorted(range(len(x)), key = lambda i: x[i])
                # Map each value of x to a rank. If a value is already associated with a
                # rank, the rank is updated. Iterate in reversed order so we get the
                # smallest rank for each value.
                rank = { x[y[i]]: i for i in xrange(len(y) -1, -1 , -1) }
                # Remove gaps in the ranks
                kv = sorted(rank.iteritems(), key = lambda p: p[1])
                for i in range(len(kv)):
                    kv[i] = (kv[i][0], i)
                rank = { p[0]: p[1] for p in kv }
                # Preallocate an array to fill with ranks
                r = np.zeros((len(x),), dtype=np.int)
                for i, v in enumerate(x):
                    r[i] = rank[v]
                return r
            
            def rank2(x):
                x_sorted = sorted(x)
                # creates a new list to preserve x
                rank = list(x)
                for v in x_sorted:
                    rank[rank.index(v)] = x_sorted.index(v)
                return rank
            

            Comparison results

            >>> d = np.arange(1000)
            >>> random.shuffle(d)
            >>> %timeit rank1(d)
            100 loops, best of 3: 1.97 ms per loop
            >>> %timeit rank2(d)
            1 loops, best of 3: 226 ms per loop
            
            >>> d = np.arange(10000)
            >>> random.shuffle(d)
            >>> %timeit rank1(d)
            10 loops, best of 3: 32 ms per loop
            >>> %timeit rank2(d)
            1 loops, best of 3: 24.4 s per loop
            
            >>> d = np.arange(100000)
            >>> random.shuffle(d)
            >>> %timeit rank1(d)
            1 loops, best of 3: 433 ms per loop
            
            >>> d = np.arange(2000000)
            >>> random.shuffle(d)
            >>> %timeit rank1(d)
            1 loops, best of 3: 11.2 s per loop
            

            The problem with the index solution is that the time complexity is O(n^2). The time complexity of my solution is O(n lg n), that is, the sort time.
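
            If hashable values and gap-free ("dense") ranks are all that is needed, the same O(n lg n) idea can be written more compactly (a sketch, not the answer's exact function):

            ```python
            def dense_rank(x):
                # Map each distinct value to its index among the sorted unique
                # values, giving gap-free ranks; duplicates share a rank.
                rank = {v: i for i, v in enumerate(sorted(set(x)))}
                return [rank[v] for v in x]

            print(dense_rank([40, 10, 20, 10, 40]))  # [2, 0, 1, 0, 2]
            ```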

            qid & accept id: (36971758, 36971942) query: Python handling newline and tab characters when writing to file soup:

            soup wrap:

            You can use str.encode:

            with open('test.cpp', 'a') as out:
                print(test_str.encode('unicode_escape').decode('utf-8'), file=out)
            

            This'll escape all the Python recognised special escape characters.

            Given your example:

            >>> test_str = "/*\n test.cpp\n *\n *\n *\n\t2013.02.30\n *\n */\n"
            >>> test_str.encode('unicode_escape')
            b'/*\\n test.cpp\\n *\\n *\\n *\\n\\t2013.02.30\\n *\\n */\\n'
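
            In Python 3, encode returns bytes, which is why the first snippet decodes back to str before printing. A minimal check of the round trip:

            ```python
            test_str = "line1\n\tindented\n"
            # encode to bytes with escapes, then decode back to a plain str
            escaped = test_str.encode('unicode_escape').decode('ascii')
            print(escaped)  # the \n and \t survive as two-character escape sequences
            ```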
            
            qid & accept id: (36974140, 36982073) query: Scrapy xpath get text of an element that starts with < soup:

            soup wrap:

            Considering that Scrapy uses lxml under the hood, it might be worth inspecting how lxml handles this kind of HTML, which contains the XML special character < in one of its text nodes:

            >>> from lxml import html
            >>> raw = '''
            ...
            ... Recommended length of visit:
            ... <1 hour
            ...
            ...
            ... Fee:
            ... No
            ...
            ... '''
            >>> root = html.fromstring(raw)
            >>> print html.tostring(root)
            Recommended length of visit:
            Fee: No

            Notice in the above demo that, as you suspected, the text node '<1 hour' is gone completely from the root element source. As a workaround, consider using BeautifulSoup, since it handles this kind of HTML more reasonably (you can pass response.body_as_unicode() to create the soup from a Scrapy response):

            >>> from bs4 import BeautifulSoup
            >>> soup = BeautifulSoup(raw, "html.parser")
            >>> print soup.prettify()
            
            Recommended length of visit: <1 hour
            Fee: No

            Finding the target text node using BS can be done as follows:

            >>> soup.find('b', text='Recommended length of visit:').next_sibling
            u'\n    <1 hour\n'
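
            The stdlib html.parser follows the same HTML5-style tokenizer rule that BeautifulSoup's "html.parser" backend relies on here: a < not followed by a letter cannot open a tag, so it stays in the text. A small check (sketch):

            ```python
            from html.parser import HTMLParser

            class TextCollector(HTMLParser):
                # Collect every text chunk the tokenizer emits
                def __init__(self):
                    super().__init__()
                    self.texts = []

                def handle_data(self, data):
                    self.texts.append(data)

            p = TextCollector()
            p.feed("<b>Recommended length of visit:</b> <1 hour")
            p.close()  # flush any buffered trailing text
            print(''.join(p.texts))
            ```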
            
            qid & accept id: (36988306, 36989120) query: Pandas check for future condition by group soup:

            soup wrap:

            Setup

            from StringIO import StringIO
            import pandas as pd
            
            text = """id      date        item
            1    2000-01-01     'foo'
            1    2000-01-02     'pants'
            1    2000-01-03     'bar'
            2    2000-01-02     'organ'
            2    2000-02-01     'beef'
            3    2000-01-01     'pants'
            3    2000-01-10     'oranges'
            3    2000-02-20     'pants'"""
            
            df = pd.read_csv(StringIO(text), delim_whitespace=True, parse_dates=[1])
            

            Solution

            I'm using a nested apply:

            def check_future_pants(x, df):
                date_condition = x.date < df.date
                pant_condition = df.item == "'pants'"
                return (date_condition & pant_condition).any()
            
            def check_df_pants(df):
                return df.apply(lambda x: check_future_pants(x, df), axis=1)
            
            df['will_buy_pants'] = df.groupby('id', group_keys=False).apply(check_df_pants)
            

            Demonstration / Explanation

            # Let's start with a sub-group
            df1 = df[df.id == 1].copy()
            
            print df1.apply(lambda x: check_future_pants(x, df1), axis=1)
            
            0     True
            1    False
            2    False
            dtype: bool
            

            This works for one group but the checking I do works on a DataFrame so I perform a nested apply with another checking function check_df_pants.

            df['will_buy_pants'] = df.groupby('id', group_keys=False).apply(check_df_pants)
            print df
            
               id       date       item will_buy_pants
            0   1 2000-01-01      'foo'           True
            1   1 2000-01-02    'pants'          False
            2   1 2000-01-03      'bar'          False
            3   2 2000-01-02    'organ'          False
            4   2 2000-02-01     'beef'          False
            5   3 2000-01-01    'pants'           True
            6   3 2000-01-10  'oranges'           True
            7   3 2000-02-20    'pants'          False
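
            The logic of check_future_pants — flag a row when some strictly later-dated row in the same id group bought pants — can also be sketched without pandas (hypothetical rows shaped like the question's data; ISO date strings compare chronologically):

            ```python
            rows = [
                (1, '2000-01-01', 'foo'),
                (1, '2000-01-02', 'pants'),
                (1, '2000-01-03', 'bar'),
                (3, '2000-01-01', 'pants'),
                (3, '2000-01-10', 'oranges'),
                (3, '2000-02-20', 'pants'),
            ]

            def will_buy_pants(rows):
                # a row is True when a strictly later row in the same id group is 'pants'
                return [any(g == gid and d > date and item == 'pants'
                            for g, d, item in rows)
                        for gid, date, _ in rows]

            print(will_buy_pants(rows))  # [True, False, False, True, True, False]
            ```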
            
            qid & accept id: (37018019, 37037183) query: Python 3 concurrent.futures and per-thread initialization soup:

            soup wrap:

            So, it seems that a simple solution to my problem is to use threading.local to store a per-thread "session" (in the mockup below, just a random int). Perhaps not the cleanest approach, but for now it will do. Here is a mockup (Python 3.5.1):

            import time
            import threading
            import concurrent.futures
            import random
            import logging
            
            logging.basicConfig(level=logging.DEBUG, format='(%(threadName)-0s) %(relativeCreated)d - %(message)s')
            
            x = [0.1, 0.1, 0.2, 0.4, 1.0, 0.1, 0.0]
            
            mydata = threading.local()
            
            def do_work(secs):
                if 'session' in mydata.__dict__:
                    logging.debug('re-using session "{}"'.format(mydata.session))
                else:
                    mydata.session = random.randint(0,1000)
                    logging.debug('created new session: "{}"'.format(mydata.session))
                time.sleep(secs)
                logging.debug('slept for {} seconds'.format(secs))
                return secs
            
            with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
                y = executor.map(do_work, x)
            
            print(list(y))
            

            Produces the following output, showing that "sessions" are indeed local to each thread and reused:

            (Thread-1) 29 - created new session: "855"
            (Thread-2) 29 - created new session: "58"
            (Thread-3) 30 - created new session: "210"
            (Thread-1) 129 - slept for 0.1 seconds
            (Thread-1) 130 - re-using session "855"
            (Thread-2) 130 - slept for 0.1 seconds
            (Thread-2) 130 - re-using session "58"
            (Thread-3) 230 - slept for 0.2 seconds
            (Thread-3) 230 - re-using session "210"
            (Thread-3) 331 - slept for 0.1 seconds
            (Thread-3) 331 - re-using session "210"
            (Thread-3) 331 - slept for 0.0 seconds
            (Thread-1) 530 - slept for 0.4 seconds
            (Thread-2) 1131 - slept for 1.0 seconds
            [0.1, 0.1, 0.2, 0.4, 1.0, 0.1, 0.0]
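
            A compact, assertion-backed variant of the same pattern (with a hypothetical session stand-in) that checks the per-thread property directly instead of reading the log:

            ```python
            import threading
            import concurrent.futures

            local = threading.local()
            lock = threading.Lock()
            seen = []  # (thread name, session) pairs

            def do_work(_):
                if not hasattr(local, 'session'):
                    local.session = object()  # stand-in for an expensive per-thread session
                with lock:
                    seen.append((threading.current_thread().name, local.session))

            with concurrent.futures.ThreadPoolExecutor(max_workers=3) as ex:
                list(ex.map(do_work, range(12)))

            # group the recorded sessions by thread
            by_thread = {}
            for name, session in seen:
                by_thread.setdefault(name, set()).add(session)

            # every thread created exactly one session and reused it
            print(all(len(s) == 1 for s in by_thread.values()))  # True
            ```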
            

            Minor note about logging: in order to use this in an IPython notebook, the logging setup needs to be slightly modified (since IPython has already set up a root logger). A more robust logging setup would be:

            IN_IPYNB = 'get_ipython' in vars()
            
            if IN_IPYNB:
                logger = logging.getLogger()
                logger.setLevel(logging.DEBUG)
                for h in logger.handlers:
                    h.setFormatter(logging.Formatter(
                            '(%(threadName)-0s) %(relativeCreated)d - %(message)s'))
            else:
                logging.basicConfig(level=logging.DEBUG, format='(%(threadName)-0s) %(relativeCreated)d - %(message)s')
            
            qid & accept id: (37042635, 37042849) query: How to make a test function using pytest soup:

            soup wrap:

            Try this:

            def test_added():
                assert added(4, 6, 7) == (13, 11, 10)
            

            Then run your test with pytest. If all tests pass, you should get something like:

            1 passed in x.xx seconds
            

            Check the docs for more help.
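
            The assertion above presumes an added function in the code under test, which the answer does not show. One hypothetical definition that is consistent with the asserted result (each element is the sum of the other two arguments) makes the whole test file self-contained:

```python
# test_added.py -- run with: pytest test_added.py
# NOTE: `added` is a hypothetical reconstruction chosen so that
# added(4, 6, 7) == (13, 11, 10); the original question defines its own.
def added(a, b, c):
    # each element of the result is the sum of the other two arguments
    return (b + c, a + c, a + b)


def test_added():
    assert added(4, 6, 7) == (13, 11, 10)
```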

            qid & accept id: (37048689, 37330835) query: Abaqus: script to select elements on a surface soup:

            soup wrap:

            Define a face set on the part or assembly:

              part.Set('facename',faces=part.faces.findAt(((1,0,0),),))
            

            where (1,0,0) is a coordinate anywhere on the face. (Don't use a point on an edge/corner though.)

            then after meshing you can access the elements attached to that face, something like:

              instance.sets['facename'].elements
            

            Note that if you want to get those elements from the ODB after running an analysis, it is a little different:

              instance.elementSets['FACENAME'].elements
            

            Note that the set name is upper-cased on the ODB.

            qid & accept id: (37079175, 37081693) query: How to remove a column from a structured numpy array *without copying it*? soup:

            soup wrap:

            You can create a new data type containing just the fields that you want, with the same field offsets and the same itemsize as the original array's data type, and then use this new data type to create a view of the original array. The dtype function handles arguments with many formats; the relevant one is described in the section of the documentation called "Specifying and constructing data types". Scroll down to the subsection that begins with

            {'names': ..., 'formats': ..., 'offsets': ..., 'titles': ..., 'itemsize': ...}
            

            Here are a couple convenience functions that use this idea.

            import numpy as np
            
            
            def view_fields(a, names):
                """
                `a` must be a numpy structured array.
                `names` is the collection of field names to keep.
            
                Returns a view of the array `a` (not a copy).
                """
                dt = a.dtype
                formats = [dt.fields[name][0] for name in names]
                offsets = [dt.fields[name][1] for name in names]
                itemsize = a.dtype.itemsize
                newdt = np.dtype(dict(names=names,
                                      formats=formats,
                                      offsets=offsets,
                                      itemsize=itemsize))
                b = a.view(newdt)
                return b
            
            
            def remove_fields(a, names):
                """
                `a` must be a numpy structured array.
                `names` is the collection of field names to remove.
            
                Returns a view of the array `a` (not a copy).
                """
                dt = a.dtype
                keep_names = [name for name in dt.names if name not in names]
                return view_fields(a, keep_names)
            

            For example,

            In [297]: a
            Out[297]: 
            array([(10.0, 13.5, 1248, -2), (20.0, 0.0, 0, 0), (30.0, 0.0, 0, 0),
                   (40.0, 0.0, 0, 0), (50.0, 0.0, 0, 999)], 
                  dtype=[('x', '

            Verify that b is a view (not a copy) of a by changing b[0]['x']...

            In [300]: b[0]['x'] = 3.14
            

            and seeing that a is also changed:

            In [301]: a[0]
            Out[301]: (3.14, 13.5, 1248, -2)
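
            As a self-contained check of the view semantics (using a small hypothetical two-field dtype, with view_fields/remove_fields as defined above):

```python
import numpy as np


def view_fields(a, names):
    # build a dtype containing only `names`, keeping the original field
    # offsets and itemsize, then return a view of `a` with that dtype
    dt = a.dtype
    formats = [dt.fields[name][0] for name in names]
    offsets = [dt.fields[name][1] for name in names]
    newdt = np.dtype(dict(names=names, formats=formats,
                          offsets=offsets, itemsize=dt.itemsize))
    return a.view(newdt)


def remove_fields(a, names):
    keep = [name for name in a.dtype.names if name not in names]
    return view_fields(a, keep)


a = np.array([(10.0, 13.5), (20.0, 0.0)], dtype=[('x', '<f8'), ('y', '<f8')])
b = remove_fields(a, ['y'])   # view exposing only the 'x' field
b['x'][0] = 3.14              # writing through the view...
# ...changes `a` as well, and b keeps the original itemsize (it is a view)
```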
            
            qid & accept id: (37083117, 37105921) query: How to Change selection field automatically in odoo soup:

            soup wrap:

            Add an on_change attribute to the casier_judiciare field in your form view, and pass all the other fields you want to check as arguments to the method, along these lines (old-API view XML, reconstructed here from the method signature below):

                 <field name="casier_judiciare" on_change="onchange_casier_judiciare(casier_judiciare, certificat_qual, extrait_role, reference_pro)"/>
                 <field name="certificat_qual"/>
                 <field name="extrait_role"/>
                 <field name="reference_pro"/>
                 <field name="etat_dos"/>
            

            In your model file, define the method like this, using an if statement to check whether they are all True (that means they have all been checked). If so, return a dictionary with the value you want for the selection field; in this case etat_dos will change to Dossier Complet:

            def onchange_casier_judiciare(self, cr, uid, ids, casier_judiciare, certificat_qual, extrait_role, reference_pro, context=None):
                if casier_judiciare and certificat_qual and extrait_role and reference_pro: # if they're all True (that means they're all checked):
                    values = {'value': {'etat_dos': 'complet'}} #set the value of etat_dos field
            
                    return values
            

            Note that the onchange is only triggered by the casier_judiciare field, but you can also set an onchange on the other fields and it should work just fine.

            qid & accept id: (37088428, 37088655) query: Python Matplotlib: plotting feet and inches soup:

            soup wrap:

            You could use ticker.FuncFormatter to create a custom tick label:

            import numpy as np
            import matplotlib.pyplot as plt
            import matplotlib.ticker as ticker
            
            x = np.linspace(0, 1, 100)
            y = (np.random.random(100) - 0.5).cumsum()
            
            fig, ax = plt.subplots()
            ax.plot(x, y)
            
            def imperial(x, pos):
                ft, inches = divmod(round(x*12), 12)
                ft, inches = map(int, [ft, inches])
                return ('{} ft'.format(ft) if not inches 
                        else '{} {} in'.format(ft, inches) if ft
                        else '{} in'.format(inches))
            
            ax.yaxis.set_major_formatter(ticker.FuncFormatter(imperial))
            
            plt.show()
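
            The formatter by itself just maps an axis value in feet to a label; a standalone copy of the same function shows the three label cases:

```python
def imperial(x, pos):
    # x is in feet; convert to whole feet plus remaining inches
    ft, inches = divmod(round(x * 12), 12)
    ft, inches = map(int, [ft, inches])
    return ('{} ft'.format(ft) if not inches
            else '{} {} in'.format(ft, inches) if ft
            else '{} in'.format(inches))

# imperial(1.0, None)  -> '1 ft'
# imperial(0.5, None)  -> '6 in'
# imperial(1.25, None) -> '1 3 in'
```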
            



            To also control the location of the ticks, you could use a ticker.MultipleLocator. For example, to place a tick mark every 4 inches, add

            loc = ticker.MultipleLocator(4./12)
            ax.yaxis.set_major_locator(loc)
            

            to the code above.

            qid & accept id: (37091273, 37093003) query: Changing color TabbedPannelHeader in Kivy soup:

            soup wrap:

            I'm on Windows, so the default background is black afaik, but maybe there is a different one on other OSes.

            In your main.py:

            from functools import partial
            import rotinas
            Window.clearcolor = (1, 1, 1, 1) <-----
            import sqlite3 as sql
            from datetime import *
            

            That's your white color, if you use alpha==0 in your coloring. Remove that line and you'll have the default Kivy background i.e. black.

            Next thing: Label itself has no image as a background, therefore changing its background color, either with a variable or with canvas instructions, results in a clear color. Widgets such as Spinner, or basically anything with a color other than transparent, most probably use an image from the atlas as a background (setting stuff from canvas is less efficient than changing pngs - at least more lines of code).

            When you use a widget that has an image as a background, changing the background color only tints the image that's used as the background, i.e. the atlas one. There is your problem, because you may want either a clear color or the tinted TabbedPanelHeader blue. Two examples:

            Here you have the tinted blue (the background_normal isn't necessary; it's set like that by default by Kivy):

            from kivy.lang import Builder
            from kivy.base import runTouchApp
            from kivy.uix.boxlayout import BoxLayout
            Builder.load_string('''
            :
                TabbedPanelHeader
                    color: (0,0,1,1)
                    text:'blaaaaaaa'
                    background_color: (0, 0, 1, 1)
                    background_normal: 'atlas://data/images/defaulttheme/tab_btn'
            ''')
            class Test(BoxLayout):pass
            runTouchApp(Test())
            

            Here you have the standard, clear color(see the empty background_normal):

            from kivy.lang import Builder
            from kivy.base import runTouchApp
            from kivy.uix.boxlayout import BoxLayout
            Builder.load_string('''
            :
                TabbedPanelHeader
                    color: (0,0,1,1)
                    text:'blaaaaaaa'
                    background_color: (0, 0, 1, 1)
                    background_normal: ''
            ''')
            class Test(BoxLayout):pass
            runTouchApp(Test())
            

            PS: Use pep8 (or install it yourself: pip install pep8) and make your code more readable. You will have a lot of problems debugging that after a year, trust me. It may work well, but you killed the whole point of Python readability.

            Also, I saw some .db files in your zip, but didn't open them. Posting your database to someone when you have no clue what they will do with it is bad; posting it publicly is even worse. Let's say that database holds personal data, bank account numbers or whatever - you don't want to be responsible for losing or misusing them, do you?

            qid & accept id: (37106934, 37107047) query: How to create a random multidimensional array from existing variables soup:

            soup wrap:

            You could use randrange to generate a row and column index and add enemies there. If there's already an enemy or some other object in a given cell, just skip it and randomize a new coordinate:

            import random
            
            def add(grid, char, count):
                # keep trying random cells until `count` items have been placed
                while count:
                    row = random.randrange(len(grid))
                    column = random.randrange(len(grid[0]))
                    if grid[row][column] == 'g':
                        grid[row][column] = char
                        count -= 1
            

            Usage:

            world = [['g'] * 60 for _ in xrange(60)]
            add(world, 'e', 25)
            add(world, 't', 5)
            

            This approach only makes sense if your world is sparse, i.e. most of the world is grass. If the world is going to be filled with different objects, then tracking the free space and randomly selecting a tile from there would be a better approach.
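
            A quick way to verify the placement counts (restating the add function so the snippet runs standalone; it places onto the grid that is passed in):

```python
import random
from collections import Counter


def add(grid, char, count):
    # retry random cells until `count` copies of `char` sit on grass tiles
    while count:
        row = random.randrange(len(grid))
        column = random.randrange(len(grid[0]))
        if grid[row][column] == 'g':
            grid[row][column] = char
            count -= 1


world = [['g'] * 60 for _ in range(60)]
add(world, 'e', 25)
add(world, 't', 5)

counts = Counter(cell for row in world for cell in row)
# counts['e'] == 25, counts['t'] == 5, and the rest is still grass
```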

            qid & accept id: (37113173, 37113753) query: Compare 2 excel files using Python soup:

            soup wrap:

            The following approach should get you started:

            from itertools import izip_longest
            import xlrd
            
            rb1 = xlrd.open_workbook('file1.xlsx')
            rb2 = xlrd.open_workbook('file2.xlsx')
            
            sheet1 = rb1.sheet_by_index(0)
            sheet2 = rb2.sheet_by_index(0)
            
            for rownum in range(max(sheet1.nrows, sheet2.nrows)):
                if rownum < sheet1.nrows and rownum < sheet2.nrows:
                    row_rb1 = sheet1.row_values(rownum)
                    row_rb2 = sheet2.row_values(rownum)
            
                    for colnum, (c1, c2) in enumerate(izip_longest(row_rb1, row_rb2)):
                        if c1 != c2:
                            print "Row {} Col {} - {} != {}".format(rownum+1, colnum+1, c1, c2)
                else:
                    print "Row {} missing".format(rownum+1)
            

            This will display any cells which are different between the two files. For your given two files, this will display:

            Row 3 Col 2 - 0.235435 != 0.23546
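
            The cell-by-cell diff logic itself can be exercised without any workbook files, on plain lists of row values (hypothetical sample rows; izip_longest is spelled zip_longest on Python 3):

```python
from itertools import zip_longest  # izip_longest on Python 2

rows1 = [[1.0, 0.5], [2.0, 0.235435]]
rows2 = [[1.0, 0.5], [2.0, 0.23546], [3.0, 0.1]]

diffs = []
for rownum, (r1, r2) in enumerate(zip_longest(rows1, rows2)):
    if r1 is None or r2 is None:
        # one file has more rows than the other
        diffs.append("Row {} missing".format(rownum + 1))
        continue
    for colnum, (c1, c2) in enumerate(zip_longest(r1, r2)):
        if c1 != c2:
            diffs.append("Row {} Col {} - {} != {}".format(
                rownum + 1, colnum + 1, c1, c2))
# diffs == ['Row 2 Col 2 - 0.235435 != 0.23546', 'Row 3 missing']
```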
            
            qid & accept id: (37116967, 37132292) query: Mongoengine filter query on list embedded field based on last index soup:

            soup wrap:

            I'm not sure MongoEngine can do that (yet). AFAIK, you'd need to use the aggregation pipeline.

            In the Mongo shell, using the '$slice' and the $arrayElemAt operators:

            db.order.aggregate([{ $project: {last_status: { $arrayElemAt: [{ $slice: [ "$status", -1 ] }, 0 ]} }}, {$match: {'last_status.status_code':"scode"}} ])
            

            And in Python:

            pipeline = [
                {'$project': {'last_status': { '$arrayElemAt': [{ '$slice': [ "$status", -1 ] }, 0 ]} }},
                {'$match': {'last_status.status_code':'scode'}}
            ]
            
            agg_cursor = Order.objects.aggregate(*pipeline)
            
            result = [ Order.objects.get(id=order['_id']) for order in agg_cursor ]
            

            The trick here is that objects.aggregate provides a PyMongo cursor, not a MongoEngine cursor, so if you need MongoEngine objects, you can proceed in two steps: first filter using the aggregation framework to get the ids of matched items, then get them through a MongoEngine query.

            This is what I do. From my tests, it has proven to be much more efficient than fetching everything and filtering in the Python code.
 
            If there is a simpler way, I'm interested to hear about it. Otherwise, this could be a feature request for MongoEngine. You may want to open an issue there.

            qid & accept id: (37119071, 37121993) query: Scipy rotate and zoom an image without changing its dimensions soup:

            soup wrap:

            scipy.ndimage.rotate accepts a reshape= parameter:

            reshape : bool, optional

            If reshape is true, the output shape is adapted so that the input array is contained completely in the output. Default is True.

            So to "clip" the edges you can simply call scipy.ndimage.rotate(img, ..., reshape=False).

            from scipy.ndimage import rotate
            from scipy.misc import face
            from matplotlib import pyplot as plt
            
            img = face()
            rot = rotate(img, 30, reshape=False)
            
            fig, ax = plt.subplots(1, 2)
            ax[0].imshow(img)
            ax[1].imshow(rot)
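
            Since reshape=False keeps the input's shape while the default reshape=True grows the canvas to fit the rotated image, the difference is easy to verify on a plain array standing in for face():

```python
import numpy as np
from scipy.ndimage import rotate

img = np.zeros((40, 50))
clipped = rotate(img, 30, reshape=False)  # same shape, corners cut off
grown = rotate(img, 30, reshape=True)     # default: canvas expands
# clipped.shape == (40, 50); grown is larger in both dimensions
```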
            


            Things are more complicated for scipy.ndimage.zoom.

            A naive method would be to zoom the entire input array, then use slice indexing and/or zero-padding to make the output the same size as your input. However, in cases where you're increasing the size of the image it's wasteful to interpolate pixels that are only going to get clipped off at the edges anyway.

            Instead you could index only the part of the input that will fall within the bounds of the output array before you apply zoom:

            import numpy as np
            from scipy.ndimage import zoom
            
            
            def clipped_zoom(img, zoom_factor, **kwargs):
            
                h, w = img.shape[:2]
            
                # width and height of the zoomed image
                zh = int(np.round(zoom_factor * h))
                zw = int(np.round(zoom_factor * w))
            
                # for multichannel images we don't want to apply the zoom factor to the RGB
                # dimension, so instead we create a tuple of zoom factors, one per array
                # dimension, with 1's for any trailing dimensions after the width and height.
                zoom_tuple = (zoom_factor,) * 2 + (1,) * (img.ndim - 2)
            
                # zooming out
                if zoom_factor < 1:
                    # bounding box of the clip region within the output array
                    top = (h - zh) // 2
                    left = (w - zw) // 2
                    # zero-padding
                    out = np.zeros_like(img)
                    out[top:top+zh, left:left+zw] = zoom(img, zoom_tuple, **kwargs)
            
                # zooming in
                elif zoom_factor > 1:
                    # bounding box of the clip region within the input array
                    top = (zh - h) // 2
                    left = (zw - w) // 2
                    out = zoom(img[top:top+zh, left:left+zw], zoom_tuple, **kwargs)
                    # `out` might still be slightly larger than `img` due to rounding, so
                    # trim off any extra pixels at the edges
                    trim_top = ((out.shape[0] - h) // 2)
                    trim_left = ((out.shape[1] - w) // 2)
                    out = out[trim_top:trim_top+h, trim_left:trim_left+w]
            
                # if zoom_factor == 1, just return the input array
                else:
                    out = img
                return out
            

            For example:

            zm1 = clipped_zoom(img, 0.5)
            zm2 = clipped_zoom(img, 1.5)
            
            fig, ax = plt.subplots(1, 3)
            ax[0].imshow(img)
            ax[1].imshow(zm1)
            ax[2].imshow(zm2)
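
            A quick sanity check that clipped_zoom always hands back the input's shape, whichever branch runs (the function is repeated here, condensed, so the snippet runs standalone):

```python
import numpy as np
from scipy.ndimage import zoom


def clipped_zoom(img, zoom_factor, **kwargs):
    # same logic as above, condensed for a standalone check
    h, w = img.shape[:2]
    zh = int(np.round(zoom_factor * h))
    zw = int(np.round(zoom_factor * w))
    zoom_tuple = (zoom_factor,) * 2 + (1,) * (img.ndim - 2)

    if zoom_factor < 1:                    # zooming out: zero-pad the result
        top = (h - zh) // 2
        left = (w - zw) // 2
        out = np.zeros_like(img)
        out[top:top+zh, left:left+zw] = zoom(img, zoom_tuple, **kwargs)
    elif zoom_factor > 1:                  # zooming in: crop, zoom, trim
        top = (zh - h) // 2
        left = (zw - w) // 2
        out = zoom(img[top:top+zh, left:left+zw], zoom_tuple, **kwargs)
        trim_top = (out.shape[0] - h) // 2
        trim_left = (out.shape[1] - w) // 2
        out = out[trim_top:trim_top+h, trim_left:trim_left+w]
    else:
        out = img
    return out


img = np.random.rand(64, 64)
# whatever the factor, the output shape matches the input shape
shapes = {clipped_zoom(img, f).shape for f in (0.5, 1.0, 1.5)}
```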
            


            qid & accept id: (37119314, 37119960) query: How do I generate a sequence of integer numbers in a uniform distribution? soup:

            soup wrap:

            You can use a generator:

            from random import randint
            
            def getNum1To4(runs):
                occurrences = {n+1: 0 for n in range(4)}
                for i in range(runs):
                    options = [n for n in occurrences if occurrences[n] < runs / 4]
                    choice = options[randint(0, len(options) - 1)]
                    occurrences[choice] += 1
                    yield choice
            

            outputs:

            >>> runs = 8
            >>> gen = getNum1To4(8)
            >>> for n in range(runs): print gen.next()
            2
            1
            3
            1
            3
            4
            4
            2
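An alternative that achieves the same exact 1:1:1:1 balance (a Python 3 sketch; the name `get_nums_1_to_4` is mine, not from the question) is to build the balanced list up front and shuffle it:

```python
import random

def get_nums_1_to_4(runs):
    # each of 1..4 appears exactly runs // 4 times; shuffle randomizes the order
    nums = [n for n in range(1, 5) for _ in range(runs // 4)]
    random.shuffle(nums)
    return nums

print(get_nums_1_to_4(8))
```

This avoids tracking occurrence counts on every draw; the uniformity is guaranteed by construction.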
            
            qid & accept id: (37122210, 37122386) query: django object get two fields into a list from a model soup:

            You'll need two models, and a foreign key between them, e.g.:

            \n
            from django.contrib.auth.models import User\n\nclass PagerDutyPolicy(models.Model):\n    # the model automatically gets an id field\n    policy_name = models.CharField(max_length=200)  \n\nclass PagerDuty(models.Model):\n    # I'm assuming you wanted these to be related to users who can log in..\n    user = models.ForeignKey(User)   \n    mobile = models.CharField(max_length=200) \n    policy = models.ForeignKey(PagerDutyPolicy)\n
            \n

            To get all policies:

            \n
            PagerDutyPolicy.objects.all()\n
            \n

            To create a new PagerDuty object for bob, putting him in Team 1:

            \n
            PagerDuty.objects.create(\n    user=User.objects.get(username='bob'),  # or create a new user\n    mobile='...',\n    # policy=PagerDutyPolicy.objects.get(policy_name='Team 1')  # or..\n    policy=PagerDutyPolicy.objects.get(id=232)\n)\n
            \n

            If you're going to look up policies by policy_name, that field should also have db_index=True in the model definition.

            \n soup wrap:

            You'll need two models, and a foreign key between them, e.g.:

            from django.contrib.auth.models import User
            
            class PagerDutyPolicy(models.Model):
                # the model automatically gets an id field
                policy_name = models.CharField(max_length=200)  
            
            class PagerDuty(models.Model):
                # I'm assuming you wanted these to be related to users who can log in..
                user = models.ForeignKey(User)   
                mobile = models.CharField(max_length=200) 
                policy = models.ForeignKey(PagerDutyPolicy)
            

            To get all policies:

            PagerDutyPolicy.objects.all()
            

            To create a new PagerDuty object for bob, putting him in Team 1:

            PagerDuty.objects.create(
                user=User.objects.get(username='bob'),  # or create a new user
                mobile='...',
                # policy=PagerDutyPolicy.objects.get(policy_name='Team 1')  # or..
                policy=PagerDutyPolicy.objects.get(id=232)
            )
            

            If you're going to look up policies by policy_name, that field should also have db_index=True in the model definition.

            qid & accept id: (37128072, 37128279) query: How to find number of matches in the array or dictionary? soup:

            List elements:

            \n
            my_list = [1,4,7,4,5,7,1,3]\nprint my_list.count(4)\n
            \n

            Dictionary values using generator expression:

            \n
            my_dict = {0: 1, 2: 1, 4: 5, 6: 3, 8: 4, 10: 4, 12: 1}\nprint sum(1 for x in my_dict.values() if x == 4)\n
            \n

            As pointed out by zondo, the last line can be more simply written as:

            \n
            print sum(x == 4 for x in my_dict.values())\n
            \n

            due to the fact that True == 1.

            \n soup wrap:

            List elements:

            my_list = [1,4,7,4,5,7,1,3]
            print my_list.count(4)
            

            Dictionary values using generator expression:

            my_dict = {0: 1, 2: 1, 4: 5, 6: 3, 8: 4, 10: 4, 12: 1}
            print sum(1 for x in my_dict.values() if x == 4)
            

            As pointed out by zondo, the last line can be more simply written as:

            print sum(x == 4 for x in my_dict.values())
            

            due to the fact that True == 1.
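If you need the counts of every distinct value at once, collections.Counter (shown here in Python 3) does it in one pass:

```python
from collections import Counter

my_dict = {0: 1, 2: 1, 4: 5, 6: 3, 8: 4, 10: 4, 12: 1}
counts = Counter(my_dict.values())
print(counts[4])   # how many values equal 4 -> 2
print(counts[1])   # how many values equal 1 -> 3
```

Counter also handles the list case directly: `Counter(my_list)[4]` gives the same result as `my_list.count(4)`.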

            qid & accept id: (37136697, 37159774) query: Partial symbolic derivative in Python soup:

            So I've managed to solve my problem on my own. The main question was how to symbolically derive a function or equation with another function. As I've gone again slowly over the sympy documentation, I saw a little detail, that I've missed before.\nIn order to derive a function with a function you need to change the settings of the function, that will be used to derive. For example:

            \n
            x, y, z = symbols('x, y, z')\nA = x*y*z\nB = x*y\n\n# This is the detail:\ntype(B)._diff_wrt = True\ndiff(A, B)\n
            \n

            Or in my case, the code looks like:

            \n
            koef = [logt_a, a_0, T_a*a_0, a_1, T_a*a_1, a_2, T_a*a_2]\nM = expand(A)\nK = zeros(len(koef), len(koef))\ndef odvod_mat(par):\n    for j in range(len(par)):\n        for i in range(len(par)):\n            type(par[i])._diff_wrt = True\n            P = diff(M, par[i])/2\n            B = P.coeff(par[j])\n            K[i,j] = B\n\n            #Removal of T_a\n            K[i,j] = K[i,j].subs(T_a, 0)\n    return K  \nodvod_mat(koef)\n
            \n

            Thanks again to all that were taking their time to read this. I hope this helps to anyone, who will have the same problem as I did.

            \n soup wrap:

            So I've managed to solve my problem on my own. The main question was how to symbolically differentiate a function or equation with respect to another function. Going slowly over the sympy documentation again, I spotted a small detail I had missed before: in order to differentiate with respect to a function, you need to change a setting on that function first. For example:

            x, y, z = symbols('x, y, z')
            A = x*y*z
            B = x*y
            
            # This is the detail:
            type(B)._diff_wrt = True
            diff(A, B)
            

            Or in my case, the code looks like:

            koef = [logt_a, a_0, T_a*a_0, a_1, T_a*a_1, a_2, T_a*a_2]
            M = expand(A)
            K = zeros(len(koef), len(koef))
            def odvod_mat(par):
                for j in range(len(par)):
                    for i in range(len(par)):
                        type(par[i])._diff_wrt = True
                        P = diff(M, par[i])/2
                        B = P.coeff(par[j])
                        K[i,j] = B
            
                        #Removal of T_a
                        K[i,j] = K[i,j].subs(T_a, 0)
                return K  
            odvod_mat(koef)
            

            Thanks again to all who took the time to read this. I hope it helps anyone who runs into the same problem I did.

            qid & accept id: (37154201, 37154316) query: Get the count of the each date entry from onr of the raw from CSV file soup:

            you can use itertools.groupby:

            \n
            with open("your_file.csv") as f:\n    for x,y in itertools.groupby(sorted(map(str.split, f.read().strip().split("\n"))), key = lambda x:x[0]):\n        print x,len(list(y))\n
            \n

            output

            \n
            4/14/2016 2\n6/14/2016 1\n
            \n

            Another way: if csv contains empty lines

            \n
            with open("your_file.csv") as f:\n    my_list = []\n    for line in f:\n        if line:\n            my_list.append(line.strip().split())\n    for x,y in itertools.groupby(sorted(my_list, key=lambda x:x[0]), key=lambda x:x[0]):\n        print x, len(list(y))\n
            \n soup wrap:

            You can use itertools.groupby:

            import itertools
            
            with open("your_file.csv") as f:
                for x,y in itertools.groupby(sorted(map(str.split, f.read().strip().split("\n"))), key = lambda x:x[0]):
                    print x,len(list(y))
            

            output

            4/14/2016 2
            6/14/2016 1
            

            Another way, if the CSV contains empty lines:

            import itertools
            
            with open("your_file.csv") as f:
                my_list = []
                for line in f:
                    if line:
                        my_list.append(line.strip().split())
                for x,y in itertools.groupby(sorted(my_list, key=lambda x:x[0]), key=lambda x:x[0]):
                    print x, len(list(y))
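A shorter route to the same per-date counts (a Python 3 sketch; the lines list below stands in for the file contents) is collections.Counter over the first column:

```python
from collections import Counter

# stand-in for the rows of your_file.csv; swap in open("your_file.csv") as needed
lines = ["4/14/2016 a b", "4/14/2016 c d", "", "6/14/2016 e f"]
counts = Counter(line.split()[0] for line in lines if line.strip())
for date, n in sorted(counts.items()):
    print(date, n)
```

Counter does the grouping and counting in one pass, and it doesn't need the input sorted the way itertools.groupby does.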
            
            qid & accept id: (37177688, 37177859) query: Subsetting 2D array based on condition in numpy python soup:

            You can use np.where to preserve the shape:

            \n
            np.where(arr_b > 0.0, arr_a, np.nan)\n
            \n

            It will take the corresponding values from arr_a when arr_b's value is greater than 0, otherwise it will use np.nan.

            \n
            import numpy as np\nN = 5\narr_a = np.random.randn(N,N)\narr_b = np.random.randn(N,N)\nnp.where(arr_b > 0.0, arr_a, np.nan)\n\nOut[107]: \narray([[ 0.5743081 ,         nan, -1.69559034,         nan,  0.4987268 ],\n       [ 0.33038264,         nan, -0.27151598,         nan, -0.73145628],\n       [        nan,  0.46741932,  0.61225086,         nan,  1.08327459],\n       [        nan, -1.20244926,  1.5834266 , -0.04675223, -1.14904974],\n       [        nan,  1.20307104, -0.86777899,         nan,         nan]])\n
            \n soup wrap:

            You can use np.where to preserve the shape:

            np.where(arr_b > 0.0, arr_a, np.nan)
            

            It will take the corresponding values from arr_a where arr_b's value is greater than 0; otherwise it will use np.nan.

            import numpy as np
            N = 5
            arr_a = np.random.randn(N,N)
            arr_b = np.random.randn(N,N)
            np.where(arr_b > 0.0, arr_a, np.nan)
            
            Out[107]: 
            array([[ 0.5743081 ,         nan, -1.69559034,         nan,  0.4987268 ],
                   [ 0.33038264,         nan, -0.27151598,         nan, -0.73145628],
                   [        nan,  0.46741932,  0.61225086,         nan,  1.08327459],
                   [        nan, -1.20244926,  1.5834266 , -0.04675223, -1.14904974],
                   [        nan,  1.20307104, -0.86777899,         nan,         nan]])
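The same shape-preserving selection can also be written (a small sketch on a 2x2 example) by copying arr_a and assigning np.nan through a boolean mask:

```python
import numpy as np

arr_a = np.arange(4.0).reshape(2, 2)           # [[0., 1.], [2., 3.]]
arr_b = np.array([[1.0, -1.0], [-1.0, 1.0]])

out = arr_a.copy()
out[arr_b <= 0.0] = np.nan   # equivalent to np.where(arr_b > 0.0, arr_a, np.nan)
print(out)
```

The copy is needed so the NaN assignment doesn't clobber arr_a itself; np.where avoids that by building a new array.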
            
            qid & accept id: (37212307, 37212403) query: HTML data from Beautiful Soup needs formatting soup:

            You can solve it with BeautifulSoup alone, but I'd use pandas and its pandas.read_html() to parse the HTML table into a convenient dataframe:

            \n
            from StringIO import StringIO\n\nimport pandas as pd\n\ndata = """\n\n        \n            \n            \n            \n            \n            \n            \n        \n            \n                \n                \n                \n                \n                \n                \n            \n        \n            \n            \n            \n            \n            \n            \n        \n    
            ClassFailErrorSkipSuccessTotal
            Regression_TestCase190219229
            Total190219229
            """\n\ndf = pd.read_html(StringIO(data))\nprint(df)\n
            \n

            Prints:

            \n
            [                     0     1      2     3        4      5\n0                Class  Fail  Error  Skip  Success  Total\n1  Regression_TestCase     1      9     0      219    229\n2                Total     1      9     0      219    229]\n
            \n soup wrap:

            You can solve it with BeautifulSoup alone, but I'd use pandas and its pandas.read_html() to parse the HTML table into a convenient dataframe:

            from StringIO import StringIO
            
            import pandas as pd
            
            data = """
            
            Class Fail Error Skip Success Total
            Regression_TestCase 1 9 0 219 229
            Total 1 9 0 219 229
            """
            
            df = pd.read_html(StringIO(data))
            print(df)

            Prints:

            [                     0     1      2     3        4      5
            0                Class  Fail  Error  Skip  Success  Total
            1  Regression_TestCase     1      9     0      219    229
            2                Total     1      9     0      219    229]
            
            qid & accept id: (37219219, 37219291) query: Go through every possible combination of an array python soup:

            itertools.permutations is just what you're looking for:

            \n
            >>> from itertools import permutations\n>>> [i for i in permutations(range(1, 5), 4)]\n[(1, 2, 3, 4), (1, 2, 4, 3), (1, 3, 2, 4), (1, 3, 4, 2), (1, 4, 2, 3), (1, 4, 3, 2), (2, 1, 3, 4), (2, 1, 4, 3), (2, 3, 1, 4), (2, 3, 4, 1), (2, 4, 1, 3), (2, 4, 3, 1), (3, 1, 2, 4), (3, 1, 4, 2), (3, 2, 1, 4), (3, 2, 4, 1), (3, 4, 1, 2), (3, 4, 2, 1), (4, 1, 2, 3), (4, 1, 3, 2), (4, 2, 1, 3), (4, 2, 3, 1), (4, 3, 1, 2), (4, 3, 2, 1)]\n
            \n

            EDIT:
            \nOr, as @wflynny pointed out, you can save the list comprehension by just calling list's constructor:

            \n
            >>> from itertools import permutations\n>>> list(permutations(range(1, 5), 4))\n[(1, 2, 3, 4), (1, 2, 4, 3), (1, 3, 2, 4), (1, 3, 4, 2), (1, 4, 2, 3), (1, 4, 3, 2), (2, 1, 3, 4), (2, 1, 4, 3), (2, 3, 1, 4), (2, 3, 4, 1), (2, 4, 1, 3), (2, 4, 3, 1), (3, 1, 2, 4), (3, 1, 4, 2), (3, 2, 1, 4), (3, 2, 4, 1), (3, 4, 1, 2), (3, 4, 2, 1), (4, 1, 2, 3), (4, 1, 3, 2), (4, 2, 1, 3), (4, 2, 3, 1), (4, 3, 1, 2), (4, 3, 2, 1)]\n
            \n soup wrap:

            itertools.permutations is just what you're looking for:

            >>> from itertools import permutations
            >>> [i for i in permutations(range(1, 5), 4)]
            [(1, 2, 3, 4), (1, 2, 4, 3), (1, 3, 2, 4), (1, 3, 4, 2), (1, 4, 2, 3), (1, 4, 3, 2), (2, 1, 3, 4), (2, 1, 4, 3), (2, 3, 1, 4), (2, 3, 4, 1), (2, 4, 1, 3), (2, 4, 3, 1), (3, 1, 2, 4), (3, 1, 4, 2), (3, 2, 1, 4), (3, 2, 4, 1), (3, 4, 1, 2), (3, 4, 2, 1), (4, 1, 2, 3), (4, 1, 3, 2), (4, 2, 1, 3), (4, 2, 3, 1), (4, 3, 1, 2), (4, 3, 2, 1)]
            

            EDIT:
            Or, as @wflynny pointed out, you can avoid the list comprehension by just calling the list constructor:

            >>> from itertools import permutations
            >>> list(permutations(range(1, 5), 4))
            [(1, 2, 3, 4), (1, 2, 4, 3), (1, 3, 2, 4), (1, 3, 4, 2), (1, 4, 2, 3), (1, 4, 3, 2), (2, 1, 3, 4), (2, 1, 4, 3), (2, 3, 1, 4), (2, 3, 4, 1), (2, 4, 1, 3), (2, 4, 3, 1), (3, 1, 2, 4), (3, 1, 4, 2), (3, 2, 1, 4), (3, 2, 4, 1), (3, 4, 1, 2), (3, 4, 2, 1), (4, 1, 2, 3), (4, 1, 3, 2), (4, 2, 1, 3), (4, 2, 3, 1), (4, 3, 1, 2), (4, 3, 2, 1)]
            
            qid & accept id: (37232279, 37232337) query: BeautifulSoup my for loop is printing all the data from the td tag. I would like to exclude the last section of the td tag soup:

            You can find all rows (tr elements) except the first one (to skip the headers) and the last one - the "total" row. Sample implementation that produces a list of dictionaries as a result:

            \n
            from pprint import pprint\n\nfrom bs4 import BeautifulSoup\n\n\ndata = """\n\n    \n        \n        \n        \n        \n        \n        \n    \n        \n            \n            \n            \n            \n            \n            \n        \n    \n        \n        \n        \n        \n        \n        \n    \n
            ClassFailErrorSkipSuccessTotal
            Regression_TestCase.RegressionProject_TestCase2.RegressionProject_TestCase2190219229
            Total190219229
            """\n\nsoup = BeautifulSoup(data, "html.parser")\n\nheaders = [header.get_text(strip=True) for header in soup.find_all("th")]\nrows = [dict(zip(headers, [td.get_text(strip=True) for td in row.find_all("td")]))\n for row in soup.find_all("tr")[1:-1]]\n\npprint(rows)\n
            \n

            Prints:

            \n
            [{u'Class': u'Regression_TestCase.RegressionProject_TestCase2.RegressionProject_TestCase2',\n  u'Error': u'9',\n  u'Fail': u'1',\n  u'Skip': u'0',\n  u'Success': u'219',\n  u'Total': u'229'}]\n
            \n soup wrap:

            You can find all rows (tr elements) except the first one (to skip the headers) and the last one, the "Total" row. A sample implementation that produces a list of dictionaries as a result:

            from pprint import pprint
            
            from bs4 import BeautifulSoup
            
            
            data = """
            
            Class Fail Error Skip Success Total
            Regression_TestCase.RegressionProject_TestCase2.RegressionProject_TestCase2 1 9 0 219 229
            Total 1 9 0 219 229
            """
            
            soup = BeautifulSoup(data, "html.parser")
            
            headers = [header.get_text(strip=True) for header in soup.find_all("th")]
            rows = [dict(zip(headers, [td.get_text(strip=True) for td in row.find_all("td")]))
                    for row in soup.find_all("tr")[1:-1]]
            
            pprint(rows)

            Prints:

            [{u'Class': u'Regression_TestCase.RegressionProject_TestCase2.RegressionProject_TestCase2',
              u'Error': u'9',
              u'Fail': u'1',
              u'Skip': u'0',
              u'Success': u'219',
              u'Total': u'229'}]
            
            qid & accept id: (37246418, 37246609) query: How to avoid getting imaginary/complex number python soup:

            If I understand your question you either want

            \n
            abs(z)\n
            \n

            or

            \n
            z.real\n
            \n soup wrap:

            If I understand your question, you either want

            abs(z)
            

            or

            z.real
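For a concrete illustration (z here is just a sample complex value):

```python
z = 3 + 4j
print(abs(z))    # magnitude of z -> 5.0
print(z.real)    # real component of z -> 3.0
```

Use abs(z) when you want the magnitude of the complex result; use z.real when you only care about the real component (and are prepared to discard the imaginary part).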
            
            qid & accept id: (37256540, 37256579) query: Applying sqrt function on a column soup:

            Just use numpy.sqrt() (see docs) on the resulting pd.Series:

            \n
            import numpy as np\nnp.sqrt(football[['wins', 'losses']].sum(axis=1))\n
            \n

            But there are of course several ways to accomplish the same result - see below for illustration:

            \n
            df = pd.DataFrame.from_dict(data={'col_1': np.random.randint(low=1, high=10, size=10), 'col_2': np.random.randint(low=1, high=10, size=10)}, orient='index').T\n\ndf['sum'] = df[['col_1', 'col_2']].sum(axis=1)\ndf['np'] = np.sqrt(df[['col_1', 'col_2']].sum(axis=1))\ndf['apply'] = df[['col_1', 'col_2']].sum(axis=1).apply(np.sqrt)\ndf['**'] = df[['col_1', 'col_2']].sum(axis=1) ** .5\n\n   col_1  col_2  sum        np     apply        **\n0      8      3   11  3.316625  3.316625  3.316625\n1      4      1    5  2.236068  2.236068  2.236068\n2      6      2    8  2.828427  2.828427  2.828427\n3      4      1    5  2.236068  2.236068  2.236068\n4      4      7   11  3.316625  3.316625  3.316625\n5      7      4   11  3.316625  3.316625  3.316625\n6      5      5   10  3.162278  3.162278  3.162278\n7      1      2    3  1.732051  1.732051  1.732051\n8      6      6   12  3.464102  3.464102  3.464102\n9      5      7   12  3.464102  3.464102  3.464102\n
            \n soup wrap:

            Just use numpy.sqrt() (see docs) on the resulting pd.Series:

            import numpy as np
            np.sqrt(football[['wins', 'losses']].sum(axis=1))
            

            But there are, of course, several ways to accomplish the same result; see below for an illustration:

            df = pd.DataFrame.from_dict(data={'col_1': np.random.randint(low=1, high=10, size=10), 'col_2': np.random.randint(low=1, high=10, size=10)}, orient='index').T
            
            df['sum'] = df[['col_1', 'col_2']].sum(axis=1)
            df['np'] = np.sqrt(df[['col_1', 'col_2']].sum(axis=1))
            df['apply'] = df[['col_1', 'col_2']].sum(axis=1).apply(np.sqrt)
            df['**'] = df[['col_1', 'col_2']].sum(axis=1) ** .5
            
               col_1  col_2  sum        np     apply        **
            0      8      3   11  3.316625  3.316625  3.316625
            1      4      1    5  2.236068  2.236068  2.236068
            2      6      2    8  2.828427  2.828427  2.828427
            3      4      1    5  2.236068  2.236068  2.236068
            4      4      7   11  3.316625  3.316625  3.316625
            5      7      4   11  3.316625  3.316625  3.316625
            6      5      5   10  3.162278  3.162278  3.162278
            7      1      2    3  1.732051  1.732051  1.732051
            8      6      6   12  3.464102  3.464102  3.464102
            9      5      7   12  3.464102  3.464102  3.464102
            
            qid & accept id: (37258152, 37259151) query: More efficient way to make unicode escape codes soup:

            If you open your output file as 'wb', then it accepts a byte stream rather than unicode arguments:

            \n
            s = 'слово'\nwith open('data.txt','wb') as f:\n    f.write(s.encode('unicode_escape'))\n    f.write(b'\n')  # add a line feed\n
            \n

            This seems to do what you want:

            \n
            $ cat data.txt\n\u0441\u043b\u043e\u0432\u043e\n
            \n

            and it avoids both the decode as well as any translation that happens when writing unicode to a text stream.

            \n
            \n

            Updated to use encode('unicode_escape') as per the suggestion of @J.F.Sebastian.

            \n

            %timeit reports that it is quite a bit faster than encode('ascii', 'backslashreplace'):

            \n
            In [18]: f = open('data.txt', 'wb')\n\nIn [19]: %timeit f.write(s.encode('unicode_escape'))\nThe slowest run took 224.43 times longer than the fastest. This could mean that an intermediate result is being cached.\n100000 loops, best of 3: 1.55 µs per loop\n\nIn [20]: %timeit f.write(s.encode('ascii','backslashreplace'))\nThe slowest run took 9.13 times longer than the fastest. This could mean that an intermediate result is being cached.\n100000 loops, best of 3: 2.37 µs per loop\n\nIn [21]: f.close()\n
            \n

            Curiously, the lag from timeit for encode('unicode_escape') is a lot longer than that from encode('ascii', 'backslashreplace') even though the per loop time is faster, so be sure to test both in your environment.

            \n soup wrap:

            If you open your output file as 'wb', then it accepts a byte stream rather than unicode arguments:

            s = 'слово'
            with open('data.txt','wb') as f:
                f.write(s.encode('unicode_escape'))
                f.write(b'\n')  # add a line feed
            

            This seems to do what you want:

            $ cat data.txt
            \u0441\u043b\u043e\u0432\u043e
            

            and it avoids both the decode as well as any translation that happens when writing unicode to a text stream.


            Updated to use encode('unicode_escape') as per the suggestion of @J.F.Sebastian.

            %timeit reports that it is quite a bit faster than encode('ascii', 'backslashreplace'):

            In [18]: f = open('data.txt', 'wb')
            
            In [19]: %timeit f.write(s.encode('unicode_escape'))
            The slowest run took 224.43 times longer than the fastest. This could mean that an intermediate result is being cached.
            100000 loops, best of 3: 1.55 µs per loop
            
            In [20]: %timeit f.write(s.encode('ascii','backslashreplace'))
            The slowest run took 9.13 times longer than the fastest. This could mean that an intermediate result is being cached.
            100000 loops, best of 3: 2.37 µs per loop
            
            In [21]: f.close()
            

            Curiously, the lag from timeit for encode('unicode_escape') is a lot longer than that from encode('ascii', 'backslashreplace') even though the per loop time is faster, so be sure to test both in your environment.
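To see what the unicode_escape codec produces without writing a file at all (a quick Python 3 sketch):

```python
s = 'слово'
escaped = s.encode('unicode_escape')
print(escaped)   # b'\\u0441\\u043b\\u043e\\u0432\\u043e'

# decoding with the same codec round-trips for this string
print(escaped.decode('unicode_escape') == s)
```

This is handy for sanity-checking the escaping before committing to the byte-stream file approach above.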

            qid & accept id: (37262062, 37262346) query: PYTHON: How do I create a list of every possible letter mapping using a dictionary that stores every possible letter mapping combination? soup:

            EDIT: I agree with MRule. There would be 51,874,849,202 single-letter mappings in total. Consider the following approach (in python 2.7):

            \n
            import itertools\nfrom collections import OrderedDict\nimport string\nseed = {\n'A' : ['A'],\n'B' : ['B', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'V', 'W', 'Z'],\n'C' : ['C'],\n'D' : ['D'],\n'E' : ['E'],\n'F' : ['B', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'V', 'W', 'Z'],\n'G' : ['G', 'W'],\n'H' : ['B', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'V', 'W', 'Z'],\n'I' : ['I'],\n'J' : ['B', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'V', 'W', 'Z'],\n'K' : ['B', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'V', 'W', 'Z'],\n'L' : ['B', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'V', 'W', 'Z'],\n'M' : ['B', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'V', 'W', 'Z'],\n'N' : ['N'],\n'O' : ['O'],\n'P' : ['P'],\n'Q' : ['Q'],\n'R' : ['R'],\n'S' : ['S'],\n'T' : ['T'],\n'U' : ['U'],\n'V' : ['B', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'V', 'W', 'Z'],\n'W' : ['B', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'V', 'W', 'Z'],\n'X' : ['X'],\n'Y' : ['Y'],\n'Z' : ['B', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'V', 'W', 'Z'] \n}\nd = OrderedDict(sorted(seed.items(), key=lambda t: t[0]))\nlistOfList = d.values()\nfor i in itertools.product(* listOfList):\n    # print the possible dict\n    print dict(zip(string.ascii_uppercase, i))\n
            \n

            UPDATE: To only calculate the possible dictionaries where every mapped letter is unique, you could do:

            \n
            import itertools\nimport string\nothers = ['B', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'V', 'W', 'Z']\n# this dict is fixed\ndict1 = {k : [k] for k in string.uppercase if k not in others}\n# iterate all possibles in others, then merge two dicts into one\nfor i in itertools.permutations(others):\n    dict2 = dict(zip(others, i))\n    print dict(dict1.items() + dict2.items())\n
            \n soup wrap:

            EDIT: I agree with MRule. There would be 51,874,849,202 single-letter mappings in total. Consider the following approach (in Python 2.7):

            import itertools
            from collections import OrderedDict
            import string
            seed = {
            'A' : ['A'],
            'B' : ['B', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'V', 'W', 'Z'],
            'C' : ['C'],
            'D' : ['D'],
            'E' : ['E'],
            'F' : ['B', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'V', 'W', 'Z'],
            'G' : ['G', 'W'],
            'H' : ['B', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'V', 'W', 'Z'],
            'I' : ['I'],
            'J' : ['B', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'V', 'W', 'Z'],
            'K' : ['B', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'V', 'W', 'Z'],
            'L' : ['B', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'V', 'W', 'Z'],
            'M' : ['B', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'V', 'W', 'Z'],
            'N' : ['N'],
            'O' : ['O'],
            'P' : ['P'],
            'Q' : ['Q'],
            'R' : ['R'],
            'S' : ['S'],
            'T' : ['T'],
            'U' : ['U'],
            'V' : ['B', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'V', 'W', 'Z'],
            'W' : ['B', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'V', 'W', 'Z'],
            'X' : ['X'],
            'Y' : ['Y'],
            'Z' : ['B', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'V', 'W', 'Z'] 
            }
            d = OrderedDict(sorted(seed.items(), key=lambda t: t[0]))
            listOfList = d.values()
            for i in itertools.product(* listOfList):
                # print the possible dict
                print dict(zip(string.ascii_uppercase, i))
            

            UPDATE: To only calculate the possible dictionaries where every mapped letter is unique, you could do:

            import itertools
            import string
            others = ['B', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'V', 'W', 'Z']
            # this dict is fixed
            dict1 = {k : [k] for k in string.uppercase if k not in others}
            # iterate all possibles in others, then merge two dicts into one
            for i in itertools.permutations(others):
                dict2 = dict(zip(others, i))
                print dict(dict1.items() + dict2.items())
            
            qid & accept id: (37299064, 37301515) query: calling class with user input soup:

            One good option would be to use a dictionary to reference each instance of football, which would avoid a massive if, elif structure at the end:

            \n
            class football:\n    def __init__(self,qb,num):\n        self.qb = qb\n        self.num = num\n\n    def __str__(self):\n        return self.qb + ", " + self.num\nteams = {\n"Niners" : football("Gabbert", "02" ),\n"Bears" : football("CUTLER, JAY","06"),\n"Bengals" : football ("Dalton, Andy","14"),\n"Bills" : football (" Taylor, Tyrod", "05")} #etc\n#I didn't include the whole dictionary for brevity's sake\n\ndef decor(func):\n    def wrap():\n        print("===============================")\n        func()\n        print("===============================")\n    return wrap\n\ndef print_text():\n    print("Who\s your NFL Quarterback? ")\n\ndecorated = decor(print_text)\ndecorated()\n\nteam = input("Enter your teams name here:").capitalize()\nprint(teams[team])\n
            \n

            soup wrap:

            One good option would be to use a dictionary to reference each instance of football, which avoids a massive if/elif structure at the end:

            class football:
                def __init__(self,qb,num):
                    self.qb = qb
                    self.num = num
            
                def __str__(self):
                    return self.qb + ", " + self.num
            teams = {
            "Niners" : football("Gabbert", "02" ),
            "Bears" : football("CUTLER, JAY","06"),
            "Bengals" : football ("Dalton, Andy","14"),
            "Bills" : football (" Taylor, Tyrod", "05")} #etc
            #I didn't include the whole dictionary for brevity's sake
            
            def decor(func):
                def wrap():
                    print("===============================")
                    func()
                    print("===============================")
                return wrap
            
            def print_text():
                print("Who's your NFL Quarterback? ")
            
            decorated = decor(print_text)
            decorated()
            
            team = input("Enter your team's name here:").capitalize()
            print(teams[team])
            

            You will notice that the teams are now in a dictionary, so that the football instance for each team can be easily accessed by indexing the dictionary using the team name. This is what was done on the last line with the statement print(teams[team]).

            teams[team] returns the value associated with the key that is stored inside team. If the user enters Chiefs, for example, the string 'Chiefs' would be stored in team. Then when you try to index the dictionary using teams[team], it would access the dictionary entry for 'Chiefs'.

            But notice that what is returned from teams[team] is a football object. Normally, when you just print an object, Python prints some raw information about the object (which looks a little like what was going on here). The way to control what gets printed is to define a __repr__ or __str__ method in the class (more info about that here). This is exactly what I have done in the football class, so that when an instance of the class is printed, it prints the desired information in the desired format.
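A minimal standalone illustration of the difference, using a hypothetical Player class (not part of the code above):

```python
class Player:
    def __init__(self, name):
        self.name = name

class NamedPlayer(Player):
    # Defining __str__ controls what print() shows for an instance.
    def __str__(self):
        return self.name

print(Player("Gabbert"))       # raw object info, e.g. <__main__.Player object at 0x...>
print(NamedPlayer("Gabbert"))  # Gabbert
```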


            Another approach is to do away with the class altogether, and simply have a dictionary whose values are tuples containing the quarterback's name and their number as elements. The code would look a little like this:

            teams = {
            "Niners" : ("Gabbert", "02" ),
            "Bears" : ("CUTLER, JAY","06"),
            "Bengals" : ("Dalton, Andy","14"),
            "Bills" : (" Taylor, Tyrod", "05")} #etc
            # Again, not including the whole dictionary for brevity's sake
            
            def decor(func):
                def wrap():
                    print("===============================")
                    func()
                    print("===============================")
                return wrap
            
            def print_text():
                print("Who's your NFL Quarterback? ")
            
            decorated = decor(print_text)
            decorated()
            
            team = input("Enter your team's name here:").capitalize()
            print(teams[team][0], teams[team][1])
            

            This time teams[team] is a tuple. The last line is printing the first element (the quarterback's name), then the second element (the number).


            The second one is cleaner and requires less code, but it really is a matter of personal preference.

            There is more information about dictionaries in the docs.

            Also, I shortened the code sample for the sake of brevity, but you can see the full code samples on pastebin.

            qid & accept id: (37316698, 37316842) query: Python binary conversion to hex soup:

            soup wrap:

            The \xhh format you often see is a debugging aid: the output of repr() applied to a string with non-ASCII codepoints. Any ASCII codepoints are left in place, so whatever readable information is there stays readable.

            If you must have a string with all characters replaced by \xhh escapes, you need to do so manually:

            ''.join(r'\x{0:02x}'.format(ord(c)) for c in value)
            

            If you need quotes around that, you'd need to add those manually too:

            "'{0}'".format(''.join(r'\x{:02x}'.format(ord(c)) for c in value))
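As a quick sanity check of the expression above, with a small sample value (the variable name is just for illustration):

```python
value = 'abc'
# Each character becomes a \xhh escape of its codepoint: ord('a') == 0x61, etc.
escaped = ''.join(r'\x{0:02x}'.format(ord(c)) for c in value)
print(escaped)  # \x61\x62\x63
```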
            
            qid & accept id: (37340568, 37341349) query: Interactive shell program wrapper in python soup:

            soup wrap:

            subprocess.Popen will work for this, but to read and then write and then read again you can't use communicate (because this will cause the process to end).

            Instead, you'll need to work with the process's output pipe (process.stdout below). This is tricky to get right, because reading on the process's stdout is blocking, so you sort of need to know when to stop trying to read (or know how much output the process is going to produce).

            In this example, the subprocess is a shell script that writes a line of output, and then echoes whatever you give it until it reads EOF.

            import subprocess
            
            COMMAND_LINE = 'echo "Hello World!" ; cat'
            
            process = subprocess.Popen(COMMAND_LINE, shell=True,
                                       stdin=subprocess.PIPE,
                                       stdout=subprocess.PIPE)
            
            s = process.stdout.readline().strip()
            print(s)
            s2 = process.communicate(s)[0]
            print(s2)
            

            Gives:

            Hello World!
            Hello World!
            

            For more complicated cases, you might think about looking at something like pexpect.
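For a read-write-read exchange that keeps the child alive between messages, you can write to process.stdin and flush manually instead of calling communicate. A minimal sketch, assuming a POSIX `cat` is available:

```python
import subprocess

# cat echoes each line back, so it makes a convenient interactive test child.
proc = subprocess.Popen(['cat'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)

for msg in [b'first\n', b'second\n']:
    proc.stdin.write(msg)
    proc.stdin.flush()             # push the line through the pipe now
    print(proc.stdout.readline())  # blocks until cat echoes the line back

proc.stdin.close()                 # sending EOF lets cat exit cleanly
proc.wait()
```

The flush after every write matters: without it the data can sit in Python's buffer while the readline blocks forever.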

            qid & accept id: (37348050, 37348283) query: Getting file path from command line arguments in python soup:

            soup wrap:

            As Display Name said, os.path.isabs along with sys.argv is probably the best:

            import sys
            import os
            
            fpath = sys.argv[-1]
            
            print(os.path.isabs(fpath))
            print(fpath)
            

            Output:

            >>> 
            True
            C:\Users\310176421\Desktop\Python\print.py
            >>>
            

            Some runs from the command line:

            C:\Users\310176421\Desktop\Python>python print.py C:\Users\310176421\Desktop\tes
            t.txt
            True
            C:\Users\310176421\Desktop\test.txt
            
            C:\Users\310176421\Desktop\Python>python print.py whatever
            False
            whatever
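If you also want relative arguments turned into absolute paths (rather than just detecting them), os.path.abspath resolves against the current working directory. A small sketch:

```python
import os
import sys

fpath = sys.argv[-1]
# abspath leaves absolute paths alone and resolves relative ones against os.getcwd()
print(os.path.abspath(fpath))
```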
            
            qid & accept id: (37365033, 37418415) query: How to print framed strings soup:

            soup wrap:

            Do not try to draw boxes by hand. It will break.

            I once needed some function to draw boxes, so for documentation reasons I'm posting a cleaned version here:

            UL, UR = '╔', '╗'
            SL, SR = '╠', '║'
            DL, DR = '╚', '╝'
            AL, AR = '═', '>'
            
            
            def padded(
                line, info=None, width=42, intro='>', outro='<', filler='.', chopped='..'
            ):
                # cleanup input
                line = ''.join([' ', line.strip()]) if line else ''
                info = info.strip() if info else ''
            
                # determine available width
                width -= sum([len(intro), len(outro), len(line), len(info)])
                if width < 0:
                    # chop off overflowing text
                    line = line[:len(line)+width]
                    if chopped:
                        # place chopped characters (if set)
                        chopped = chopped.strip()
                        line = ' '.join([line[:len(line)-(len(chopped)+1)], chopped])
            
                return ''.join(e for e in [
                    intro,
                    info,
                    line,
                    ''.join(filler for _ in range(width)),
                    outro
                ] if e)
            
            
            def box(rnum, nbeds, *extras):
                arrow = (AL+AR)
                res = [
                    # head line
                    padded(
                        'Stanza n. {:03d} <'.format(rnum), (AL+AL+arrow),
                        intro=UL, outro=UR, filler=AL
                    ),
                    # first line
                    padded(
                        'Num letti: {:3d}'.format(nbeds), arrow,
                        intro=SL, outro=SR, filler=' '
                    ),
                ]
                # following lines
                res.extend(padded(e, arrow, intro=SL, outro=SR, filler=' ') for e in extras)
                # bottom line
                res.append(padded(None, None, intro=DL, outro=DR, filler=AL))
            
                return '\n'.join(res)
            
            
            print(
                box(485, 3, 'Fumatori', 'Televisione')
            )
            print(
                box(123, 4, 'Fumatori', 'Televisione', 'Aria Condizionata')
            )
            print(
                box(1, 1, 'this is so much text it will be chopped off')
            )
            

            The result will look like this:

            ╔═══> Stanza n. 485 <════════════════════╗
            ╠═> Num letti:   3                       ║
            ╠═> Fumatori                             ║
            ╠═> Televisione                          ║
            ╚════════════════════════════════════════╝
            ╔═══> Stanza n. 123 <════════════════════╗
            ╠═> Num letti:   4                       ║
            ╠═> Fumatori                             ║
            ╠═> Televisione                          ║
            ╠═> Aria Condizionata                    ║
            ╚════════════════════════════════════════╝
            ╔═══> Stanza n. 001 <════════════════════╗
            ╠═> Num letti:   1                       ║
            ╠═> this is so much text it will be ch ..║
            ╚════════════════════════════════════════╝
            
            qid & accept id: (37374947, 37375089) query: Elegant way to split list on particular values soup:

            soup wrap:

            Code -

            from collections import defaultdict
            
            arr = ['a', 1, 2, 3, 'b', 4, 5, 6]
            
            d = defaultdict(list)
            
            cur_key = arr[0]
            
            for value in arr[1:]:
                if type(value) != type(cur_key):
                    d[cur_key].append(value)
                else:
                    cur_key = value
            
            print(d)
            

            Output -

            defaultdict(<class 'list'>, {'b': [4, 5, 6], 'a': [1, 2, 3]})
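An alternative sketch using itertools.groupby, which splits the list directly into runs of strings and runs of numbers (same sample data as above):

```python
from itertools import groupby

arr = ['a', 1, 2, 3, 'b', 4, 5, 6]

d = {}
key = None
for is_str, run in groupby(arr, key=lambda v: isinstance(v, str)):
    if is_str:
        key = list(run)[-1]   # the last string in a run becomes the active key
    else:
        d[key] = list(run)    # the numbers that follow belong to that key

print(d)   # {'a': [1, 2, 3], 'b': [4, 5, 6]}
```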
            
            qid & accept id: (37397296, 37397374) query: Summing similar elements within a tuple-of-tuples soup:

            soup wrap:

            Code -

            from collections import defaultdict
            
            T1 = (('a', 'b', 2),
             ('a', 'c', 4),
             ('b', 'c', 1),
             ('a', 'b', 8),)
            
            d = defaultdict(int)
            
            for x, y, z in T1:
                d[(x, y)] += z
            
            T2 = tuple([(*k, v) for k, v in d.items()])
            
            print(T2)
            

            Output -

            (('a', 'c', 4), ('b', 'c', 1), ('a', 'b', 10))
            

            If you're interested in maintaining the original order, then -

            from collections import OrderedDict
            
            T1 = (('a', 'b', 2), ('a', 'c', 4), ('b', 'c', 1), ('a', 'b', 8),)
            
            d = OrderedDict()
            
            for x, y, z in T1:
                d[(x, y)] = d[(x, y)] + z if (x, y) in d else z
            
            T2 = tuple((*k, v) for k, v in d.items())
            
            print(T2)
            

            Output -

            (('a', 'b', 10), ('a', 'c', 4), ('b', 'c', 1))
            

            In Python 2, you should use this -

            T2 = tuple([(x, y, z) for (x, y), z in d.items()])
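collections.Counter can stand in for the defaultdict(int) here, since it also treats missing keys as zero — a sketch of the same summation:

```python
from collections import Counter

T1 = (('a', 'b', 2), ('a', 'c', 4), ('b', 'c', 1), ('a', 'b', 8))

totals = Counter()
for x, y, z in T1:
    totals[(x, y)] += z   # missing keys start at 0, like defaultdict(int)

T2 = tuple((x, y, z) for (x, y), z in totals.items())
print(T2)
```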
            
            qid & accept id: (37399461, 37399567) query: vectorized implementation for pseudo pivot table in python soup:

            soup wrap:

            You could use pd.crosstab to create a frequency table:

            import pandas as pd
            
            df = pd.DataFrame(
                {'Component': ['Air conditioner', 'Air conditioner', 'airbag', 'engine with 150 H/P', 'airbag',
                               '1-year concierge assistance', 'ABS breaks', 'ABS breaks', 'airbag', 
                               'air conditioner', 'engine with 250 H/P'], 
                 'Vehicle': ['Ford', 'Ford', 'Ford', 'Ford', 'Toyota', 'Toyota', 'Toyota',
                             'Chrysler', 'Chrysler', 'Chrysler', 'Chrysler']})
            
            result = pd.crosstab(index=[df['Vehicle']], columns=[df['Component']]).clip(upper=1)
            print(result)
            

            yields

            Component  1-year concierge assistance  ABS breaks  Air conditioner  \
            Vehicle                                                               
            Chrysler                             0           1                0   
            Ford                                 0           0                1   
            Toyota                               1           1                0   
            
            Component  air conditioner  airbag  engine with 150 H/P  engine with 250 H/P  
            Vehicle                                                                       
            Chrysler                 1       1                    0                    1  
            Ford                     0       1                    1                    0  
            Toyota                   0       1                    0                    0  
            

            Since the frequency table may contain values greater than 1 if df contains duplicate rows, clip(upper=1) is used to reduce those values back to 1.
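An equivalent sketch that drops the duplicate rows up front instead of clipping afterwards (shown with a small hypothetical DataFrame for brevity):

```python
import pandas as pd

df = pd.DataFrame({'Component': ['airbag', 'airbag', 'ABS breaks'],
                   'Vehicle':   ['Ford',   'Ford',   'Toyota']})

# With duplicate rows removed first, every crosstab count is already 0 or 1.
deduped = df.drop_duplicates()
result = pd.crosstab(deduped['Vehicle'], deduped['Component'])
print(result)
```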

            qid & accept id: (37417157, 37418284) query: Changing the columns in DataFrame with respect to values in other columns soup:

            soup wrap:

            You may want to consider reindexing your data based on how you want to utilize it.

            You can index your data based on the column "Trans" and "Num" like so:

            #Change how we index the frame
            df.set_index(["Trans", "Num"], inplace=True)
            

            Next, we'll grab each unique index so we can replace them all. (This part and the iteration below can probably be done in bulk; if you run into efficiency problems, look into how to avoid looping over all the indexes.)

            #Get only unique indexes
            unique_trans = list(set(df.index.get_level_values('Trans')))
            

            Then we can iterate through and apply what you want.

            # Access each index
            for trans in unique_trans:
            
                # Get the higher number in "Num" for each so we know which to set to NaN
                max_num = max(df.loc[trans].index.values)
            
                # Copy your start column as a temp variable
                start = df.loc[trans]["Start"].copy()
            
                # Apply the transform to the start column (Equal to end + 10)        
                df.loc[trans, "Start"] = np.array(df.loc[trans]["End"]) + 10
            
                # Apply the transform to the end column
                df.loc[trans, "End"] = np.array(start.shift(-1) - 10)
            
                # By passing a tuple as a row index, we get the element that is both in trans and the max number, 
                #which is the one you want to set to NaN
                df.loc[(trans, max_num), "End"] = np.nan
            
            print(df)
            

            The result I got from this when running your data was:

                            Head  Chr     Start      End
            Trans      Num                             
            ENST473358 1      A    1   30049.0  30554.0
                       2      A    1   30677.0  30966.0
                       3      A    1   31107.0      NaN
            ENST417324 1      B    1   35277.0  35481.0
                       2      B    1   34554.0  35174.0
                       3      B    1   35721.0      NaN
            ENST461467 1      B    1   35245.0  35481.0
                       2      B    1  120775.0      NaN
            

            The full code I used to generate your test case is this:

            import pandas as pd
            import numpy as np
            # Setup your dataframe
            df = pd.DataFrame(columns=["Head", "Chr", "Start", "End", "Trans", "Num"])
            df["Head"] = ["A", "A", "A", "B", "B", "B", "B", "B"]
            df["Chr"] = [1]*8
            df["Start"] = [29554, 30564, 30976, 36091, 35491, 35184, 36083, 35491]
            df["End"] = [30039, 30667, 31097, 35267, 34544, 35711, 35235, 120765]
            df["Trans"] = ["ENST473358", "ENST473358", "ENST473358",
                           "ENST417324", "ENST417324", "ENST417324",
                           "ENST461467","ENST461467"]
            df["Num"] = [1, 2, 3, 1, 2, 3, 1, 2]
            
            # Change how we index the frame
            df.set_index(["Trans", "Num"], inplace=True)
            
            # Get only unique indexes
            unique_trans = list(set(df.index.get_level_values('Trans')))
            
            # Access each index
            for trans in unique_trans:
                max_num = max(df.loc[trans].index.values)
            
                start = df.loc[trans]["Start"].copy()
                df.loc[trans, "Start"] = np.array(df.loc[trans]["End"]) + 10
                df.loc[trans, "End"] = np.array(start.shift(-1) - 10)
                df.loc[(trans, max_num), "End"] = np.nan
            
            print(df)
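The per-group shuffle in the loop can also be expressed without iterating over the groups, using a per-group shift. A sketch of the idea (only one group shown for brevity; with the multi-index set as above, groupby(level="Trans") handles all groups, and both new columns are computed before either is overwritten):

```python
import pandas as pd

df = pd.DataFrame({
    "Start": [29554, 30564, 30976],
    "End":   [30039, 30667, 31097],
    "Trans": ["ENST473358"] * 3,
    "Num":   [1, 2, 3],
}).set_index(["Trans", "Num"])

new_start = df["End"] + 10                                   # Start becomes End + 10
new_end = df.groupby(level="Trans")["Start"].shift(-1) - 10  # End becomes next row's old Start - 10
df["Start"] = new_start
df["End"] = new_end                                          # last row of each group gets NaN
print(df)
```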
            
            qid & accept id: (37423445, 37423513) query: Python prettytable Sort by Multiple Columns soup:

            soup wrap:

            You can pass operator.itemgetter() as the sort_key value. Note that sortby still needs to be given for the sort_key to be applied:

            import operator
            from prettytable import PrettyTable
            
            
            table = PrettyTable(["Name", "Grade"])
            table.add_row(["Joe", 90])
            table.add_row(["Sally", 100])
            table.add_row(["Bill", 90])
            table.add_row(["Alice", 90])
            print table.get_string(sort_key=operator.itemgetter(1, 0), sortby="Grade")
            

            Prints:

            +-------+-------+
            |  Name | Grade |
            +-------+-------+
            | Alice |   90  |
            |  Bill |   90  |
            |  Joe  |   90  |
            | Sally |  100  |
            +-------+-------+
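            The multi-column key that sort_key applies can be illustrated on a plain list of rows, independent of prettytable; this sketch just shows what itemgetter(1, 0) does with the same data:

```python
from operator import itemgetter

# the same rows as the table above
rows = [["Joe", 90], ["Sally", 100], ["Bill", 90], ["Alice", 90]]

# itemgetter(1, 0) builds a (grade, name) key for each row,
# so rows sort by grade first, then by name within equal grades
ordered = sorted(rows, key=itemgetter(1, 0))
```

            which yields Alice, Bill, Joe, then Sally, matching the table output above.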
            
            qid & accept id: (37425477, 37425791) query: remove newline and whitespace parse XML with python Xpath soup:

            Just call normalize-space(.) on each node.

            \n
            import lxml.etree as et\n\nxml = et.parse("feed.xml")\nns = {"ns": 'http://www.w3.org/2005/Atom'}\nfor n in xml.xpath("//ns:category", namespaces=ns):\n    t  = n.xpath("./../ns:summary", namespaces=ns)[0]\n    print(t.xpath("normalize-space(.)"))\n
            \n

            Output:

            \n
            Putting an entire chapter on one page sounds bloated, but consider this — my longest chapter so far would be 75 printed pages, and it loads in under 5 seconds… On dialup.\nPutting an entire chapter on one page sounds bloated, but consider this — my longest chapter so far would be 75 printed pages, and it loads in under 5 seconds… On dialup.\nPutting an entire chapter on one page sounds bloated, but consider this — my longest chapter so far would be 75 printed pages, and it loads in under 5 seconds… On dialup.\nThe accessibility orthodoxy does not permit people to question the value of features that are rarely useful and rarely used.\nThese notes will eventually become part of a tech talk on video encoding.\nThese notes will eventually become part of a tech talk on video encoding.\nThese notes will eventually become part of a tech talk on video encoding.\nThese notes will eventually become part of a tech talk on video encoding.\nThese notes will eventually become part of a tech talk on video encoding.\nThese notes will eventually become part of a tech talk on video encoding.\nThese notes will eventually become part of a tech talk on video encoding.\nThese notes will eventually become part of a tech talk on video encoding.\n
            \n

            All your newlines have been removed and multiple spaces replaced with a single space.

            \n

            Part two of your question asks for the title tag, since that is the only tag with the text you are looking for. To find the title with that exact text specifically, it is simply:

            \n
            xml.xpath("//ns:title[text()='dive into mark']", namespaces=ns)\n
            \n

            If you wanted any node that contained the text, you would just replace ns:title with a wildcard:

            \n
            xml.xpath("//*[text()='dive into mark']", namespaces=ns)\n
            \n soup wrap:

            Just call normalize-space(.) on each node.

            import lxml.etree as et
            
            xml = et.parse("feed.xml")
            ns = {"ns": 'http://www.w3.org/2005/Atom'}
            for n in xml.xpath("//ns:category", namespaces=ns):
                t  = n.xpath("./../ns:summary", namespaces=ns)[0]
                print(t.xpath("normalize-space(.)"))
            

            Output:

            Putting an entire chapter on one page sounds bloated, but consider this — my longest chapter so far would be 75 printed pages, and it loads in under 5 seconds… On dialup.
            Putting an entire chapter on one page sounds bloated, but consider this — my longest chapter so far would be 75 printed pages, and it loads in under 5 seconds… On dialup.
            Putting an entire chapter on one page sounds bloated, but consider this — my longest chapter so far would be 75 printed pages, and it loads in under 5 seconds… On dialup.
            The accessibility orthodoxy does not permit people to question the value of features that are rarely useful and rarely used.
            These notes will eventually become part of a tech talk on video encoding.
            These notes will eventually become part of a tech talk on video encoding.
            These notes will eventually become part of a tech talk on video encoding.
            These notes will eventually become part of a tech talk on video encoding.
            These notes will eventually become part of a tech talk on video encoding.
            These notes will eventually become part of a tech talk on video encoding.
            These notes will eventually become part of a tech talk on video encoding.
            These notes will eventually become part of a tech talk on video encoding.
            

            All your newlines have been removed and multiple spaces replaced with a single space.

            Part two of your question asks for the title tag, since that is the only tag with the text you are looking for. To find the title with that exact text specifically, it is simply:

            xml.xpath("//ns:title[text()='dive into mark']", namespaces=ns)
            

            If you wanted any node that contained the text, you would just replace ns:title with a wildcard:

            xml.xpath("//*[text()='dive into mark']", namespaces=ns)
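            For reference, the whitespace cleanup that XPath's normalize-space() performs can be mimicked in plain Python with str.split/str.join, which is handy when you only need the cleanup without lxml; a minimal sketch:

```python
def normalize_space(text):
    # like XPath normalize-space(): trim both ends and collapse
    # every internal run of whitespace to a single space
    return " ".join(text.split())
```

            For example, normalize_space("  one\n  two\t three ") returns "one two three".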
            
            qid & accept id: (37444512, 37445015) query: Print from txt file soup:

            You can use an instance of the built-in string.Template class. Note the $user1 I added.

            \n
            from string import Template\n\ntemplate = Template('''\\nURL GOTO=https://www.url.com/$user1\nTAG POS=1 TYPE=BUTTON ATTR=TXT:Follow\nWAIT SECONDS= 27''')\n\nwith open('users.txt') as file:\n    for line in file:\n        print(template.substitute({'user1': line.strip()}))\n
            \n

            Update

            \n

            An even simpler way is to use the str.format method common to all strings. The syntax for replacement fields is slightly different ({user1} instead of $user1), but it has the advantage that you don't have to import anything to use it and it plays well with all the other format string options.

            \n
            template = '''\\nURL GOTO=https://www.url.com/{user1}\nTAG POS=1 TYPE=BUTTON ATTR=TXT:Follow\nWAIT SECONDS= 27'''\n\nwith open('users.txt') as file:\n    for line in file:\n        print(template.format(user1=line.strip()))\n
            \n

            Both will produce the following output when run with the data in your sample users.txt file:

            \n
            URL GOTO=https://www.url.com/rrralu\nTAG POS=1 TYPE=BUTTON ATTR=TXT:Follow\nWAIT SECONDS= 27\nURL GOTO=https://www.url.com/rebeccamacavei\nTAG POS=1 TYPE=BUTTON ATTR=TXT:Follow\nWAIT SECONDS= 27\nURL GOTO=https://www.url.com/corinnaco_\nTAG POS=1 TYPE=BUTTON ATTR=TXT:Follow\nWAIT SECONDS= 27\nURL GOTO=https://www.url.com/andrew1996_\nTAG POS=1 TYPE=BUTTON ATTR=TXT:Follow\nWAIT SECONDS= 27\nURL GOTO=https://www.url.com/thisisme_r\nTAG POS=1 TYPE=BUTTON ATTR=TXT:Follow\nWAIT SECONDS= 27\nURL GOTO=https://www.url.com/zabiburuziga\nTAG POS=1 TYPE=BUTTON ATTR=TXT:Follow\nWAIT SECONDS= 27\nURL GOTO=https://www.url.com/be_real_00\nTAG POS=1 TYPE=BUTTON ATTR=TXT:Follow\nWAIT SECONDS= 27\nURL GOTO=https://www.url.com/officiel_14_leo\nTAG POS=1 TYPE=BUTTON ATTR=TXT:Follow\nWAIT SECONDS= 27\nURL GOTO=https://www.url.com/thefullersgroup\nTAG POS=1 TYPE=BUTTON ATTR=TXT:Follow\nWAIT SECONDS= 27\n
            \n soup wrap:

            You can use an instance of the built-in string.Template class. Note the $user1 I added.

            from string import Template
            
            template = Template('''\
            URL GOTO=https://www.url.com/$user1
            TAG POS=1 TYPE=BUTTON ATTR=TXT:Follow
            WAIT SECONDS= 27''')
            
            with open('users.txt') as file:
                for line in file:
                    print(template.substitute({'user1': line.strip()}))
            

            Update

            An even simpler way is to use the str.format method common to all strings. The syntax for replacement fields is slightly different ({user1} instead of $user1), but it has the advantage that you don't have to import anything to use it and it plays well with all the other format string options.

            template = '''\
            URL GOTO=https://www.url.com/{user1}
            TAG POS=1 TYPE=BUTTON ATTR=TXT:Follow
            WAIT SECONDS= 27'''
            
            with open('users.txt') as file:
                for line in file:
                    print(template.format(user1=line.strip()))
            

            Both will produce the following output when run with the data in your sample users.txt file:

            URL GOTO=https://www.url.com/rrralu
            TAG POS=1 TYPE=BUTTON ATTR=TXT:Follow
            WAIT SECONDS= 27
            URL GOTO=https://www.url.com/rebeccamacavei
            TAG POS=1 TYPE=BUTTON ATTR=TXT:Follow
            WAIT SECONDS= 27
            URL GOTO=https://www.url.com/corinnaco_
            TAG POS=1 TYPE=BUTTON ATTR=TXT:Follow
            WAIT SECONDS= 27
            URL GOTO=https://www.url.com/andrew1996_
            TAG POS=1 TYPE=BUTTON ATTR=TXT:Follow
            WAIT SECONDS= 27
            URL GOTO=https://www.url.com/thisisme_r
            TAG POS=1 TYPE=BUTTON ATTR=TXT:Follow
            WAIT SECONDS= 27
            URL GOTO=https://www.url.com/zabiburuziga
            TAG POS=1 TYPE=BUTTON ATTR=TXT:Follow
            WAIT SECONDS= 27
            URL GOTO=https://www.url.com/be_real_00
            TAG POS=1 TYPE=BUTTON ATTR=TXT:Follow
            WAIT SECONDS= 27
            URL GOTO=https://www.url.com/officiel_14_leo
            TAG POS=1 TYPE=BUTTON ATTR=TXT:Follow
            WAIT SECONDS= 27
            URL GOTO=https://www.url.com/thefullersgroup
            TAG POS=1 TYPE=BUTTON ATTR=TXT:Follow
            WAIT SECONDS= 27
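            One difference worth knowing between the two Template methods: substitute() raises KeyError when a placeholder has no value, while safe_substitute() leaves it in place. A small sketch using the same $user1 placeholder:

```python
from string import Template

t = Template('URL GOTO=https://www.url.com/$user1')

filled = t.substitute(user1='rrralu')
partial = t.safe_substitute()  # no mapping given: $user1 survives intact
```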
            
            qid & accept id: (37482313, 37484115) query: Comparing List and get indices in python soup:

            This is the initial dataframe:

            \n
            mac_list\n\n    mac_address  frequency\n0  20c9d0892feb          2\n1  28e34789c4c2          1\n2  3480b3d51d5f          1\n3  4480ebb4e28c          1\n4  4c60de5dad72          1\n5  4ca56dab4550          1\n
            \n

            And the new list:

            \n
            new_mac_list = ['20c9d0892feb', '3480b3d51d5f', '20c9d0892feb', '249cji39fj4g']\n
            \n

            I'd first set the index of mac_list as mac_address:

            \n
            mac_list = mac_list.set_index("mac_address")\n
            \n

            And then calculate the frequencies in the new list:

            \n
            new_freq = pd.Series(new_mac_list).value_counts()\n
            \n

            You can then use the add method on the series:

            \n
            res = mac_list["frequency"].add(new_freq, fill_value=0)\n\n20c9d0892feb    4.0\n249cji39fj4g    1.0\n28e34789c4c2    1.0\n3480b3d51d5f    2.0\n4480ebb4e28c    1.0\n4c60de5dad72    1.0\n4ca56dab4550    1.0\ndtype: float64\n
            \n

            Back to the original format:

            \n
            mac_list = pd.DataFrame(res, columns = ["frequency"])\nprint(mac_list)\n\n              frequency\n20c9d0892feb        4.0\n249cji39fj4g        1.0\n28e34789c4c2        1.0\n3480b3d51d5f        2.0\n4480ebb4e28c        1.0\n4c60de5dad72        1.0\n4ca56dab4550        1.0\n
            \n soup wrap:

            This is the initial dataframe:

            mac_list
            
                mac_address  frequency
            0  20c9d0892feb          2
            1  28e34789c4c2          1
            2  3480b3d51d5f          1
            3  4480ebb4e28c          1
            4  4c60de5dad72          1
            5  4ca56dab4550          1
            

            And the new list:

            new_mac_list = ['20c9d0892feb', '3480b3d51d5f', '20c9d0892feb', '249cji39fj4g']
            

            I'd first set the index of mac_list as mac_address:

            mac_list = mac_list.set_index("mac_address")
            

            And then calculate the frequencies in the new list:

            new_freq = pd.Series(new_mac_list).value_counts()
            

            You can then use the add method on the series:

            res = mac_list["frequency"].add(new_freq, fill_value=0)
            
            20c9d0892feb    4.0
            249cji39fj4g    1.0
            28e34789c4c2    1.0
            3480b3d51d5f    2.0
            4480ebb4e28c    1.0
            4c60de5dad72    1.0
            4ca56dab4550    1.0
            dtype: float64
            

            Back to the original format:

            mac_list = pd.DataFrame(res, columns = ["frequency"])
            print(mac_list)
            
                          frequency
            20c9d0892feb        4.0
            249cji39fj4g        1.0
            28e34789c4c2        1.0
            3480b3d51d5f        2.0
            4480ebb4e28c        1.0
            4c60de5dad72        1.0
            4ca56dab4550        1.0
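            The same merge of old and new frequencies can be sketched without pandas using collections.Counter, whose update() also adds counts key by key (the MAC values below are the ones from the example):

```python
from collections import Counter

existing = Counter({'20c9d0892feb': 2, '28e34789c4c2': 1, '3480b3d51d5f': 1})
new_mac_list = ['20c9d0892feb', '3480b3d51d5f', '20c9d0892feb', '249cji39fj4g']

# update() adds the counts of the new observations,
# creating entries for previously unseen addresses
existing.update(new_mac_list)
```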
            
            qid & accept id: (37492239, 37492464) query: python plot distribution across mean soup:

            As Andy Hayden already suggested, pandas could be a very good option here:

            \n
            from datetime import date, timedelta as td, datetime\nd1 = datetime.strptime('1/1/2015', "%m/%d/%Y")\nd2 = datetime.strptime('12/31/2015', "%m/%d/%Y")\n\nAllDays = []\nwhile(d1<=d2):\n    AllDays.append(d1)\n    d1 = d1 + td(days=1)\n\ntemps = np.random.normal( 20, 0.5, size=(500,365) )\ntemps = pd.DataFrame( temps.T, index=AllDays )\n\nfig, ax = plt.subplots( 1, 1, figsize=(16,8) )\nax.plot( temps.index, temps.T.mean(), color='blue', linewidth=2 )\n
            \n

            Edit:

            \n

            Added the next line to plot the area shown in your example. Notice that for each x-value you plot only three y-values: max, min, and mean. You may of course want to plot Q1 and Q3, or confidence intervals, instead. My point is that you don't actually need the 500 points anymore (summary statistics are so great ^_^)

            \n
            ax.fill_between( temps.index, y1=temps.T.max(), y2=temps.T.min(), color='gray', alpha=0.5)\n\nax.set_ylabel('temperature [°C]')\nax.set_xlabel('measuring date')\nax.set_ylim([15,25])\n\nplt.savefig('plot.png')\n
            \n

            enter image description here

            \n

            Note: \nAs already shown, you don't really need pandas for this, but it is still great for a number of things and you may want to give it a try ;)

            \n soup wrap:

            As Andy Hayden already suggested, pandas could be a very good option here:

            import numpy as np
            import pandas as pd
            import matplotlib.pyplot as plt
            from datetime import date, timedelta as td, datetime
            d1 = datetime.strptime('1/1/2015', "%m/%d/%Y")
            d2 = datetime.strptime('12/31/2015', "%m/%d/%Y")
            
            AllDays = []
            while(d1<=d2):
                AllDays.append(d1)
                d1 = d1 + td(days=1)
            
            temps = np.random.normal( 20, 0.5, size=(500,365) )
            temps = pd.DataFrame( temps.T, index=AllDays )
            
            fig, ax = plt.subplots( 1, 1, figsize=(16,8) )
            ax.plot( temps.index, temps.T.mean(), color='blue', linewidth=2 )
            

            Edit:

            Added the next line to plot the area shown in your example. Notice that for each x-value you plot only three y-values: max, min, and mean. You may of course want to plot Q1 and Q3, or confidence intervals, instead. My point is that you don't actually need the 500 points anymore (summary statistics are so great ^_^)

            ax.fill_between( temps.index, y1=temps.T.max(), y2=temps.T.min(), color='gray', alpha=0.5)
            
            ax.set_ylabel('temperature [°C]')
            ax.set_xlabel('measuring date')
            ax.set_ylim([15,25])
            
            plt.savefig('plot.png')
            

            enter image description here

            Note: As already shown, you don't really need pandas for this, but it is still great for a number of things and you may want to give it a try ;)
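            The reduction from 500 raw series to three summary curves is independent of the plotting itself; a sketch with synthetic data (the sizes mirror the example above):

```python
import numpy as np

# 500 simulated readings for each of 365 days
rng = np.random.default_rng(0)
temps = rng.normal(20, 0.5, size=(500, 365))

# one summary value per day is all that plot/fill_between need
daily_mean = temps.mean(axis=0)
daily_min = temps.min(axis=0)
daily_max = temps.max(axis=0)
```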

            qid & accept id: (37501075, 37501076) query: How to get parameter arguments from a frozen spicy.stats distribution? soup:

            Accessing rv frozen parameters

            \n

            Yes, the parameters used to create a frozen distribution are available within the instance of the distribution. They are stored in the args and kwds attributes; how they are split between the two depends on whether the instance was created with positional arguments or keyword arguments.

            \n
            import scipy.stats as stats\n\n# Parameters for this particular gamma distribution\na, loc, scale = 3.14, 5.0, 2.0\n\n# Create frozen distribution\nrv1 = stats.gamma(a, loc, scale)\nrv2 = stats.gamma(a, loc=loc, scale=scale)\n\n# Do something with frozen parameters\nprint 'positional only'\nprint 'frozen args : {}'.format(rv1.args)\nprint 'frozen kwds : {}'.format(rv1.kwds)\nprint\nprint 'positional and keyword'\nprint 'frozen args : {}'.format(rv2.args)\nprint 'frozen kwds : {}'.format(rv2.kwds)\n
            \n
            \n
            positional only\nfrozen args : (3.14, 5.0, 2.0)\nfrozen kwds : {}\n\npositional and keyword\nfrozen args : (3.14,)\nfrozen kwds : {'loc': 5.0, 'scale': 2.0}\n
            \n

            Bonus: Private method that handles both args and kwds

            \n

            There is a private method, .dist._parse_args(), which handles both styles of parameter passing and returns a consistent result.

            \n
            # Get the original parameters regardless of argument type\nshape1, loc1, scale1 = rv1.dist._parse_args(*rv1.args, **rv1.kwds)\nshape2, loc2, scale2 = rv2.dist._parse_args(*rv2.args, **rv2.kwds)\n\nprint 'positional only'\nprint 'frozen parameters: shape={}, loc={}, scale={}'.format(shape1, loc1, scale1)\nprint\nprint 'positional and keyword'\nprint 'frozen parameters: shape={}, loc={}, scale={}'.format(shape2, loc2, scale2)\n
            \n
            \n
            positional only\nfrozen parameters: shape=(3.14,), loc=5.0, scale=2.0\n\npositional and keyword\nfrozen parameters: shape=(3.14,), loc=5.0, scale=2.0\n
            \n

            Caveat

            \n

            Granted, using private methods is typically bad practice because internal APIs can always change. However, they sometimes provide nice features, would be easy to re-implement should things change, and nothing is really private in Python :)

            \n soup wrap:

            Accessing rv frozen parameters

            Yes, the parameters used to create a frozen distribution are available within the instance of the distribution. They are stored in the args and kwds attributes; how they are split between the two depends on whether the instance was created with positional arguments or keyword arguments.

            import scipy.stats as stats
            
            # Parameters for this particular gamma distribution
            a, loc, scale = 3.14, 5.0, 2.0
            
            # Create frozen distribution
            rv1 = stats.gamma(a, loc, scale)
            rv2 = stats.gamma(a, loc=loc, scale=scale)
            
            # Do something with frozen parameters
            print 'positional only'
            print 'frozen args : {}'.format(rv1.args)
            print 'frozen kwds : {}'.format(rv1.kwds)
            print
            print 'positional and keyword'
            print 'frozen args : {}'.format(rv2.args)
            print 'frozen kwds : {}'.format(rv2.kwds)
            

            positional only
            frozen args : (3.14, 5.0, 2.0)
            frozen kwds : {}
            
            positional and keyword
            frozen args : (3.14,)
            frozen kwds : {'loc': 5.0, 'scale': 2.0}
            

            Bonus: Private method that handles both args and kwds

            There is a private method, .dist._parse_args(), which handles both styles of parameter passing and returns a consistent result.

            # Get the original parameters regardless of argument type
            shape1, loc1, scale1 = rv1.dist._parse_args(*rv1.args, **rv1.kwds)
            shape2, loc2, scale2 = rv2.dist._parse_args(*rv2.args, **rv2.kwds)
            
            print 'positional only'
            print 'frozen parameters: shape={}, loc={}, scale={}'.format(shape1, loc1, scale1)
            print
            print 'positional and keyword'
            print 'frozen parameters: shape={}, loc={}, scale={}'.format(shape2, loc2, scale2)
            

            positional only
            frozen parameters: shape=(3.14,), loc=5.0, scale=2.0
            
            positional and keyword
            frozen parameters: shape=(3.14,), loc=5.0, scale=2.0
            

            Caveat

            Granted, using private methods is typically bad practice because internal APIs can always change. However, they sometimes provide nice features, would be easy to re-implement should things change, and nothing is really private in Python :)
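            The freezing pattern itself amounts to storing the call arguments on the instance; a minimal sketch of that idea without scipy (the class name Frozen is made up for illustration):

```python
class Frozen:
    # mimic scipy's rv_frozen: remember exactly how the
    # instance was parameterised, for later inspection
    def __init__(self, *args, **kwds):
        self.args = args
        self.kwds = kwds

rv = Frozen(3.14, loc=5.0, scale=2.0)
```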

            qid & accept id: (37546552, 37546621) query: Make a variable from what's in a text file soup:

            The file contains:

            \n
            rahul@HP-EliteBook ~/Projects/Stackoverflow $ cat abc.txt \nhai am here\n
            \n

            Here is the Python code. The logic is very simple.

            \n
            fo = open("abc.txt", "r+")\na = fo.read()\n
            \n soup wrap:

            The file contains:

            rahul@HP-EliteBook ~/Projects/Stackoverflow $ cat abc.txt 
            hai am here
            

            Here is the Python code. The logic is very simple.

            fo = open("abc.txt", "r+")
            a = fo.read()
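            A variant that also closes the file automatically and strips the trailing newline; the sample file is created first so the sketch is self-contained:

```python
# create the sample file from the example
with open("abc.txt", "w") as fo:
    fo.write("hai am here\n")

# read it back into a variable; the with block closes
# the file even if an error occurs while reading
with open("abc.txt") as fo:
    a = fo.read().strip()
```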
            
            qid & accept id: (37605612, 37607766) query: PyImport_ImportModule, possible to load module from memory? soup:

            The following example shows how to define a module from a C string:

            \n
            #include \n#include \nint main(int argc, char *argv[])\n{\n    Py_Initialize();\n    PyRun_SimpleString("print('hello from python')");\n\n    // fake module\n    char *source = "__version__ = '2.0'";\n    char *filename = "test_module.py";\n\n    // perform module load\n    PyObject *builtins = PyEval_GetBuiltins();\n    PyObject *compile = PyDict_GetItemString(builtins, "compile");\n    PyObject *code = PyObject_CallFunction(compile, "sss", source, filename, "exec");\n    PyObject *module = PyImport_ExecCodeModule("test_module", code);\n\n    PyRun_SimpleString("import test_module; print(test_module.__version__)");\n\n    Py_Finalize();\n    return 0;\n}\n
            \n

            output:

            \n
            hello from python\nversion: 2.0\n
            \n

            You can read about import hooks in the docs. You will need to define a class with find_module and load_module methods. Something like the following should work:

            \n
            PyObject* find_module(PyObject* self, PyObject* args) {\n    // ... lookup args in available special modules ...\n    return Py_BuildValue("B", found);\n}\n\nPyObject* load_module(PyObject* self, PyObject* args) {\n    // ... convert args into filname, source ...\n    PyObject *builtins = PyEval_GetBuiltins();\n    PyObject *compile = PyDict_GetItemString(builtins, "compile");\n    PyObject *code = PyObject_CallFunction(compile, "sss", source, filename, "exec");\n    PyObject *module = PyImport_ExecCodeModule("test_module", code);\n    return Py_BuildValue("O", module);\n}\n\nstatic struct PyMethodDef methods[] = {\n    { "find_module", find_module, METH_VARARGS, "Returns module_loader if this is an encrypted module"},\n    { "load_module", load_module, METH_VARARGS, "Load an encrypted module" },\n    { NULL, NULL, 0, NULL }\n};\n\nstatic struct PyModuleDef modDef = {\n    PyModuleDef_HEAD_INIT, "embedded", NULL, -1, methods, \n    NULL, NULL, NULL, NULL\n};\n\nstatic PyObject* PyInit_embedded(void)\n{\n    return PyModule_Create(&modDef);\n}\n\nint main() {\n    ...\n    PyImport_AppendInittab("embedded", &PyInit_embedded);\n    PyRun_SimpleString("\\nimport embedded, sys\n\\nclass Importer:\n\\n    def find_module(self, fullpath):\n\\n        return self if embedded.find_module(fullpath) else None\n\\n    def load_module(self, fullpath):\n\\n        return embedded.load_module(fullpath)\n\\nsys.path_hooks.insert(0, Importer())\n\\n");\n    ...\n}\n
            \n soup wrap:

            The following example shows how to define a module from a C string:

            #include <Python.h>
            int main(int argc, char *argv[])
            {
                Py_Initialize();
                PyRun_SimpleString("print('hello from python')");
            
                // fake module
                char *source = "__version__ = '2.0'";
                char *filename = "test_module.py";
            
                // perform module load
                PyObject *builtins = PyEval_GetBuiltins();
                PyObject *compile = PyDict_GetItemString(builtins, "compile");
                PyObject *code = PyObject_CallFunction(compile, "sss", source, filename, "exec");
                PyObject *module = PyImport_ExecCodeModule("test_module", code);
            
                PyRun_SimpleString("import test_module; print('version:', test_module.__version__)");
            
                Py_Finalize();
                return 0;
            }
            

            output:

            hello from python
            version: 2.0
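            The same module-from-a-string idea can be sketched in pure Python with types.ModuleType, outside the C API; the module name test_module here mirrors the C example:

```python
import sys
import types

source = "__version__ = '2.0'"

# compile and exec the source into a fresh module object,
# then register it so a normal import statement finds it
module = types.ModuleType("test_module")
exec(compile(source, "test_module.py", "exec"), module.__dict__)
sys.modules["test_module"] = module

import test_module
```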
            

            You can read about import hooks in the docs. You will need to define a class with find_module and load_module methods. Something like the following should work:

            PyObject* find_module(PyObject* self, PyObject* args) {
                // ... lookup args in available special modules ...
                return Py_BuildValue("B", found);
            }
            
            PyObject* load_module(PyObject* self, PyObject* args) {
                // ... convert args into filname, source ...
                PyObject *builtins = PyEval_GetBuiltins();
                PyObject *compile = PyDict_GetItemString(builtins, "compile");
                PyObject *code = PyObject_CallFunction(compile, "sss", source, filename, "exec");
                PyObject *module = PyImport_ExecCodeModule("test_module", code);
                return Py_BuildValue("O", module);
            }
            
            static struct PyMethodDef methods[] = {
                { "find_module", find_module, METH_VARARGS, "Returns module_loader if this is an encrypted module"},
                { "load_module", load_module, METH_VARARGS, "Load an encrypted module" },
                { NULL, NULL, 0, NULL }
            };
            
            static struct PyModuleDef modDef = {
                PyModuleDef_HEAD_INIT, "embedded", NULL, -1, methods, 
                NULL, NULL, NULL, NULL
            };
            
            static PyObject* PyInit_embedded(void)
            {
                return PyModule_Create(&modDef);
            }
            
            int main() {
                ...
                PyImport_AppendInittab("embedded", &PyInit_embedded);
                PyRun_SimpleString("\
            import embedded, sys\n\
            class Importer:\n\
                def find_module(self, fullpath):\n\
                    return self if embedded.find_module(fullpath) else None\n\
                def load_module(self, fullpath):\n\
                    return embedded.load_module(fullpath)\n\
            sys.path_hooks.insert(0, Importer())\n\
            ");
                ...
            }
            
            qid & accept id: (37630714, 37630790) query: Creating a slice object in python soup:

            Use None everywhere the syntax-based slice uses a blank value:

            \n
            someseq[slice(2, None)]\n
            \n

            is equivalent to:

            \n
            someseq[2:]\n
            \n

            Similarly, someseq[:10:2] can use a preconstructed slice defined with slice(None, 10, 2), etc.

            \n soup wrap:

            Use None everywhere the syntax-based slice uses a blank value:

            someseq[slice(2, None)]
            

            is equivalent to:

            someseq[2:]
            

            Similarly, someseq[:10:2] can use a preconstructed slice defined with slice(None, 10, 2), etc.
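            A quick self-contained check that the two spellings really index identically:

```python
someseq = list(range(12))

# slice(2, None) is the object form of [2:]
tail = someseq[slice(2, None)]

# slice(None, 10, 2) is the object form of [:10:2]
evens = someseq[slice(None, 10, 2)]
```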

            qid & accept id: (37659598, 37659890) query: Extarct particulr part of json string using python regex soup:

            There is a problem in your JSON: it encloses another JSON object in double quotes, which causes json.loads to fail. Try transforming the JSON string before passing it to json.loads.

            \n

            The following works perfectly.

            \n
            >>> p = json.loads('''{"sweep_enabled":true,"product":"XYZ","page":"XYZ Profile","list":{\"id\":205782,\"name\":\"Robert Shriwas\",\"gender\":\"F\",\"practicing_since\":null,\"years\":21,\"specializations\":[\"Mentor\"]},"form":{"q":"","city":"Delhi","locality":null},"cerebro":true}''')\n
            \n

            And you can extract the required part as:

            \n
            >>> p["list"]\n{u'name': u'Robert Shriwas', u'gender': u'F', u'specializations': [u'Mentor'], u'id': 205782, u'years': 21, u'practicing_since': None}\n
            \n

            Check this out: I managed to correct the JSON you provided.

            \n
            >>> p = '''{"sweep_enabled":true,"product":"XYZ","page":"XYZ Profile","list":" {\"id\":205782,\"name\":\"Robert Shriwas\",\"gender\":\"F\",\"practicing_since\":null,\"years\":21,\"specializations\":[\"Mentor\"]}","form":{"q":"","city":"Delhi","locality":null},"cerebro":true}'''\n>>> q = re.sub(r'(:)\s*"\s*(\{[^\}]+\})\s*"',r'\1\2', p[1:-1])\n>>> q\n'"sweep_enabled":true,"product":"XYZ","page":"XYZ Profile","list":{"id":205782,"name":"Robert Shriwas","gender":"F","practicing_since":null,"years":21,"specializations":["Mentor"]},"form":{"q":"","city":"Delhi","locality":null},"cerebro":true'\n>>> r = p[0] + q + p[-1]\n>>> r\n'{"sweep_enabled":true,"product":"XYZ","page":"XYZ Profile","list":{"id":205782,"name":"Robert Shriwas","gender":"F","practicing_since":null,"years":21,"specializations":["Mentor"]},"form":{"q":"","city":"Delhi","locality":null},"cerebro":true}'\n>>> json.loads(r)\n{u'product': u'XYZ', u'form': {u'q': u'', u'city': u'Delhi', u'locality': None}, u'sweep_enabled': True, u'list': {u'name': u'Robert Shriwas', u'gender': u'F', u'specializations': [u'Mentor'], u'id': 205782, u'years': 21, u'practicing_since': None}, u'cerebro': True, u'page': u'XYZ Profile'}\n>>> s = json.loads(r)\n>>> s['list']\n{u'name': u'Robert Shriwas', u'gender': u'F', u'specializations': [u'Mentor'], u'id': 205782, u'years': 21, u'practicing_since': None}\n>>> \n
            \n soup wrap:

            There is a problem in your JSON: it encloses another JSON object in double quotes, which causes json.loads to fail. Try transforming the JSON string before passing it to json.loads.

            The following works perfectly.

            >>> p = json.loads('''{"sweep_enabled":true,"product":"XYZ","page":"XYZ Profile","list":{\"id\":205782,\"name\":\"Robert Shriwas\",\"gender\":\"F\",\"practicing_since\":null,\"years\":21,\"specializations\":[\"Mentor\"]},"form":{"q":"","city":"Delhi","locality":null},"cerebro":true}''')
            

            And you can extract the required part as:

            >>> p["list"]
            {u'name': u'Robert Shriwas', u'gender': u'F', u'specializations': [u'Mentor'], u'id': 205782, u'years': 21, u'practicing_since': None}
            

            Check this out: I managed to correct the JSON you provided.

            >>> p = '''{"sweep_enabled":true,"product":"XYZ","page":"XYZ Profile","list":" {\"id\":205782,\"name\":\"Robert Shriwas\",\"gender\":\"F\",\"practicing_since\":null,\"years\":21,\"specializations\":[\"Mentor\"]}","form":{"q":"","city":"Delhi","locality":null},"cerebro":true}'''
            >>> q = re.sub(r'(:)\s*"\s*(\{[^\}]+\})\s*"',r'\1\2', p[1:-1])
            >>> q
            '"sweep_enabled":true,"product":"XYZ","page":"XYZ Profile","list":{"id":205782,"name":"Robert Shriwas","gender":"F","practicing_since":null,"years":21,"specializations":["Mentor"]},"form":{"q":"","city":"Delhi","locality":null},"cerebro":true'
            >>> r = p[0] + q + p[-1]
            >>> r
            '{"sweep_enabled":true,"product":"XYZ","page":"XYZ Profile","list":{"id":205782,"name":"Robert Shriwas","gender":"F","practicing_since":null,"years":21,"specializations":["Mentor"]},"form":{"q":"","city":"Delhi","locality":null},"cerebro":true}'
            >>> json.loads(r)
            {u'product': u'XYZ', u'form': {u'q': u'', u'city': u'Delhi', u'locality': None}, u'sweep_enabled': True, u'list': {u'name': u'Robert Shriwas', u'gender': u'F', u'specializations': [u'Mentor'], u'id': 205782, u'years': 21, u'practicing_since': None}, u'cerebro': True, u'page': u'XYZ Profile'}
            >>> s = json.loads(r)
            >>> s['list']
            {u'name': u'Robert Shriwas', u'gender': u'F', u'specializations': [u'Mentor'], u'id': 205782, u'years': 21, u'practicing_since': None}
            >>> 
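For reuse, the same regex fix can be wrapped in a small function. This is only a sketch: `fix_quoted_object` is a made-up name, and the pattern assumes (like the regex above) that the quoted object contains no nested braces.

```python
import json
import re

# Sketch: unwrap a JSON object that was mistakenly embedded as a quoted
# string value, e.g. "list":" {...}". Assumes the embedded object
# contains no nested braces, just like the regex used above.
def fix_quoted_object(raw):
    return re.sub(r'(:)\s*"\s*(\{[^}]+\})\s*"', r'\1\2', raw)

raw = '{"sweep_enabled":true,"list":" {"id":205782,"name":"Robert Shriwas"}"}'
data = json.loads(fix_quoted_object(raw))
print(data["list"])
```

Note that this is a band-aid for one specific malformation; the real fix is to emit valid JSON at the source.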
            
            qid & accept id: (37682284, 37683073) query: Mask a 3d array with a 2d mask in numpy soup:

            soup wrap:

            Without the loop you could write it as:

            field3d_mask[:,:,:] = field2d[np.newaxis,:,:] > 0.3
            

            For example:

            field3d_mask_1 = np.zeros(field3d.shape, dtype=bool)
            field3d_mask_2 = np.zeros(field3d.shape, dtype=bool)
            
            for t in range(nt):
                field3d_mask_1[t,:,:] = field2d > 0.3
            
            field3d_mask_2[:,:,:] = field2d[np.newaxis,:,:] > 0.3
            
            print((field3d_mask_1 == field3d_mask_2).all())
            

            gives:

            True
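If you don't need to fill a preallocated array at all, the comparison result can also be expanded with np.broadcast_to. A minimal sketch, where the shapes nt, ny, nx are placeholders:

```python
import numpy as np

# Sketch: build the 3-D mask directly, letting NumPy broadcast the 2-D
# comparison along a new leading time axis (nt, ny, nx are placeholders).
nt, ny, nx = 4, 3, 5
field2d = np.random.rand(ny, nx)

field3d_mask = np.broadcast_to(field2d > 0.3, (nt, ny, nx))
# Every time slice equals the 2-D mask. broadcast_to returns a read-only
# view, so call .copy() on it if you need to modify the result.
```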

            qid & accept id: (37685718, 37686045) query: Finding specific links with Beautiful Soup soup:

            soup wrap:

            The text argument (now called string) does not search inside the text of an element's children (why? See the last note in this documentation paragraph: .string is effectively None for each of the presented li elements). What I would do is locate the b element by its text, then get all the a siblings:

            b = soup.find("b", text=lambda text: text and "data I DO care about:" in text)
            links = [a["href"] for a in b.find_next_siblings("a", href=True)]
            print(links)
            

            Or, you can go up the tree from b to li and then use find_all() to find all links inside li:

            b = soup.find("b", text=lambda text: text and "data I DO care about:" in text)
            li = b.find_parent("li")
            links = [a["href"] for a in li.find_all("a", href=True)]
            print(links)
            

            There are, of course, other ways to locate the desired a elements.
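For a self-contained illustration, here is a minimal sketch using made-up HTML shaped like the question's markup (the tag text and hrefs are invented):

```python
from bs4 import BeautifulSoup

# Made-up HTML shaped like the question's markup.
html = """
<ul>
  <li><b>data I do NOT care about:</b> <a href="/spam">spam</a></li>
  <li><b>data I DO care about:</b> <a href="/a">a</a> <a href="/b">b</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")
# Locate the b element by its text, then collect its a siblings.
b = soup.find("b", text=lambda text: text and "data I DO care about:" in text)
links = [a["href"] for a in b.find_next_siblings("a", href=True)]
print(links)  # ['/a', '/b']
```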

            qid & accept id: (37758227, 37759538) query: Stop a command line command in script soup:

            soup wrap:

            os.system() does not return control of the subshell it spawns; it only returns the subshell's exit code once the command has finished executing. This can be verified by:

            x = os.system("echo 'shankar'")
            print(x)
            

            What you need is the subprocess module. You can use the subprocess.Popen() constructor to start a subprocess; it returns an object that represents the spawned process and can be used to control it.

            The subprocess module provides more powerful facilities for spawning new processes and retrieving their results.

            Run it:

            import subprocess
            # With an argument list, leave shell at its default of False;
            # pass a single string and shell=True only if you need shell features.
            proc = subprocess.Popen(['foo', 'bar', 'bar'], stdout=subprocess.PIPE)
            

            Here proc is the returned object, which provides control over the spawned subprocess. You can retrieve information about the process or manipulate it through this object.

            proc.pid # returns the id of process
            

            Stop it:

            proc.terminate() # terminate the process.
            

            Popen.terminate() sends SIGTERM to the subprocess, asking it to exit. (Note that Ctrl+C actually sends SIGINT, a different signal.)

            You can get the output using the Popen.communicate() function.

            Get output:

            out, err = proc.communicate()
            

            Note: Popen.communicate() blocks until the subprocess has exited on its own or been terminated or killed, and only then returns the output.
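Putting the pieces together, a minimal end-to-end sketch; sys.executable is used as a stand-in for the real command so the example is self-contained.

```python
import subprocess
import sys

# Start a long-running child process (sys.executable running a sleep is a
# stand-in for whatever command you actually want to run).
proc = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(60)"],
    stdout=subprocess.PIPE,
)
print(proc.pid)   # id of the spawned process

proc.terminate()  # ask the process to exit (SIGTERM on POSIX)

# communicate() waits for the process to finish and collects its output.
out, err = proc.communicate()
print(proc.returncode)  # negative signal number on POSIX, e.g. -15
```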

            qid & accept id: (37760124, 37760181) query: How to input a line word by word in Python? soup:

            soup wrap:

            Read the line, split it, and copy the resulting list into a set. If the size of the set is less than the size of the list, the line contains repeated elements:

            with open('filename', 'r') as f:
                for line in f:
                    words = line.split()
                    # Fewer unique words than total words means a repeat exists
                    if len(set(words)) < len(words):
                        print('repeated elements found')
            

            To read the file word by word, try this

            def readWords(file_object):
                word = ""
                # Read one character at a time until read() returns "" at EOF
                # (two-argument iter replaces the Python-2-only itertools.imap)
                for ch in iter(lambda: file_object.read(1), ""):
                    if ch.isspace():
                        if word: # In case of multiple spaces
                            yield word
                            word = ""
                        continue
                    word += ch
                if word:
                    yield word # Handles last word before EOF
            

            Then you can do:

            with open('filename', 'r') as f:
                seen = set()
                for num in map(int, readWords(f)):  # map() is lazy in Python 3
                    # Use a set to check whether the number already exists
                    if num in seen:
                        print(num, 'is repeated')
                    seen.add(num)
            

            This method should also work for streams, because it reads only one byte at a time and yields whitespace-delimited words one by one from the input stream.
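That claim is easy to check against an in-memory stream. A self-contained sketch, repeating readWords here (written with Python 3's two-argument iter in place of itertools.imap):

```python
import io

# readWords repeated so the sketch is self-contained; iter(..., "")
# reads one character at a time until read() returns "" at EOF.
def readWords(file_object):
    word = ""
    for ch in iter(lambda: file_object.read(1), ""):
        if ch.isspace():
            if word:
                yield word
                word = ""
            continue
        word += ch
    if word:
        yield word

stream = io.StringIO("12 7  12\n3")
nums = list(map(int, readWords(stream)))
print(nums)                        # [12, 7, 12, 3]
print(len(set(nums)) < len(nums))  # True -> the stream contains a repeat
```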


            After giving this answer, I've updated this method quite a bit. Have a look.